Horovod is a distributed deep learning training framework designed for multi-GPU and multi-node setups. Proper configuration ensures optimal performance and scalability. Below are key settings and recommendations:

Installation & Dependencies

  • Install Horovod:
    pip install horovod
    
    📌 For detailed installation steps, see Horovod Installation Docs. For GPU clusters, the NCCL-enabled build is typically produced by setting HOROVOD_GPU_OPERATIONS=NCCL at pip install time, and horovodrun --check-build reports which frameworks and collective backends were compiled in. A quick runtime sanity check follows this list.
  • Dependencies:
    • A TensorFlow or PyTorch version compatible with the chosen Horovod release
    • MPI (Message Passing Interface) support, typically via Open MPI (Gloo can serve as an alternative controller when MPI is unavailable)
    • Networking configuration for inter-node cluster communication
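
A quick sanity check that the installed build initializes and launches the expected number of processes (a minimal sketch, assuming the PyTorch bindings were built; a TensorFlow build would import horovod.tensorflow instead, and check.py is a hypothetical file name):

    # check.py -- launched with: horovodrun -np 4 python check.py
    import horovod.torch as hvd

    hvd.init()  # must succeed before any other Horovod call
    # Each of the 4 processes prints its own global and local rank
    print(f"rank {hvd.rank()} of {hvd.size()}, local rank {hvd.local_rank()}")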

Core Concepts

  • Distributed Training:
    • Use horovodrun to launch training across multiple workers
    • Example:
      horovodrun -np 4 python train.py  
      
  • Data Parallelism:
    • Horovod's primary mode: each worker holds a full copy of the model and processes a different shard of the data
    • Gradients are averaged across workers with ring-allreduce after each step (see the training sketch after this list)
  • Model Parallelism:
    • Splits the model itself across GPUs (for models too large for a single device)
    • Not provided by Horovod out of the box; requires a custom implementation
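
As referenced above, a minimal data-parallel training sketch (assuming the PyTorch build of Horovod; the model, dataset, and hyperparameters are placeholders):

    import torch
    import horovod.torch as hvd

    hvd.init()                               # one process per GPU, started by horovodrun
    torch.cuda.set_device(hvd.local_rank())  # pin this process to its local GPU

    model = torch.nn.Linear(10, 1).cuda()    # placeholder model
    dataset = torch.utils.data.TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))

    # Each worker reads a different shard of the data
    sampler = torch.utils.data.distributed.DistributedSampler(
        dataset, num_replicas=hvd.size(), rank=hvd.rank())
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)

    # Scale the learning rate by the number of workers and wrap the optimizer
    # so that gradients are averaged across workers via allreduce
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
    optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

    # Start every worker from identical model and optimizer state
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for epoch in range(5):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()
            optimizer.step()
        if hvd.rank() == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

Launched with the horovodrun command shown above, each of the four processes owns one GPU and one shard of the data.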

Advanced Configuration

  • Customize Horovod Settings:
    • Tuning is done through environment variables rather than by editing framework source files; for example, HOROVOD_FUSION_THRESHOLD and HOROVOD_CYCLE_TIME control how gradients are batched for tensor fusion
    • Set HOROVOD_MPI_THREADS_DISABLE=1 when the MPI library is not built with multi-threading support
  • Cluster Architecture:
    • Single-node multi-GPU vs. multi-node distributed setups (a sketch for telling the two apart follows this list)
    • Use -H host1:4,host2:4 or --hostfile to tell horovodrun which hosts run workers and how many slots each provides
    • 📊 Cluster setup diagram
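
As mentioned above, a short sketch of the rank-related API that tells the two layouts apart at runtime (assuming the PyTorch module; the other framework modules expose the same calls):

    import horovod.torch as hvd

    hvd.init()

    # size():       total worker processes across all nodes
    # local_size(): worker processes on this node (usually the GPUs per node)
    # local_rank(): index of this worker within its node, used to pick a GPU
    if hvd.size() == hvd.local_size():
        layout = "single-node multi-GPU"
    else:
        layout = "multi-node distributed"

    if hvd.rank() == 0:
        print(f"{hvd.size()} workers, {hvd.local_size()} per node: {layout}")

On a single 4-GPU machine this reports 4 workers with 4 per node; across two such machines it reports 8 workers with 4 per node.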

Best Practices

  • Optimize Communication:
    • Use NCCL for GPU allreduce, with RDMA/InfiniBand where the network supports it, for faster inter-node communication
    • Profile communication with the Horovod Timeline by pointing HOROVOD_TIMELINE at an output file; a gradient-compression sketch follows at the end of this section
  • Configuration Example (illustrative launcher YAML; Horovod itself is driven by horovodrun flags and environment variables):
    horovod:  
      backend: "mpi"  
      tensorflow:  
        num_gpus: 4  
      pytorch:  
        use_all_available_gpus: true  
    
    📌 For more examples, check Horovod Configuration Samples.
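
As noted under Optimize Communication, gradient compression is available directly from the Python API; a minimal sketch (assuming the PyTorch build; the model and optimizer are placeholders):

    import torch
    import horovod.torch as hvd

    hvd.init()
    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Compress gradients to fp16 during allreduce to roughly halve the traffic;
    # pass hvd.Compression.none (the default) to keep full precision
    optimizer = hvd.DistributedOptimizer(
        optimizer,
        named_parameters=model.named_parameters(),
        compression=hvd.Compression.fp16)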