Horovod is a distributed deep learning training framework designed for multi-GPU and multi-node setups. Proper configuration ensures optimal performance and scalability. Below are key settings and recommendations:
Installation & Dependencies
- Install Horovod:
  - `pip install horovod`
  - 📌 For detailed installation steps, see the Horovod Installation Docs.
- Dependencies:
  - TensorFlow/PyTorch version compatibility
  - MPI (Message Passing Interface) support via `openmpi`
  - Networking configuration for cluster communication
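After installation, one way to confirm which communication backends were compiled in is to query Horovod's build helpers. A minimal sketch, assuming the PyTorch extension is installed (the TensorFlow module exposes the same checks):

```python
# Quick sanity check of the Horovod build (assumes the PyTorch extension is installed).
import horovod.torch as hvd

print("MPI built: ", hvd.mpi_built())   # True if compiled against an MPI implementation
print("NCCL built:", hvd.nccl_built())  # True if NCCL GPU collectives are available
print("Gloo built:", hvd.gloo_built())  # True if the Gloo controller is available
```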
Core Concepts
- Distributed Training:
  - Use `horovodrun` to launch training across multiple workers
  - Example: `horovodrun -np 4 python train.py` (a sketch of such a script follows this list)
- Data Parallelism:
  - Splits data across GPUs and aggregates gradients
  - 📎 Learn more about data parallelism
- Model Parallelism:
  - Splits model across GPUs (for large models)
  - Requires custom implementation
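The `train.py` launched above is not defined in this document; the following is a minimal sketch of what such a data-parallel script might look like with Horovod's PyTorch API (the model, data, and hyperparameters are placeholders):

```python
# Minimal Horovod data-parallel training sketch (PyTorch backend).
# Launch with: horovodrun -np 4 python train.py
import torch
import torch.nn.functional as F
import horovod.torch as hvd

hvd.init()                                  # start Horovod
torch.cuda.set_device(hvd.local_rank())     # pin one GPU per process

model = torch.nn.Linear(10, 1).cuda()       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Average gradients across workers on every step via allreduce
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

# Make sure every worker starts from the same weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):
    x = torch.randn(32, 10).cuda()          # placeholder data
    y = torch.randn(32, 1).cuda()
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```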
Advanced Configuration
- Customize Horovod Settings:
  - Modify `horovod/tensorflow/vars.py` or `horovod/pytorch/vars.py` for framework-specific tweaks
  - Adjust the `HOROVOD_MPI_THREADS_AWARE` environment variable for thread-aware MPI
- Cluster Architecture:
  - Single-node multi-GPU vs. multi-node distributed setups (see the rank sketch after this list)
  - Use `--hostfile` to specify worker addresses
  - 📊 Cluster setup diagram
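Whether a job runs on one node or many, the same script can tell its processes apart through Horovod's rank helpers. A small sketch; the hostfile name, slot counts, and `probe.py` script are hypothetical:

```python
# Hypothetical multi-node launch, with hosts.txt listing one "hostname slots=N" per line:
#   horovodrun -np 8 --hostfile hosts.txt python probe.py
import horovod.torch as hvd

hvd.init()
print(
    f"global rank {hvd.rank()} of {hvd.size()}, "
    f"local rank {hvd.local_rank()} of {hvd.local_size()} on this node"
)
```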
Best Practices
- Optimize Communication:
  - Use `rdma` or `gRPC` for faster inter-node communication (a gradient-compression sketch appears at the end of this section)
  - Monitor bandwidth with `horovodrun --monitor`
- Configuration Example:
  - 📌 For more examples, check Horovod Configuration Samples.

```yaml
horovod:
  backend: "mpi"
  tensorflow:
    num_gpus: 4
  pytorch:
    use_all_available_gpus: true
```
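As one concrete communication optimization from the list above, Horovod's optimizer wrapper can apply fp16 gradient compression during allreduce. A minimal sketch using the PyTorch backend; the model and learning rate are placeholders, and whether compression actually helps depends on the interconnect and model size:

```python
import torch
import horovod.torch as hvd

hvd.init()
model = torch.nn.Linear(10, 1)               # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# fp16 compression halves the bytes sent per gradient during allreduce.
optimizer = hvd.DistributedOptimizer(
    optimizer,
    named_parameters=model.named_parameters(),
    compression=hvd.Compression.fp16,
)
```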