Here are example configurations for setting up Horovod with different frameworks:

TensorFlow Configuration

# tensorflow_config.yaml
horovod:
  framework: tensorflow
  backend: mpi
  master: tcp://localhost:2222
  workers:
    - rank: 0
      hostname: worker0
    - rank: 1
      hostname: worker1

PyTorch Configuration

# pytorch_config.yaml
horovod:
  framework: pytorch
  backend: nccl
  master: tcp://127.0.0.1:2222
  workers:
    - rank: 0
      hostname: worker0
    - rank: 1
      hostname: worker1

Key Configuration Parameters

  • framework: The deep learning framework to use: tensorflow, pytorch, or mxnet
  • backend: The communication backend: mpi, nccl, or gloo, depending on what your cluster provides (nccl targets NVIDIA GPUs)
  • master: The address of the master node (e.g., tcp://host:port)
  • workers: Every worker node, each listed with its rank and hostname
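
The parameters above can be checked before a job is launched. The following is a minimal sketch, not part of Horovod itself: validate_config is a hypothetical helper that takes the configuration as an already-parsed Python dict (for example, the result of loading one of the YAML files above with a YAML parser such as PyYAML's yaml.safe_load). It assumes the master address must use the tcp:// scheme shown in the examples and that worker ranks must be unique; neither rule is stated in the original text.

```python
from urllib.parse import urlparse

# Allowed values are taken from the parameter list above.
ALLOWED_FRAMEWORKS = {"tensorflow", "pytorch", "mxnet"}
ALLOWED_BACKENDS = {"mpi", "nccl", "gloo"}


def validate_config(config):
    """Hypothetical helper: return a list of problems; an empty list means none were found."""
    section = config.get("horovod") if isinstance(config, dict) else None
    if not isinstance(section, dict):
        return ["missing top-level 'horovod' mapping"]

    errors = []
    if section.get("framework") not in ALLOWED_FRAMEWORKS:
        errors.append(f"framework must be one of {sorted(ALLOWED_FRAMEWORKS)}")
    if section.get("backend") not in ALLOWED_BACKENDS:
        errors.append(f"backend must be one of {sorted(ALLOWED_BACKENDS)}")

    # Assumption: the master address follows the tcp://host:port form from the examples.
    master = section.get("master")
    try:
        parsed = urlparse(master if isinstance(master, str) else "")
        master_ok = (
            parsed.scheme == "tcp"
            and bool(parsed.hostname)
            and parsed.port is not None
        )
    except ValueError:  # raised for a malformed host or a non-numeric port
        master_ok = False
    if not master_ok:
        errors.append("master must look like tcp://host:port")

    workers = section.get("workers")
    if not isinstance(workers, list) or not workers:
        errors.append("workers must be a non-empty list")
        return errors

    ranks = []
    for index, worker in enumerate(workers):
        if (
            isinstance(worker, dict)
            and isinstance(worker.get("rank"), int)
            and isinstance(worker.get("hostname"), str)
        ):
            ranks.append(worker["rank"])
        else:
            errors.append(f"worker entry {index} needs an integer rank and a string hostname")

    # Assumption: no two workers may share a rank.
    if len(set(ranks)) != len(ranks):
        errors.append("worker ranks must be unique")
    return errors


# The same values as the PyTorch example above, written as a Python dict.
example = {
    "horovod": {
        "framework": "pytorch",
        "backend": "nccl",
        "master": "tcp://127.0.0.1:2222",
        "workers": [
            {"rank": 0, "hostname": "worker0"},
            {"rank": 1, "hostname": "worker1"},
        ],
    }
}

for problem in validate_config(example):
    print(problem)
```

Returning a list of messages rather than raising on the first failure lets every problem in a file be reported in one pass.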

For more details on Horovod configuration options, please visit /Horovod/docs/