MPI environment variables are not set #6895

Open
fabiogeraci opened this issue Dec 18, 2024 · 2 comments

@fabiogeraci

System Info
HPC cluster running Ubuntu 22.04, 2 nodes x 8 H100 GPUs

Scheduler: LSF

[tool.poetry.dependencies]
python = "^3.10"

importlib-metadata = { version = "~=1.0", python = "<3.8" }
tensorboard = "^2.16.2"
sge-data-package = {version = "", source = "sgedata"}
torch = "2.2.1"
torchvision = "0.17.1"
torchaudio = "2.2.1"
transformers = "4.42.0"
datasets = "2.18."
accelerate = "0.28.0"
deepspeed = "0.13.4"
safetensors = "0.4.2"
mpi4py = "^4.0.0"

module load cuda-12.1.1
module load ISG/experimental/fg12/openmpi/5.0.4-cuda12.1-lsf
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

deepspeed \
    --hostfile=${HOSTFILE_PATH} \
    --launcher=OPENMPI \
    --launcher_args="-bind-to none -map-by slot --mca pml ob1 --oversubscribe --display-allocation --display-map" \
    --master_addr=${MASTER_ADDR} \
    --master_port=${_M_PORT} \
    --no_ssh_check \
    src/dna_mlm/runner.py

import os
import typing as tp


def setup_env_ranks() -> tp.Tuple[int, int, int]:
    """Map Open MPI rank variables to the names expected by DeepSpeed/PyTorch."""
    if 'OMPI_COMM_WORLD_LOCAL_RANK' in os.environ:
        os.environ['LOCAL_RANK'] = os.environ['OMPI_COMM_WORLD_LOCAL_RANK']
        os.environ['RANK'] = os.environ['OMPI_COMM_WORLD_RANK']
        os.environ['WORLD_SIZE'] = os.environ['OMPI_COMM_WORLD_SIZE']
    else:
        raise EnvironmentError(
            "MPI environment variables are not set. "
            "Ensure you are running the script with an MPI-compatible launcher."
        )
    # Return (local_rank, rank, world_size) to match the annotated return type
    return (
        int(os.environ['LOCAL_RANK']),
        int(os.environ['RANK']),
        int(os.environ['WORLD_SIZE']),
    )


setup_env_ranks()

The function should set the environment variables, but instead it raises the error.
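
When the check fails like this, it can help to see which launcher variables the Python process actually received. The helper below is a minimal diagnostic sketch, not part of DeepSpeed: `launcher_env_report` is a hypothetical name, and the prefixes it looks for (`OMPI_`, `PMIX_`, `PMI_`) are an assumption about what common MPI launchers export.

```python
import os
from typing import Dict, Mapping

# Assumed prefixes exported by common MPI launchers (not an exhaustive list)
LAUNCHER_PREFIXES = ("OMPI_", "PMIX_", "PMI_")
# Names read by PyTorch-style distributed setups
TORCH_NAMES = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def launcher_env_report(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the launcher-related variables found in env, sorted by name."""
    found = {
        key: value
        for key, value in env.items()
        if key.startswith(LAUNCHER_PREFIXES) or key in TORCH_NAMES
    }
    return dict(sorted(found.items()))


if __name__ == "__main__":
    report = launcher_env_report(os.environ)
    if not report:
        print("no launcher variables visible in this process")
    for key, value in report.items():
        print(f"{key}={value}")
```

Running this at the top of the training script on each rank shows whether the process was started by `mpirun` (the `OMPI_*` names appear) or by another launcher that already sets `RANK` and `LOCAL_RANK`.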

@fabiogeraci
Author

I found the error

# OPENMPI should have been in double quotes
deepspeed \
    --hostfile ${HOSTFILE_PATH} \
    --launcher "OPENMPI" \
    --launcher_args "-bind-to none -map-by slot --allow-run-as-root --mca pml ob1 --oversubscribe --display-allocation --display-map" \
    --master_addr ${MASTER_ADDR} \
    --master_port ${_M_PORT} \
    --no_ssh_check \
    src/runner.py
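
For anyone comparing the two commands: a POSIX shell removes the double quotes before the program sees its arguments, and argparse-style parsers generally accept both `--flag=value` and `--flag value`. The sketch below uses a hypothetical stand-in parser (not DeepSpeed's actual argument parser, and the `pdsh` default is only illustrative), with `shlex` imitating shell tokenization. Under those assumptions all three spellings give the same value, so another difference between the two runs may also have played a part.

```python
import argparse
import shlex

# Hypothetical stand-in parser, NOT DeepSpeed's real one
parser = argparse.ArgumentParser()
parser.add_argument("--launcher", type=str, default="pdsh")


def parse_launcher(command_line: str) -> str:
    """Tokenize like a POSIX shell, then return the parsed --launcher value."""
    argv = shlex.split(command_line)  # quotes are removed at this step
    return parser.parse_args(argv).launcher


if __name__ == "__main__":
    for spelling in ('--launcher=OPENMPI', '--launcher OPENMPI', '--launcher "OPENMPI"'):
        print(spelling, "->", parse_launcher(spelling))
```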

@fabiogeraci
Author

The real question is why I need to set this up at all:

    if 'OMPI_COMM_WORLD_LOCAL_RANK' in os.environ:
        os.environ['LOCAL_RANK'] = os.environ['OMPI_COMM_WORLD_LOCAL_RANK']
        os.environ['RANK'] = os.environ['OMPI_COMM_WORLD_RANK']
        os.environ['WORLD_SIZE'] = os.environ['OMPI_COMM_WORLD_SIZE']
    else:
        raise EnvironmentError(
            "MPI environment variables are not set. "
            "Ensure you are running the script with an MPI-compatible launcher."
        )
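
One likely explanation, offered as an assumption rather than a confirmed answer from the maintainers: `mpirun` exports only the `OMPI_COMM_WORLD_*` names, while PyTorch's `env://` initialization reads `RANK` and `WORLD_SIZE` (together with `MASTER_ADDR` and `MASTER_PORT`), and `LOCAL_RANK` is commonly used to select the GPU. As far as I know, `deepspeed.init_distributed` has an `auto_mpi_discovery` option that can fill these in through mpi4py, but any code that reads the variables before that call would still find them missing. The sketch below shows the mapping as a pure function that does not overwrite values already present; `torch_env_from_ompi` is a hypothetical helper name.

```python
from typing import Dict, Mapping

# Open MPI name -> name read by PyTorch-style distributed setups
OMPI_TO_TORCH = {
    "OMPI_COMM_WORLD_RANK": "RANK",
    "OMPI_COMM_WORLD_LOCAL_RANK": "LOCAL_RANK",
    "OMPI_COMM_WORLD_SIZE": "WORLD_SIZE",
}


def torch_env_from_ompi(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the torch-style variables to add; values already in env win."""
    updates: Dict[str, str] = {}
    for ompi_name, torch_name in OMPI_TO_TORCH.items():
        if torch_name in env:
            continue  # another launcher already set it
        if ompi_name not in env:
            raise EnvironmentError(
                f"{ompi_name} is not set; was the script started by mpirun?"
            )
        updates[torch_name] = env[ompi_name]
    return updates
```

It would be applied once, before any distributed initialization, with `os.environ.update(torch_env_from_ompi(os.environ))`. Keeping the function free of side effects makes it easy to check against a plain dictionary without launching MPI.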
