Link to the application: https://portal.nersc.gov/project/m3363/
- Get the SBI-FAIR repository

      git clone --depth 1 https://github.com/DSC-SPIDAL/sbi-fair
      SBI_FAIR_DIR=${PWD}/sbi-fair
- Create a directory for downloading datasets and storing results

      mkdir cosmoflow
      cd cosmoflow
      mkdir output
- Get the datasets for training

      ${SBI_FAIR_DIR}/tools/scripts/load_dataset.py ${SBI_FAIR_DIR}/datasets/cosmoflow/datasets.yaml cosmoUniverse_2019_05_4parE_tf_v2_mini
You can use any of the following datasets:
- cosmoUniverse_2019_05_4parE_tf_v2_mini
- cosmoUniverse_2019_05_4parE_tf_v2
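Once the download finishes, it is worth confirming that the dataset landed where the container mounts expect it. This is a convenience sketch, not part of the official instructions; the `default` subdirectory is the layout assumed by the volume mounts shown later in this guide.

```shell
# Sanity-check the downloaded dataset directory (convenience sketch;
# the 'default' subdirectory matches the mount paths used below).
DATASET=cosmoUniverse_2019_05_4parE_tf_v2_mini
if [ -d "${DATASET}/default" ]; then
  DATASET_STATUS=ready
else
  DATASET_STATUS=missing
fi
echo "dataset ${DATASET_STATUS}"
```

If the check reports `missing`, rerun the `load_dataset.py` command above before continuing.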
- Create a file with parameters

      # Few epochs for testing
      echo 'epochs: 2' > options.yaml

  We will update the list of available options here; in the meantime, please refer to the original repository https://github.com/sparticlesteve/cosmoflow-benchmark for the full list of options.
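For more than one option, a heredoc keeps the file readable in a single command. Note that `epochs` is the only key documented in this guide; look up any additional keys in the cosmoflow-benchmark repository before adding them.

```shell
# Write the parameter file with a heredoc instead of repeated echo
# calls. 'epochs' is the only option shown in this guide -- consult
# the cosmoflow-benchmark repository for other valid keys.
cat > options.yaml <<'EOF'
epochs: 2
EOF
cat options.yaml
```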
- Build Docker container

      cd ${SBI_FAIR_DIR}/models/cosmoflow
      ./build_docker.sh
      cd -  # Go back to the results directory
- Run Training

      GPU_SWITCH='--runtime=nvidia --gpus all'  # or '' for CPU workloads
      # Mount the directories with the dataset; Docker bind mounts need absolute paths
      VOLUME_MOUNTS='-v ${PWD}/cosmoUniverse_2019_05_4parE_tf_v2_mini/default:/input/train_dataset -v ${PWD}/output:/output -v ${PWD}/options.yaml:/input/options.yaml'
      docker run ${GPU_SWITCH} ${VOLUME_MOUNTS} cosmoflow run train
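If the same script must run on both GPU and CPU hosts, the GPU flags can be chosen automatically. This is a convenience sketch, not part of the official instructions: it enables the NVIDIA runtime only when `nvidia-smi` is present and reports a working GPU.

```shell
# Pick the Docker GPU flags automatically: use the NVIDIA runtime only
# when nvidia-smi exists and succeeds (convenience sketch).
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  GPU_SWITCH='--runtime=nvidia --gpus all'
else
  GPU_SWITCH=''
fi
echo "GPU_SWITCH=${GPU_SWITCH}"
```

The same pattern works for Apptainer by substituting `--nv` for the Docker flags.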
- Build Apptainer container

      cd ${SBI_FAIR_DIR}/models/cosmoflow
      ./build_apptainer.sh
      cd -  # Go back to the results directory
- Run Training

      GPU_SWITCH='--nv'  # or '' for CPU workloads
      # Mount the directories with the dataset
      VOLUME_MOUNTS='--bind ./cosmoUniverse_2019_05_4parE_tf_v2_mini/default:/input/train_dataset --bind ./output:/output --bind ./options.yaml:/input/options.yaml'
      apptainer run --app train ${GPU_SWITCH} ${VOLUME_MOUNTS} ${SBI_FAIR_DIR}/models/cosmoflow/cosmoflow.sif
The outputs of the run will be available in ./output.
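A quick way to inspect what the run produced (the exact file names depend on the cosmoflow-benchmark configuration, so this just lists whatever is there):

```shell
# List the training artifacts written by the container.
mkdir -p output   # already exists if you followed the setup step above
ls -la output
```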