Currently, the tutorial calls `neuron_parallel_compile` inside the bash script. Because `neuron_parallel_compile` is responsible for setting `$NEURON_EXTRACT_GRAPHS_ONLY`, the variable is presumably still unset when the script evaluates the check below, so `MAX_STEPS` is set to -1 and compilation runs for more than an hour.
```bash
if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi
```
Tutorial source: `optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.mdx`, lines 215 to 262 in 3748a06
We need to refactor the tutorial to call `neuron_parallel_compile` on the training script. An example can be found here:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html
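To illustrate why the ordering matters, here is a minimal sketch that does not use the Neuron tooling at all. `fake_parallel_compile` is a hypothetical stand-in that only mimics the behavior described in this issue (setting `NEURON_EXTRACT_GRAPHS_ONLY=1` for the command it wraps), `pick_max_steps` is a hypothetical wrapper around the tutorial's check, and `LOGGING_STEPS=10` is an arbitrary value chosen for the demo.

```bash
# Start from a clean state so the demo is deterministic.
unset NEURON_EXTRACT_GRAPHS_ONLY

# Hypothetical stand-in for neuron_parallel_compile (illustration only):
# it sets NEURON_EXTRACT_GRAPHS_ONLY=1 just for the command it wraps.
fake_parallel_compile() {
  NEURON_EXTRACT_GRAPHS_ONLY=1 "$@"
}

# The tutorial's check, wrapped in a function so it can be run both ways.
pick_max_steps() {
  LOGGING_STEPS=10  # assumed value for the demo
  if [ "${NEURON_EXTRACT_GRAPHS_ONLY:-}" = "1" ]; then
    echo $((LOGGING_STEPS + 5))
  else
    echo -1
  fi
}

# Current layout: the check runs in the outer script, before the wrapper,
# so the variable is unset and the full-training branch is taken.
outer=$(pick_max_steps)

# Proposed layout: the wrapper launches the code that performs the check,
# so the check sees the variable and picks the short compile run.
wrapped=$(fake_parallel_compile pick_max_steps)

echo "outer=$outer wrapped=$wrapped"  # prints "outer=-1 wrapped=15"
```

In the real tutorial, the equivalent change would presumably be to invoke `neuron_parallel_compile` with the training command as its argument, rather than calling it from inside the script that computes `MAX_STEPS`.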