Batch evaluation script #92
base: master
Multiple paths in the sbatch script are not properly quoted, which can cause issues if they contain spaces, for example. To be safe, all variable expansions in Bash scripts should be quoted correctly.
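To illustrate why the quoting matters, here is a small Python sketch that uses `shlex` to mimic POSIX word splitting. The path and the `ls` command are made up for the example and are not taken from the script. In Bash itself, the equivalent fix is writing `"$VAR"` instead of `$VAR`.

```python
import shlex

# Hypothetical checkpoint path containing a space (not from the PR).
checkpoint_dir = "/scratch/my checkpoints/iter_100"

# Unquoted interpolation: the shell would see two separate words.
unquoted_cmd = f"ls {checkpoint_dir}"

# Quoted interpolation: the path stays a single word.
quoted_cmd = f"ls {shlex.quote(checkpoint_dir)}"

# shlex.split applies POSIX-style word splitting, similar to the shell.
print(shlex.split(unquoted_cmd))
print(shlex.split(quoted_cmd))
```

The unquoted command ends up with the path broken into two arguments, which is the failure mode an unquoted `$VAR` causes in the sbatch script.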
Also, the whole argument-handling portion of run_server_no_opt.py is far too manual. There should be more abstraction here. For example, many command-line flags are already missing. With continued experimentation, every change that adds arguments would also have to add them here manually, or they would be silently ignored.
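One possible direction, as a rough sketch only: let the wrapper parse just the flags it owns and pass everything else through untouched, so new flags never have to be mirrored by hand. The `--port` flag and the function name `split_wrapper_args` are hypothetical and not taken from run_server_no_opt.py.

```python
import argparse


def split_wrapper_args(argv):
    """Parse only the wrapper's own flags; return the rest unchanged."""
    # allow_abbrev=False keeps prefixes of wrapper flags from swallowing
    # flags that are meant for the downstream parser.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Hypothetical wrapper-only flag, used purely for illustration.
    parser.add_argument("--port", type=int, default=5000)
    # parse_known_args returns (namespace, list of unrecognized arguments).
    wrapper_args, passthrough = parser.parse_known_args(argv)
    return wrapper_args, passthrough


wrapper_args, passthrough = split_wrapper_args(
    ["--port", "8000", "--bf16", "--micro-batch-size", "4"]
)
print(wrapper_args.port)
print(passthrough)
```

The `passthrough` list could then be assigned to `sys.argv[1:]` before the downstream argument parser runs, so flags added later are forwarded automatically.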
I noted lots of things that would need some love, but in general this is super useful and I'm very happy you created the script!
# Not enabling --use-flash-attn during inference as advised
# if args.use_flash_attn:
#     sys.argv.append("--use-flash-attn")
I'd remove the comment above, since it is confusing, and uncomment these lines so that FlashAttention is activated if args.bf16 or args.fp16 is set. FlashAttention is an exact algorithm, so we do not gain anything from deactivating it.
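A minimal sketch of one reading of this suggestion, where the flag is forwarded whenever half precision is requested. The helper name `flash_attn_flags` is hypothetical, and the `Namespace` objects stand in for the script's parsed arguments.

```python
import argparse


def flash_attn_flags(args):
    """Return the extra CLI flags to forward for FlashAttention."""
    # Enable FlashAttention only when a half-precision mode is active.
    if args.bf16 or args.fp16:
        return ["--use-flash-attn"]
    return []


# Stand-ins for parsed arguments; the real script builds these via argparse.
half_precision = argparse.Namespace(bf16=True, fp16=False)
full_precision = argparse.Namespace(bf16=False, fp16=False)

print(flash_attn_flags(half_precision))
print(flash_attn_flags(full_precision))
```

In the script, the returned list could be appended with `sys.argv.extend(...)` in place of the commented-out lines.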
I agree, I had a look into this and it seems that it is possible to call
That's a great find!
Sbatch script that performs evaluation on a given set of tasks for a given collection of model checkpoints using the Megatron-LM-client-server inference solution.