Parallel runner in use #448
Conversation
Remove non-working SUTs from calibration. Give a sensible __str__ to things with uids.
👍🏻 with just a couple of questions.
@@ -81,7 +70,7 @@ def cli() -> None:
     help="Path to directory containing custom branding.",
 )
 @click.option("--anonymize", type=int, help="Random number seed for consistent anonymization of SUTs")
-@click.option("--parallel", default=False, help="Experimentally run SUTs in parallel")
+@click.option("--parallel", default=False, help="Obsolete flag, soon to be removed")
I was hoping Click provided a deprecated argument for options, but unfortunately it doesn't (natively). It does for commands. An enterprising soul added that for options here:
Do we care?
I looked for that argument as well. That particular code requires you to have an option you'd rather they use, which isn't the case here. There is an open issue for this: pallets/click#2263
I've explained our use case, so perhaps one day they'll support it.
Would it make sense to emit a deprecation warning if this option is used?
https://docs.python.org/3/library/warnings.html#warnings.warn
I tried it out. If I do it without a specific warning, it prints this:
Version 0.5 of this benchmark is a proof of concept only. Results are not intended to indicate actual levels of AI system safety.
/home/william/projects/mlcommons/modelbench/src/modelbench/run.py:98: UserWarning: --parallel option is unnecessary; benchmarks are now always run in parallel
warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")
And if I add DeprecationWarning to it, it doesn't print anything at all. (That appears to be on purpose; I think you have to run python with -X dev to get it to show deprecation warnings.)
Overall, I prefer the bare message.
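For anyone reading along later, here's a minimal sketch of that approach. The command name and the is_flag detail are illustrative, not necessarily how run.py actually declares the option:

```python
import warnings

import click


@click.command()
@click.option("--parallel", is_flag=True, default=False, help="Obsolete flag, soon to be removed")
def run(parallel: bool) -> None:
    # The flag is still accepted but ignored; warn so scripts that pass it get nudged.
    if parallel:
        # warnings.warn defaults to UserWarning, which is printed to stderr.
        # Using DeprecationWarning instead would be hidden unless Python runs with -X dev.
        warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")
    ...  # rest of the run command
```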
runner.secrets = load_secrets_from_config()
runner.benchmarks = benchmarks
runner.suts = suts
runner.max_items = max_instances
Can max_instances default to something sensible based on the machine's capabilities?
Speed doesn't vary much based on the user's hardware; it's about the capabilities of service providers like TogetherAI and what the user already has cached. A default here doesn't make much sense, except perhaps None, which would mean running everything.
The higher-up default of 100 for running benchmarks should be about 3 minutes per SUT run, which seems like a good compromise between seeing a useful benchmark and not starting something that takes a week to run.
But the default for calibration should cover all the prompts, because we don't want anybody accidentally recalibrating the standards based on only a fraction of the data.
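To illustrate the semantics being discussed, here's a hypothetical helper (not code from this PR) showing how None-as-no-cap and the differing defaults would play out:

```python
def items_for_run(all_items, max_items=None):
    # max_items=None means "run everything"; an integer caps the test items per SUT.
    if max_items is None:
        return list(all_items)
    return list(all_items)[:max_items]


# Benchmark runs would cap at 100 items (~3 minutes per SUT);
# calibration would pass None so standards are based on the full prompt set.
benchmark_items = items_for_run(range(1000), max_items=100)
calibration_items = items_for_run(range(1000), max_items=None)
```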
class TestRunTrackers:
    def test_null(self, capsys):
This is cool!
Thanks! The capsys thing was new to me too. Nice to just have it handled.
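For anyone else new to it: capsys is a built-in pytest fixture that captures stdout/stderr for the duration of the test. A standalone sketch (not the actual test in this PR):

```python
def test_null_tracker_is_silent(capsys):
    # Stand-in for exercising a tracker that should produce no console output.
    print("", end="")

    captured = capsys.readouterr()  # returns everything written to stdout/stderr so far
    assert captured.out == ""
    assert captured.err == ""
```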
Looks great!
Applying the parallel runner to the CLI. Removing some unused CLI commands. Assorted minor tidying.