add /verifyruns cog #36
base: main
We could save some time by having a single command verify all of Modal, NVIDIA, and AMD one after the other and report how many of those three succeeded. Also, as far as the UX goes, why not have verify actually trigger the bot three times instead of having a human manually verify in a thread?
This cog provides functionality to verify that either a GitHub Actions or
Modal run completed successfully by checking for specific message patterns
in a Discord thread. It supports verification of two types of runs:
1. GitHub Actions runs - Identified by "GitHub Action triggered!" message
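For illustration, the pattern check described in this docstring could look roughly like the sketch below. It works on plain message strings rather than Discord objects, and the names `REQUIRED_PATTERNS` and `thread_has_all_patterns` are hypothetical, not taken from the PR. Only the "GitHub Action triggered!" string comes from the docstring; the real cog may require more patterns.

```python
import re

# Hypothetical pattern list. Only this one string is quoted in the docstring;
# the actual cog may check for additional happy-path messages.
REQUIRED_PATTERNS = [re.compile(r"GitHub Action triggered!")]


def thread_has_all_patterns(messages, patterns=REQUIRED_PATTERNS):
    """Return True if every pattern matches at least one message in the thread."""
    return all(
        any(pattern.search(message) for message in messages)
        for pattern in patterns
    )
```

With this shape, a thread that never received the expected messages (for example, because the run returned early) fails verification, since at least one pattern has no match.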
So this seems to test a trigger, but we've had issues where this wouldn't have helped, for example when we had timeout issues with both the NVIDIA and AMD runners.
OK, thanks, Mark. I switched this PR to draft for now because I'm in the middle of redoing this.
Latest changes launch 3 run jobs. Screenshot below.
I find the bot's messages like "Created thread GitHub Job (AMD) - 2024-11-29 1… for your GitHub job" a little distracting, given that the threads also show up as separate lines in the UI. Should we remove those "created thread" messages?
@msaroufim, I don't know if this addresses your comment from above ("we've had issues where this wouldn't have helped"). If a GitHub workflow times out, then I think the GitHub cog will just return, and the thread won't have the happy path messages in it, so verification will fail. So I think it addresses your comment, but I'm not fully sure.
Yeah should remove those in another PR
Yeah, I recently implemented timeouts, so they should solve that specific problem.
Mark, do you think this needs a final message at the end that says "✅ All tests passed"?
Yeah that sounds nice
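As a rough sketch of the summary step discussed in this thread (three runs, then one final message), something like the following could work. The function name `summarize_runs`, the `results` dictionary shape, and the wording of the failure message are assumptions for illustration. Only the "✅ All tests passed" text comes from the conversation.

```python
def summarize_runs(results):
    """Build a final status line from a mapping of runner name -> success flag.

    `results` is a hypothetical structure, e.g. {"Modal": True, "AMD": False}.
    """
    passed = [name for name, ok in results.items() if ok]
    failed = [name for name, ok in results.items() if not ok]
    if not failed:
        return "✅ All tests passed"
    # The failure wording is an assumption, not taken from the PR.
    return f"❌ {len(passed)}/{len(results)} runs passed; failed: {', '.join(failed)}"
```

Reporting the count and the failing runner names in one line would also cover the earlier request to know how many of the three runs succeeded.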
Description
Adds a /verifyrun command that does the same work as the smoke test but is easier to use.
Checklist
Before submitting this PR, ensure the following steps have been completed:
/run
.You may need to exercise some judgement about the script and GPU type.
your
/run
message ("Cluster Bot started a thread: ..."):...
) to the cluster bot's message.README.md.