Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add extractor name to prediction request for new workflow #186

Merged
merged 4 commits into from
Dec 12, 2024

Conversation

Tooyosi
Copy link
Contributor

@Tooyosi Tooyosi commented Nov 18, 2024

Final part of setup for new workflow usage. Pass in the workflow name for bajor predictions request

Copy link

@yuenmichelle1 yuenmichelle1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small fixes but I have a couple of questions:

  • Q: OOC, Will there ever be a case when there is more than 1 context with the same active_subject_set_id? I think looking at the query for context on #run, there is an underlying assumption that there will not be a context with the same active_subject_set_id/pool_subject_set_id. Totally fine for now since the goal for right now is to get kade and bajor flow to work for 2 workflows, but something to think about as more workflows are added to use the kade bajor flow. cc @lcjohnso for input as well

  • I would probably add a couple of more tests, especially for the long query on Context

    1. A test if there is no context coming from the query (i.e. Context.where yadda yadda is empty). Test what expected behavior. I think what ends up happening is workflow_name is nil and prediction job is submitted using cosmic_dawn
    2. A test where context is found by pool_subject_set_id
    3. A test where context is found by active_subject_set_id
      - One case where there is only 1 relevant context
      - Another case where there are 2 relevant contexts (1 context with the right active_subject_set_id and another with the right pool_subject_set_id) => it should pick the context with the right active_subject_set_id.
      ^^ These might be overkill given the goal is to get this working for 2 cases for now. I'll leave this up to your discretion @Tooyosi (And you can double check with @lcjohnso as well).

If you do decide to add these tests, you can do a check to see if the workflow_name that is being sent to bajor client is the expected workflow_name.

app/services/batch/prediction/create_job.rb Outdated Show resolved Hide resolved
app/services/batch/prediction/create_job.rb Outdated Show resolved Hide resolved
spec/services/batch/prediction/create_job_spec.rb Outdated Show resolved Hide resolved
@Tooyosi
Copy link
Contributor Author

Tooyosi commented Dec 4, 2024

Some small fixes but I have a couple of questions:

  • Q: OOC, Will there ever be a case when there is more than 1 context with the same active_subject_set_id? I think looking at the query for context on #run, there is an underlying assumption that there will not be a context with the same active_subject_set_id/pool_subject_set_id. Totally fine for now since the goal for right now is to get kade and bajor flow to work for 2 workflows, but something to think about as more workflows are added to use the kade bajor flow. cc @lcjohnso for input as well

  • I would probably add a couple of more tests, especially for the long query on Context

    1. A test if there is no context coming from the query (i.e. Context.where yadda yadda is empty). Test what expected behavior. I think what ends up happening is workflow_name is nil and prediction job is submitted using cosmic_dawn
    2. A test where context is found by pool_subject_set_id
    3. A test where context is found by active_subject_set_id
      • One case where there is only 1 relevant context
      • Another case where there are 2 relevant contexts (1 context with the right active_subject_set_id and another with the right pool_subject_set_id) => it should pick the context with the right active_subject_set_id.
        ^^ These might be overkill given the goal is to get this working for 2 cases for now. I'll leave this up to your discretion @Tooyosi (And you can double check with @lcjohnso as well).

If you do decide to add these tests, you can do a check to see if the workflow_name that is being sent to bajor client is the expected workflow_name.

@yuenmichelle1 Thanks for the comments,

  • You're correct with my assumptions, but we can discuss on the backend call aswell to get input from @lcjohnso
  • For the tests, i added newer test cases as suggested. Most of the tests in client_spec.rb validates the default usecase of cosmic dawn but added specific cases here

Copy link
Member

@lcjohnso lcjohnso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change requested (see inline comment below), but I'll go ahead and approve so merging can move forward without first input from me.

Regarding whether the pool_subject_set_id to context pairing is unique -- it should be. We'll keep this in mind, but I don't expect pool_subject_set_id to repeat, which would be equivalent to using the same training subject set for two models -- this seems unlikely. In fact, given this fact, my recommended change is to only check the pool subject set ID, as that's the ID already used by the training loop.

app/services/batch/prediction/create_job.rb Outdated Show resolved Hide resolved
@Tooyosi Tooyosi merged commit 33de8bf into main Dec 12, 2024
1 check passed
@Tooyosi Tooyosi deleted the euclid-workflow-predictions branch December 12, 2024 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants