add extractor name to prediction request for new workflow #186

Tooyosi · 2024-11-18T22:16:01Z

Final part of the setup for the new workflow: pass the workflow name in the bajor prediction request.
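Below is a minimal sketch of passing a workflow name through to the prediction request. It assumes a client that builds a JSON body for the bajor prediction endpoint. The method name `prediction_request_body`, the payload keys, and the `cosmic_dawn` fallback constant are hypothetical illustrations, not the client code in this PR or bajor's documented API.

```ruby
require 'json'

# Hypothetical default: the review discussion expects a nil workflow name
# to end up on the cosmic_dawn workflow.
DEFAULT_WORKFLOW_NAME = 'cosmic_dawn'

# Builds a hypothetical prediction request body. The key names are
# illustrative assumptions, not bajor's actual request schema.
def prediction_request_body(manifest_url, workflow_name = nil)
  {
    manifest_url: manifest_url,
    workflow_name: workflow_name || DEFAULT_WORKFLOW_NAME
  }.to_json
end

puts prediction_request_body('https://example.org/manifest.csv', 'euclid')
puts prediction_request_body('https://example.org/manifest.csv')
```

Keeping the fallback in one place means callers that never pass a workflow name keep the existing cosmic_dawn behavior.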

yuenmichelle1

Some small fixes but I have a couple of questions:

Q: OOC, will there ever be a case where more than one context has the same active_subject_set_id? Looking at the Context query in #run, there seems to be an underlying assumption that no two contexts share an active_subject_set_id/pool_subject_set_id. That's totally fine for now, since the immediate goal is to get the kade and bajor flow working for 2 workflows, but it's something to think about as more workflows are added to the kade bajor flow. cc @lcjohnso for input as well
I would probably add a couple more tests, especially for the long query on Context:
1. A test where the query returns no context (i.e. the Context.where(...) result is empty), checking the expected behavior. I think what ends up happening is that workflow_name is nil and the prediction job is submitted using cosmic_dawn.
2. A test where context is found by pool_subject_set_id
3. A test where context is found by active_subject_set_id
  - One case where there is only 1 relevant context
  - Another case where there are 2 relevant contexts (1 context with the right active_subject_set_id and another with the right pool_subject_set_id) => it should pick the context with the right active_subject_set_id.
  ^^ These might be overkill given the goal is to get this working for 2 cases for now. I'll leave this up to your discretion @Tooyosi (And you can double check with @lcjohnso as well).

If you do decide to add these tests, you can check whether the workflow_name being sent to the bajor client is the expected one.
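The lookup rules behind these test cases can be sketched in plain Ruby, with an in-memory array standing in for the `Context.where(...)` query. The `Context` struct and the `find_context` helper are hypothetical stand-ins, and the sketch assumes the precedence proposed in this review (an active_subject_set_id match wins over a pool_subject_set_id match). It illustrates the proposed test matrix, not the code in `create_job.rb`.

```ruby
# Hypothetical stand-in for the Context model: only the fields the
# lookup needs are included.
Context = Struct.new(:workflow_name, :active_subject_set_id, :pool_subject_set_id, keyword_init: true)

# Returns the context for subject_set_id, preferring an
# active_subject_set_id match over a pool_subject_set_id match.
# Returns nil when neither matches.
def find_context(contexts, subject_set_id)
  contexts.find { |c| c.active_subject_set_id == subject_set_id } ||
    contexts.find { |c| c.pool_subject_set_id == subject_set_id }
end

# Sample data: workflow_b's active set (42) is also workflow_a's pool set.
contexts = [
  Context.new(workflow_name: 'workflow_a', active_subject_set_id: 1, pool_subject_set_id: 42),
  Context.new(workflow_name: 'workflow_b', active_subject_set_id: 42, pool_subject_set_id: 2)
]

p find_context(contexts, 99)&.workflow_name # case 1, no match: nil
p find_context(contexts, 2).workflow_name   # case 2, pool match: "workflow_b"
p find_context(contexts, 1).workflow_name   # case 3, single active match: "workflow_a"
p find_context(contexts, 42).workflow_name  # case 3, both match types: "workflow_b"
```

In a real spec, the nil case is where asserting the workflow_name handed to the bajor client would show whether the cosmic_dawn default actually kicks in.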

app/services/batch/prediction/create_job.rb

spec/services/batch/prediction/create_job_spec.rb

Tooyosi · 2024-12-04T12:11:07Z

@yuenmichelle1 Thanks for the comments.

You're correct about my assumptions, but we can also discuss this on the backend call to get input from @lcjohnso.
For the tests, I added new test cases as suggested. Most of the tests in client_spec.rb validate the default cosmic dawn use case, so I added the workflow-specific cases here.

lcjohnso

One change requested (see the inline comment below), but I'll go ahead and approve so merging can move forward without waiting on further input from me.

Regarding whether the pool_subject_set_id-to-context pairing is unique: it should be. We'll keep this in mind, but I don't expect pool_subject_set_id to repeat, since that would mean using the same training subject set for two models, which seems unlikely. Given that, my recommended change is to check only the pool subject set ID, as that's the ID the training loop already uses.
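A sketch of the recommended pool-only lookup follows, again using an in-memory array instead of ActiveRecord. The `Context` struct, the `find_context_by_pool` helper, and the sample IDs are hypothetical. The duplicate check is an illustrative guard that makes the "pool_subject_set_id should not repeat" assumption explicit; it is not something this PR is known to do.

```ruby
# Hypothetical stand-in for the Context model.
Context = Struct.new(:workflow_name, :pool_subject_set_id, keyword_init: true)

# Looks up the context by pool_subject_set_id only. Raises if more than
# one context shares the pool, surfacing a broken uniqueness assumption
# instead of silently picking one.
def find_context_by_pool(contexts, pool_subject_set_id)
  matches = contexts.select { |c| c.pool_subject_set_id == pool_subject_set_id }
  raise ArgumentError, "multiple contexts share pool #{pool_subject_set_id}" if matches.size > 1

  matches.first
end

# Sample data with made-up pool IDs.
contexts = [
  Context.new(workflow_name: 'cosmic_dawn', pool_subject_set_id: 10),
  Context.new(workflow_name: 'euclid', pool_subject_set_id: 20)
]

p find_context_by_pool(contexts, 20).workflow_name # "euclid"
p find_context_by_pool(contexts, 30)                # nil (no context for this pool)
```

Failing loudly on duplicates is one option; logging and falling back to a default would be another, depending on how a misconfigured context should be handled.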

app/services/batch/prediction/create_job.rb

add extractor name to prediction request for new workflow (25fe100)

Tooyosi requested review from lcjohnso and yuenmichelle1 November 18, 2024 22:16

fix test (764236d)

yuenmichelle1 reviewed Nov 26, 2024

app/services/batch/prediction/create_job.rb (outdated)

app/services/batch/prediction/create_job.rb (outdated)

spec/services/batch/prediction/create_job_spec.rb (outdated)

add new test cases (74fd273)

lcjohnso approved these changes Dec 11, 2024

app/services/batch/prediction/create_job.rb (outdated)

rely on only pool_subject_set_id for fetching prediction context (da3dd75)

Tooyosi merged commit 33de8bf into main Dec 12, 2024
1 check passed

Tooyosi deleted the euclid-workflow-predictions branch December 12, 2024 13:54
