Implement batch aggregation on the client #318

Tooyosi · 2024-08-14T09:31:32Z

Implementation of batch aggregation on the Workflows class.

Batch Aggregation guide: https://github.com/orgs/zooniverse/projects/44/views/2?sliceBy%5BcolumnId%5D=74376894&pane=issue&itemId=70649021

You can test with Workflow(3819).run_batch_aggregation(1326469)

yuenmichelle1 · 2024-09-05T14:32:49Z

Hey @Tooyosi and @lcjohnso. I am just now looking at this. Off the bat I have a big question that either of you should be able to answer.

Is there a reason why we are not having Aggregation inherit from PanoptesObject? (Knowing that Aggregations is a table and endpoint on Panoptes)

I think we can save a lot of code if we do something like

class Aggregation(PanoptesObject):
    _api_slug = 'aggregations'
    _link_slug = 'aggregations'
    _edit_attributes = (
        # EDITABLE ATTRIBUTES HERE
    )

That way, instead of BatchAggregation.get_aggregations(workflow_id) we can use Aggregation.where(workflow_id=workflow_id), and instead of BatchAggregation.get_aggregation(id) we can use Aggregation.find(id) (where and find both come from PanoptesObject).
The same goes for delete, create, etc.

The only custom aggregation method would possibly be run_aggregation.
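To illustrate why inheriting saves code, here is a minimal sketch. It does not use the real PanoptesObject: `FakePanoptesObject` and its in-memory `_store` are hypothetical stand-ins, and only the names `where`, `find`, `_api_slug` and `_link_slug` come from the discussion. In this sketch `where` returns a plain list for simplicity.

```python
class FakePanoptesObject:
    """Hypothetical stand-in for PanoptesObject (no HTTP calls)."""
    _api_slug = None
    _store = {}  # assumed in-memory replacement for the Panoptes API

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    @classmethod
    def where(cls, **filters):
        # Return every stored row whose fields match all the filters.
        rows = cls._store.get(cls._api_slug, [])
        return [cls(**row) for row in rows
                if all(row.get(k) == v for k, v in filters.items())]

    @classmethod
    def find(cls, _id):
        # Look up a single row by id, failing loudly if it is missing.
        matches = cls.where(id=_id)
        if not matches:
            raise LookupError("no %s with id %r" % (cls._api_slug, _id))
        return matches[0]


class Aggregation(FakePanoptesObject):
    # The subclass only declares its slugs; where/find are inherited.
    _api_slug = 'aggregations'
    _link_slug = 'aggregations'


FakePanoptesObject._store['aggregations'] = [
    {'id': 1, 'workflow_id': 3819, 'status': 'completed'},
    {'id': 2, 'workflow_id': 42, 'status': 'pending'},
]

for_workflow = Aggregation.where(workflow_id=3819)
single = Aggregation.find(2)
```

The point is only that the subclass body stays tiny because the lookup methods live on the base class.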

yuenmichelle1

Hey Toyosi! I had a couple of questions on this PR that maybe you or Cliff can answer. Also don't forget to clear out any hound sniffs :)

yuenmichelle1 · 2024-09-05T14:48:19Z

panoptes_client/tests/test_workflow.py

@@ -24,6 +25,27 @@ def setUp(self):
        self.addCleanup(caesar_get_patch.stop)
        self.addCleanup(caesar_put_patch.stop)

+        batch_agg_run_aggregation_patch = patch.object(BatchAggregation, 'run_aggregation')


I feel like the batch agg patches are so far away from the actual related tests that it's easy to forget they exist.

I wonder if it's worth having a separate class within the same file for these changes, and moving the Caesar-specific tests into their own class as well.

i.e.

class TestWorkflowCaesarFeatures(unittest.TestCase):
    def setUp(self):
        # caesar test setup stuff

class TestWorkflowRunAggregation(unittest.TestCase):
    def setUp(self):
        # BATCH AGG SETUP STUFF

    def test_run_aggregation_stuff(self):
        ...

Another way to mitigate this (other than splitting the tests into classes) is to patch on the test itself, e.g.

@patch.object(BatchAggregation, 'run_aggregation', return_value=EXPECTED_RETURN_VALUE)
def test_run_aggregation(self, mock_run_aggregation):
    # WHATEVER TEST
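A runnable sketch of the patch-on-the-test idea, using only the standard library. The `BatchAggregation` class below is a hypothetical stand-in so the example is self-contained, and the return value is made up for illustration.

```python
import unittest
from unittest.mock import patch


class BatchAggregation:
    """Hypothetical stand-in for the class under test."""

    def run_aggregation(self, workflow_id, user_id):
        raise RuntimeError("would call the real API")


class TestWorkflowRunAggregation(unittest.TestCase):
    # The patch sits directly on the test that relies on it,
    # so there is no distant setUp code to keep track of.
    @patch.object(BatchAggregation, 'run_aggregation',
                  return_value={'status': 'created'})
    def test_run_aggregation(self, mock_run_aggregation):
        result = BatchAggregation().run_aggregation(3819, 1326469)
        self.assertEqual(result, {'status': 'created'})
        mock_run_aggregation.assert_called_once_with(3819, 1326469)


# Run the single test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    TestWorkflowRunAggregation)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The patch is active only for the duration of the decorated test, which keeps the mock and the assertion that depends on it side by side.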

zwolf · 2024-09-05T17:16:32Z

Is there a reason why we are not having Aggregation inherit from PanoptesObject? (Knowing that Aggregations is a table and endpoint on Panoptes)

Not really: the goal was to be clear about how Aggregations (the Panoptes resource) are intended to be used by developers. That was my first idea too, and after the conversation we just had it seems like it would be more straightforward to implement Aggregations as a model in the client that inherits from PanoptesObject. That would provide the get/post/delete methods required by the run_aggregation method and eliminate the re-implementation in batch_aggregation.py.

In a refactor, I would start by creating a new class Aggregation(PanoptesObject). That'd give you a lot of what's in batch_aggregation.py by default. This model's _edit_attributes should be empty, as the client shouldn't be doing any updating of this resource directly. run_aggregation can then be moved directly onto the Workflow model and refactored to use the Aggregation model's API methods. It probably makes sense to move the status check and link generation logic to the Aggregation model as well. At that point, batch_aggregation.py can be removed and the specs refactored to test the new model; the workflow model specs probably won't change much.

yuenmichelle1 · 2024-09-05T18:54:34Z

panoptes_client/__init__.py

@@ -13,3 +13,4 @@
 from panoptes_client.subject_workflow_status import SubjectWorkflowStatus
 from panoptes_client.caesar import Caesar
 from panoptes_client.inaturalist import Inaturalist
+from panoptes_client.batch_aggregation import BatchAggregation


Noting that if we want users to only interact with aggregations via the Workflow, we probably do not need this line.

yuenmichelle1 · 2024-09-06T01:15:21Z

panoptes_client/workflow.py

+        """
+        This method will fetch existing aggregation status if any.
+        """
+        return self.get_batch_aggregations()['aggregations'][0]['status']


I think there should be a check for the case where there are no aggregations, i.e. when self.get_batch_aggregations()['aggregations'] is an empty list; indexing [0] on it would raise an IndexError.

The same applies to get_batch_aggregation_links (the method below, on L578).
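One possible shape for that check, as a minimal sketch. The helper name `first_aggregation_status` is hypothetical, and it assumes the response is a dict with an 'aggregations' list, as the diff above suggests.

```python
def first_aggregation_status(response):
    """Return the first aggregation's status, or None when there are none."""
    # Treat a missing or empty 'aggregations' key the same way.
    aggregations = response.get('aggregations') or []
    if not aggregations:
        return None
    return aggregations[0].get('status')
```

Returning None is just one choice here; raising a descriptive exception would work equally well if callers should not ignore the empty case.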

yuenmichelle1 · 2024-09-09T00:40:33Z

Re: zach's comment:

This model's _edit_attributes should be empty, as the client shouldn't be doing any updating of this resource directly.

I don't entirely agree with this ^, since we want to be able to create an aggregation in some cases through workflow's run_batch_aggregation.

I think what we could do is...

Have a simple Aggregation class that can look like the following in an aggregation.py file (you can call it BatchAggregation if you want; my preference is to match the Panoptes table name).

from panoptes_client.panoptes import PanoptesObject

class Aggregation(PanoptesObject):
    _api_slug = 'aggregations'
    _link_slug = 'aggregations'
    _edit_attributes = (
        {
            'links': (
                'workflow',
                'user',
            )
        },
    )
    # NOTE: the trailing commas in the _edit_attributes links tuple above are important

Have this class be imported only in workflow.py, and do not make it available through __init__.py. That covers the fact that we do not want a user to edit an aggregation "directly".

In workflow.py, have a private method called _create_agg that could look like the following:

def _create_agg(self, user_id):
    new_aggregation = Aggregation()
    new_aggregation.links.workflow = self.id
    new_aggregation.links.user = user_id
    new_aggregation.save()
    return new_aggregation

workflow.py's get_batch_aggregations can be as simple as:
return Aggregation.where(workflow_id=self.id)
.where on a PanoptesObject returns a ResultPaginator instance, so you'll want to access the first Aggregation from this query via .next() and check the number of results via .object_count.
Then workflow.py's run_batch_aggregation can look like:

def run_batch_aggregation(self, user=None, delete_if_exists=False):
    # TYPE CHECK FOR USER / SETTING _user_id GOES HERE
    try:
        workflow_aggs = self.get_batch_aggregations()
        # object_count is a count, not a sequence, so len() is not needed
        if workflow_aggs.object_count > 0:
            agg_id = workflow_aggs.next().id
            current_aggregation = Aggregation.find(agg_id)
            if delete_if_exists:
                current_aggregation.delete()
                return self._create_agg(_user_id)
            else:
                return current_aggregation
        else:
            return self._create_agg(_user_id)
    except PanoptesAPIException as e:
        raise e

I haven't fully tested this out, so I think that would work? And I might have some typos here and there cuz I'm just free coding 😆 but I think the code sniffs a bit better this way.
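The control flow proposed above (reuse the existing aggregation, or delete and recreate it when asked) can be exercised without the API. This is only a sketch: `FakeAggregationStore` and the standalone `run_batch_aggregation` function are hypothetical in-memory stand-ins, not the client's actual classes.

```python
class FakeAggregationStore:
    """Hypothetical in-memory stand-in for the aggregations endpoint."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    def where(self, workflow_id):
        return [a for a in self.rows.values()
                if a['workflow_id'] == workflow_id]

    def create(self, workflow_id, user_id):
        agg = {'id': self.next_id, 'workflow_id': workflow_id,
               'user_id': user_id}
        self.rows[agg['id']] = agg
        self.next_id += 1
        return agg

    def delete(self, agg_id):
        del self.rows[agg_id]


def run_batch_aggregation(store, workflow_id, user_id,
                          delete_if_exists=False):
    existing = store.where(workflow_id)
    # Reuse the existing aggregation unless the caller asked to replace it.
    if existing and not delete_if_exists:
        return existing[0]
    if existing:
        store.delete(existing[0]['id'])
    return store.create(workflow_id, user_id)


store = FakeAggregationStore()
first = run_batch_aggregation(store, 3819, 1326469)
again = run_batch_aggregation(store, 3819, 1326469)
replaced = run_batch_aggregation(store, 3819, 1326469,
                                 delete_if_exists=True)
```

Calling it twice returns the same record, while `delete_if_exists=True` leaves exactly one, newly created, record for the workflow.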

Tooyosi · 2024-09-10T06:40:10Z

Thanks for this @yuenmichelle1, it makes a lot of sense to rely on the existing methods on PanoptesObject. I have reimplemented it to reflect this change, so you can re-review. cc @zwolf @lcjohnso

yuenmichelle1

Hey Toyosi!
I had a couple small things but overall looks pretty good.

yuenmichelle1 · 2024-09-18T16:38:32Z

panoptes_client/workflow.py

-
+from panoptes_client.user import User
+from panoptes_client.aggregation import Aggregation
+import six


Quick question: looking into the six package, it's a Python 2/3 compatibility library. Do we need this?

yuenmichelle1 · 2024-09-18T22:30:28Z

panoptes_client/workflow.py

+    def _get_agg_property(self, param):
+        try:
+            aggs = self.get_batch_aggregations()
+            next = six.next(aggs)


Possibly stylistic, but I'm not the biggest fan of declaring a variable named next, mainly because I worry it shadows Python's built-in next function. We could rename this variable to agg or something.

Though, I think we could get away with
return getattr(next(aggs), param, None) or something like that
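A small sketch of that one-liner, with one adjustment worth noting: `next(aggs)` raises StopIteration on an empty iterator, so this version passes a default to the built-in `next`. The function name `get_agg_property` and the use of SimpleNamespace objects as fake aggregations are assumptions made for the example.

```python
from types import SimpleNamespace


def get_agg_property(aggs, param):
    # next(iterator, None) returns None instead of raising StopIteration
    # when there are no aggregations.
    first = next(iter(aggs), None)
    # getattr with a default also covers a missing attribute (or first=None).
    return getattr(first, param, None)


status = get_agg_property([SimpleNamespace(status='completed')], 'status')
empty = get_agg_property(iter([]), 'status')
```

This avoids both the local variable named next and the dependency on six.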

implemet batch aggregation calls on workflow. With tests

eae281b

Tooyosi requested review from zwolf, lcjohnso and yuenmichelle1 August 14, 2024 09:31

Tooyosi added 2 commits August 14, 2024 10:45

hound fixes

5a90961

fix failing test

5f596f0

yuenmichelle1 removed the request for review from zwolf September 5, 2024 14:32

yuenmichelle1 requested changes Sep 5, 2024

yuenmichelle1 reviewed Sep 5, 2024

yuenmichelle1 reviewed Sep 6, 2024

refactor logic to use PanoptesObject and add tests

51ac389

Tooyosi requested a review from yuenmichelle1 September 10, 2024 05:31

Tooyosi added 2 commits September 10, 2024 07:32

clean hound sniffs

46b396f

remove set_attr usage

c795df1

Tooyosi requested a review from zwolf September 10, 2024 06:36

remove unanted mock result

731f4aa

yuenmichelle1 reviewed Sep 18, 2024

remove declared next variable

cd657af

Tooyosi requested a review from yuenmichelle1 September 19, 2024 16:01

yuenmichelle1 approved these changes Sep 19, 2024

Tooyosi merged commit 1962b46 into master Sep 20, 2024
4 checks passed

Tooyosi deleted the batch-agg-client-implimentation branch September 20, 2024 10:03
