Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemet batch aggregation on the client #318

Merged
merged 8 commits into from
Sep 20, 2024

Conversation

Tooyosi
Copy link
Contributor

@Tooyosi Tooyosi commented Aug 14, 2024

Implementation of batch aggregation on the Workflows class.

Batch Aggregation guide: https://github.com/orgs/zooniverse/projects/44/views/2?sliceBy%5BcolumnId%5D=74376894&pane=issue&itemId=70649021

You can test with Workflow(3819).run_batch_aggregation(1326469)

@yuenmichelle1
Copy link
Collaborator

Hey
@Tooyosi and @lcjohnso . I am just now looking at this. Off the bat I have a big question that maybe either of you should be able to answer.

Is there a reason why we are not having Aggregation inherit from PanoptesObject? (Knowing that Aggregations is a table and endpoint on Panoptes)

I think we can save a lot of code if we do something like

class Aggregation(PanoptesObject):
  _api_slug = 'aggregations'
  _link_slug = 'aggregations'
 _edit_attributes = (
     EDITABLE ATTRIBUTES HERE
)
end

So that instead of BatchAggregation.get_aggregations(workflow_id) we can utilize Aggregation.where(workflow_id=workflow_id) and instead of BatchAggregation.get_aggregation(id) we can utilize Aggregation.find(id) (where and find coming from PanoptesObject)
same thing for delete and create etc

The only custom aggregation method would possibly be run_aggregation.

@yuenmichelle1 yuenmichelle1 removed the request for review from zwolf September 5, 2024 14:32
Copy link
Collaborator

@yuenmichelle1 yuenmichelle1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Toyosi! I had a couple of questions on this PR that maybe you or Cliff can answer. Also don't forget to clear out any hound sniifs :)

@@ -24,6 +25,27 @@ def setUp(self):
self.addCleanup(caesar_get_patch.stop)
self.addCleanup(caesar_put_patch.stop)

batch_agg_run_aggregation_patch = patch.object(BatchAggregation, 'run_aggregation')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the batch agg patches are so far away from the actual related tests, its easy to forget about them existing.

I wonder if its worth having a separate class within the same file for these changes and rename the ones geared towards caesar to separate.

i.e.

class TestWorkflowCaesarFeatures(unittest.TestCase):
    def setUp(self):
        caesar test setup stuff 

class TestWorkflowRunAggregation(unittest.TestCase):
     def setUp(self):
        BATCH AGG SETUP STUFF
      
     def test_run_aggregation_stuff(self):

Another way to mitigate (other than splitting tests into classes ) is to patch on the test. eg

@patch('run_aggregation', return_value=EXPECTED RETURN VALUE)
def test_run_aggregation(self, mock_run_agregation):
   WHATEVER TEST

@zwolf
Copy link
Member

zwolf commented Sep 5, 2024

Is there a reason why we are not having Aggregation inherit from PanoptesObject? (Knowing that Aggregations is a table and endpoint on Panoptes)

Not really: the goal was to be clear about how Aggregations (the Panoptes resource) are intended to be used by developers. That was my first idea too, and after the conversation we just had it seems like it would be more straightforward to implement Aggregations as a model in the client that inherits from PanoptesObject. That would provide the get/post/delete methods required by the run_aggregation method and eliminate the re-implementation in batch_aggregation.py.

In a refactor, I would start by creating a new class Aggregation(PanoptesObject). That'd give you a lot of what's in batch_aggregation.py by default. This model's _edit_attributes should be empty, as the client shouldn't be doing any updating of this resource directly. run_aggregation can then be moved directly onto the Workflow model and refactored to use the Aggregation model's API methods. Probably makes sense to move the status check & link generation logic to the aggregation model also. At that point, batch_aggregation.py can be removed and the specs refactored to test the new model, workflow model specs probably won't change much.

@@ -13,3 +13,4 @@
from panoptes_client.subject_workflow_status import SubjectWorkflowStatus
from panoptes_client.caesar import Caesar
from panoptes_client.inaturalist import Inaturalist
from panoptes_client.batch_aggregation import BatchAggregation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that if we want users to only interact with aggregations via the Workflow, we probably do not need this line.

"""
This method will fetch existing aggregation status if any.
"""
return self.get_batch_aggregations()['aggregations'][0]['status']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a check if there are no aggregations. i.e.
when self.get_batch_aggregations()['aggregations'] length of resulting array is 0.

Same with get_batch_aggregation_links (method below on L578)

@yuenmichelle1
Copy link
Collaborator

yuenmichelle1 commented Sep 9, 2024

Re: zach's comment:

This model's _edit_attributes should be empty, as the client shouldn't be doing any updating of this resource directly.

I don't entirely agree with this ^, since we want to be able to create an aggregation in some cases through workflow's run_batch_aggregation.

I think what we could do is...

  1. Have a simple Aggregation class that can look like the following in a aggregation.py file (You can call it BatchAggregation if you want. I think my preference is to match the panoptes table name.)
from panoptes_client.panoptes import PanoptesObject

class Aggregation(PanoptesObject):
    _api_slug = 'aggregations'
    _link_slug = 'aggregations'
    _edit_attributes = (
        {
            'links': (
                'workflow',
                'user',
            )
        },
    )
  ## NOTE: that the ,s (commas) are important on the edit attributes links ^ 

Have this class be only imported in workflow.py but do not make it avail through __init__.py, which will cover the fact that we do not want to have a user edit an aggregation "directly".

  1. In workflow.py have a private method called _create_aggregation that probably can look like the following:
def _create_agg(self, user_id):
        new_aggregation = Aggregation()
        new_aggregation.links.workflow = self.id
        new_aggregation.links.user = user_id
        new_aggregation.save()
        return new_aggregation
  1. workflow.py's get_batch_aggregations can look as simple as:
    return Aggregation.where(workflow_id=self.id)
    ^ .where for a PanoptesObject returns a ResultPaginator instance so you'll want to access the first Aggregation that comes from this query through:
    .next() and check for results length via .object_count

  2. Then in workflow.py's run_batch_aggregation can look like:

 def run_batch_aggregation(self, user=None, delete_if_exists=False):
       TYPE CHECK FOR USER /SETTING _user_id GOES HERE
        try:
            workflow_aggs = self.get_batch_aggregations()
            if len(workflow_aggs.object_count) > 0:
                agg_id = workflow_aggs.next().id
                current_aggregation = Aggregation.find(agg_id)
                if delete_if_exists:
                    current_aggregation.delete()
                    return self._create_agg(_user_id)
                else:
                    return current_aggregation
            else:
                return self._create_agg(_user_id)
        except PanoptesAPIException as e:
            raise e

I haven't fully tested this out, so I think that would work? And I might have some typos here and there cuz I'm just free coding 😆 but I think the code sniffs a bit better this way.

@Tooyosi Tooyosi requested a review from zwolf September 10, 2024 06:36
@Tooyosi
Copy link
Contributor Author

Tooyosi commented Sep 10, 2024

Re: zach's comment:

This model's _edit_attributes should be empty, as the client shouldn't be doing any updating of this resource directly.

I don't entirely agree with this ^, since we want to be able to create an aggregation in some cases through workflow's run_batch_aggregation.

I think what we could do is...

  1. Have a simple Aggregation class that can look like the following in a aggregation.py file (You can call it BatchAggregation if you want. I think my preference is to match the panoptes table name.)
from panoptes_client.panoptes import PanoptesObject

class Aggregation(PanoptesObject):
    _api_slug = 'aggregations'
    _link_slug = 'aggregations'
    _edit_attributes = (
        {
            'links': (
                'workflow',
                'user',
            )
        },
    )
  ## NOTE: that the ,s (commas) are important on the edit attributes links ^ 

Have this class be only imported in workflow.py but do not make it avail through __init__.py, which will cover the fact that we do not want to have a user edit an aggregation "directly".

  1. In workflow.py have a private method called _create_aggregation that probably can look like the following:
def _create_agg(self, user_id):
        new_aggregation = Aggregation()
        new_aggregation.links.workflow = self.id
        new_aggregation.links.user = user_id
        new_aggregation.save()
        return new_aggregation
  1. workflow.py's get_batch_aggregations can look as simple as:
    return Aggregation.where(workflow_id=self.id)
    ^ .where for a PanoptesObject returns a ResultPaginator instance so you'll want to access the first Aggregation that comes from this query through:
    .next() and check for results length via .object_count
  2. Then in workflow.py's run_batch_aggregation can look like:
 def run_batch_aggregation(self, user=None, delete_if_exists=False):
       TYPE CHECK FOR USER /SETTING _user_id GOES HERE
        try:
            workflow_aggs = self.get_batch_aggregations()
            if len(workflow_aggs.object_count) > 0:
                agg_id = workflow_aggs.next().id
                current_aggregation = Aggregation.find(agg_id)
                if delete_if_exists:
                    current_aggregation.delete()
                    return self._create_agg(_user_id)
                else:
                    return current_aggregation
            else:
                return self._create_agg(_user_id)
        except PanoptesAPIException as e:
            raise e

I haven't fully tested this out, so I think that would work? And I might have some typos here and there cuz I'm just free coding 😆 but I think the code sniffs a bit better this way.

Thanks for this @yuenmichelle1, makes alot of sense to rely on the existing methods on the PanoptesObject. I have reimplemented to reflect this change so you can re-review. cc @zwolf @lcjohnso

Copy link
Collaborator

@yuenmichelle1 yuenmichelle1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Toyosi!
I had a couple small things but overall looks pretty good.


from panoptes_client.user import User
from panoptes_client.aggregation import Aggregation
import six
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: looking into six package, it's a python 2/3 compatibility library. Do we need this?

def _get_agg_property(self, param):
try:
aggs = self.get_batch_aggregations()
next = six.next(aggs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly stylistic, but I'm not the biggest fan of declaring a variable next, mainly because i worry it rewrites Python's built-in next method. We could rename this variable to agg or something.

Though, I think we could get away with
return getattr(next(aggs), param, None) or something like that

@Tooyosi Tooyosi merged commit 1962b46 into master Sep 20, 2024
4 checks passed
@Tooyosi Tooyosi deleted the batch-agg-client-implimentation branch September 20, 2024 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants