
Social schema #310

Closed · jkbhagatio wants to merge 16 commits from the social_schema branch

Conversation

jkbhagatio (Member, Author):

No description provided.

@@ -5,7 +5,7 @@
 import pandas as pd

 from aeon.io import api as io_api
-from aeon.schema import dataset as aeon_schema
+from aeon.io import schemas as aeon_schema
@glopesdev (Contributor) commented on Feb 12, 2024:

I don't think we should lump together experiment-specific schemas with the much more general io module.

Currently somebody can use io as a PyPI package for their own experiments without having to know anything at all about the details of the foraging experiments, which I think is something we should keep. We can discuss whether aeon.schema is the best name for the module, but I do feel these should live in a separate module.

@@ -115,7 +115,8 @@ def load(root, reader, start=None, end=None, time=None, tolerance=None, epoch=None
         # to fill missing values
         previous = reader.read(files[i - 1])
         data = pd.concat([previous, frame])
-        data = data.reindex(values, method="pad", tolerance=tolerance)
+        data = data.reindex(values, tolerance=tolerance)
+        data.dropna(inplace=True)
@glopesdev (Contributor) commented on Feb 12, 2024:

Why are we dropping NaN values from the output data? This feels dangerous, but regardless of the reason it seems out of scope for this PR, which is about refactoring schemas. If we want to discuss this, we should do so in a separate PR.
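For concreteness, a minimal standalone pandas sketch (not aeon code; timestamps and values are hypothetical) of the behavioural difference under discussion: padding keeps every requested timestamp, while a plain reindex followed by dropna silently discards timestamps with no exact match.

    import pandas as pd

    frame = pd.Series(
        [1.0, 2.0],
        index=pd.to_datetime(["2024-02-12 00:00:00", "2024-02-12 00:00:02"]),
    )
    requested = pd.to_datetime(
        ["2024-02-12 00:00:00", "2024-02-12 00:00:01", "2024-02-12 00:00:02"]
    )

    # Old behaviour: forward-fill within tolerance, so all three rows survive.
    padded = frame.reindex(requested, method="pad", tolerance=pd.Timedelta("1s"))

    # New behaviour: only exact index matches survive; 00:00:01 becomes NaN
    # and is then silently dropped. (pandas only accepts `tolerance` together
    # with `method`, so it is omitted in this sketch.)
    exact = frame.reindex(requested).dropna()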

-def compositeStream(pattern, *args):
-    """Merges multiple data streams into a single composite stream."""
-    composite = {}
+def register(pattern, *args):
@glopesdev (Contributor) commented on Feb 12, 2024:

I don't think register is a good term to use here. In computer science, this term refers either to a CPU register or to something like the Windows registry, which is more like a global dictionary of settings.

If the issue is the term composite, then my proposed alternative would be to simply use the plural streams, since conceptually a Device is simply meant to contain a collection of data streams.

I think we discussed this in a previous DA meeting, but there is some confusion here about the intended target audience. At a basic data analysis level, I think the goal is simply to provide access to the collection of streams in a device. That collection is a dictionary simply because there is a unique key associated with each stream, but what we have at its heart is still simply a collection of streams.

I do agree with you that the "binder function" (provisional name) has a different role from a Reader, and it likely merits having its own separate name when we are explaining how the API works.

However, I was realizing just now that we don't actually seem to use the word "stream" for anything else in the API, so it still feels to me like a strong contender to capture this concept.
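For readers following the thread, a hedged sketch of the pattern being discussed, assuming the shapes visible in this PR's diffs (the Device body below is a paraphrase for illustration, not the actual implementation; the import alias mirrors the one used elsewhere in the diff):

    from aeon.io import reader as _reader

    # A "binder function": takes a file-name pattern and returns a dict of
    # Readers keyed by stream name (shape copied from subject_state below).
    def subject_state(pattern):
        return {"SubjectState": _reader.Subject(f"{pattern}_SubjectState_*")}

    # A Device is then, at heart, a named collection of such streams.
    class Device:
        def __init__(self, name, *args):
            self.name = name
            self.streams = {}
            for binder_fn in args:  # each arg maps a pattern to a dict of Readers
                self.streams.update(binder_fn(name))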

"""Merges multiple data streams into a single composite stream."""
composite = {}
def register(pattern, *args):
"""Merges multiple Readers into a single registry."""
Contributor commented:

We are not merging Readers but actually binder_fn objects, at least given the code below.



 class Device:
-    """Groups multiple data streams into a logical device.
+    """Groups multiple Readers into a logical device.
Contributor commented:

Same as above, we are not merging Readers but binder_fn objects.

Comment on lines +21 to +22:

+    If a device contains a single stream reader with the same pattern as the device `name`, it will be
+    considered a singleton, and the stream reader will be paired directly with the device without nesting.
Contributor commented:

What exactly changed here? If we are reviewing core terminology, we should make sure the documentation stays consistent.

Contributor commented:

Looks like auto-formatting.
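To make the quoted rule concrete, a hypothetical illustration (device and stream names invented; the dictionary shapes are inferred from the quoted docstring only, with strings standing in for Reader objects):

    # Two streams under a device named "Patch1" stay nested one level down:
    nested = {"Patch1": {"DepletionState": "<Reader>", "Encoder": "<Reader>"}}

    # A device with a single stream whose pattern equals the device name
    # ("Nest") is a singleton: the reader is paired directly, no nesting.
    singleton = {"Nest": "<Reader>"}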


     Attributes:
         name (str): Name of the device.
-        args (Any): Data streams collected from the device.
+        args (any): A binder function or class that returns a dictionary of Readers.
@glopesdev (Contributor) commented on Feb 12, 2024:

Any here meant the Any type, so it should either be left with its original casing or replaced with something else from the typing module.

A possible alternative to "binder function" could be "stream selector" or "stream accessor", since basically the only functionality this function adds on top of the reader is the pattern used to search for and find stream data files.

Another option is simply to call this concept a "stream", since we don't use that name anywhere else in the API. I think this is actually closer to what I originally had in mind when I defined these "binder functions":

  • a Reader is a reusable module for reading specific data file formats
  • a stream is a combination of "data" + "reader", i.e. where the data is (the "pattern") and how to read it (the Reader)

We can discuss this further in a future meeting.
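A minimal sketch of that distinction (names are illustrative, not the actual API):

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class Stream:
        """Where the data is (pattern) plus how to read it (Reader)."""
        pattern: str  # glob used to locate the stream's data files
        reader: Any   # a reusable Reader for the underlying file format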

@@ -212,7 +212,7 @@ def read(self, file):
         specified unique identifier.
         """
         data = super().read(file)
-        data = data[data.event & self.value > 0]
+        data = data[(data.event & self.value) == self.value]
Contributor commented:

I don't remember anymore what this change was about, but I feel we should leave any functional changes outside of this PR, which will already be confusing enough with all the major breaking refactoring and reorganisation.
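For reference, a standalone sketch (plain pandas, hypothetical values) of what the change does. In Python, & binds tighter than comparison operators, so the old expression selected rows where any bit of value was set, while the new one requires all bits of value to be set:

    import pandas as pd

    event = pd.Series([0b01, 0b10, 0b11])
    value = 0b11  # a multi-bit identifier

    any_bit = event[(event & value) > 0]        # old: matches 0b01, 0b10, 0b11
    all_bits = event[(event & value) == value]  # new: matches only 0b11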

@@ -37,7 +37,7 @@ def subject_state(pattern):
     return {"SubjectState": _reader.Subject(f"{pattern}_SubjectState_*")}


-def messageLog(pattern):
+def message_log(pattern):
Contributor commented:

Can we focus this PR only on the major naming changes and terminology? I know we disagree on best practices for casing in Python, but if we could leave this out of this PR so we can focus on the core discussions on naming, that would help a lot.

@@ -32,7 +32,7 @@ def read(
 ) -> pd.DataFrame:
     """Reads data from the Harp-binarized tracking file."""
     # Get config file from `file`, then bodyparts from config file.
-    model_dir = Path(file.stem.replace("_", "/")).parent
+    model_dir = Path(*Path(file.stem.replace("_", "/")).parent.parts[1:])
Contributor commented:

As above, I would remove all functional changes from this PR.
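A standalone illustration (hypothetical file name) of the path manipulation: both expressions split the underscore-delimited stem into path components, but the new one also strips the first component:

    from pathlib import Path

    file = Path("CameraTop_model1_config.bin")
    as_path = Path(file.stem.replace("_", "/"))  # CameraTop/model1/config
    old = as_path.parent                         # CameraTop/model1
    new = Path(*as_path.parent.parts[1:])        # model1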

@jkbhagatio (Member, Author) commented:

Closing as the schema has been incorporated in #347

@jkbhagatio closed this on Apr 4, 2024.
@jkbhagatio deleted the social_schema branch on November 14, 2024, 11:42.