Social schema #310
Conversation
```diff
@@ -5,7 +5,7 @@
 import pandas as pd

 from aeon.io import api as io_api
-from aeon.schema import dataset as aeon_schema
+from aeon.io import schemas as aeon_schema
```
I don't think we should lump together experiment-specific schemas with the much more general `io` module. Currently somebody can use `io` as a PyPI package for their own experiments without having to know anything at all about the details of the foraging experiments, which I think is something we should keep. We can discuss whether `aeon.schema` is the best name for the module, but I do feel these should live in a separate module.
```diff
@@ -115,7 +115,8 @@ def load(root, reader, start=None, end=None, time=None, tolerance=None, epoch=No
     # to fill missing values
     previous = reader.read(files[i - 1])
     data = pd.concat([previous, frame])
-    data = data.reindex(values, method="pad", tolerance=tolerance)
+    data = data.reindex(values, tolerance=tolerance)
+    data.dropna(inplace=True)
```
Why are we dropping `NaN` values from the output data? This feels dangerous, and regardless of the reason it seems out of scope for this PR, which is about refactoring schemas. If we want to discuss this we should do so in a separate PR.
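To make the concern concrete, here is a toy sketch (not the real loader; all data values are made up) of how a reindex-then-dropna pattern can silently shrink the requested index:

```python
import pandas as pd

# Toy data: samples at t=0s, 1s, 5s
data = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.to_timedelta([0, 1, 5], unit="s"),
)
# Requested timestamps include t=3s, which has no sample within tolerance
values = pd.to_timedelta([0, 3, 5], unit="s")

reindexed = data.reindex(values, method="nearest",
                         tolerance=pd.Timedelta("0.5s"))
# t=3s becomes NaN because no source sample lies within 0.5s of it
dropped = reindexed.dropna()
# dropna() then silently discards the t=3s row: the caller asked for
# three timestamps but gets back only two, with no error or warning
```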
```diff
-def compositeStream(pattern, *args):
-    """Merges multiple data streams into a single composite stream."""
-    composite = {}
+def register(pattern, *args):
```
I don't think `register` is a good term to use here. In computer science this term refers either to a CPU register or to something like the Windows registry, which is more like a global-level dictionary of settings.

If the issue is the term composite, then my proposed alternative would be to simply use the plural `streams`, since conceptually a `Device` is simply meant to contain a collection of data streams.

I think we discussed this in a previous DA meeting, but there is a confusion here about the intended target audience. At a basic data analysis level I think the goal is to simply provide access to the collection of streams in a device. That collection is a dictionary simply because there is a unique key associated with each stream, but what we have at its heart is still simply a collection of streams.

I do agree with you that the "binder function" (provisional name) has a different role from a `Reader`, and likely merits having its own separate name when we are explaining how the API works. However, I was realizing just now that we don't actually seem to use the word "stream" for anything else in the API, so it still feels to me like a strong contender to capture this concept.
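The "collection of streams" reading could be sketched roughly as follows. All names here (`streams`, `Reader`, the stream functions) are illustrative stand-ins, not the actual aeon API:

```python
# Hypothetical sketch: a Device is, at heart, a dict-like collection of
# named streams, merged from the dictionaries each stream function returns.
class Reader:
    """Stand-in for an aeon Reader: knows how to read one file format."""
    def __init__(self, pattern):
        self.pattern = pattern

def streams(pattern, *stream_fns):
    """Merge the dictionaries returned by each stream function."""
    merged = {}
    for fn in stream_fns:
        merged.update(fn(pattern))
    return merged

# A "stream function" pairs a unique key with a Reader for that stream
def subject_state(pattern):
    return {"SubjectState": Reader(f"{pattern}_SubjectState_*")}

def message_log(pattern):
    return {"MessageLog": Reader(f"{pattern}_MessageLog_*")}

device_streams = streams("Patch1", subject_state, message_log)
# device_streams maps each unique key to the Reader for that stream
```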
```diff
-    """Merges multiple data streams into a single composite stream."""
-    composite = {}
+def register(pattern, *args):
+    """Merges multiple Readers into a single registry."""
```
We are not merging `Reader`s but actually `binder_fn` objects, at least given the code below.
```diff
 class Device:
-    """Groups multiple data streams into a logical device.
+    """Groups multiple Readers into a logical device.
```
Same as above: we are not merging `Reader`s but `binder_fn` objects.
```diff
     If a device contains a single stream reader with the same pattern as the device `name`, it will be
     considered a singleton, and the stream reader will be paired directly with the device without nesting.
```
What exactly changed here? If we are reviewing core terminology we should make sure documentation stays consistent.
looks like auto-formatting
```diff
     Attributes:
         name (str): Name of the device.
-        args (Any): Data streams collected from the device.
+        args (any): A binder function or class that returns a dictionary of Readers.
```
`Any` here meant the `Any` type, so it should either be left with its original casing or replaced with something else from the `typing` module.

A possible alternative to "binder function" could be "stream selector" or "stream accessor", since basically the only functionality this function brings in addition to the reader is the pattern to search and find stream data files.

Another option is simply to call this concept a "stream", since we don't use that name anywhere else in the API. I think this is actually closer to what I originally had in mind when I defined these "binder functions":

- a `Reader` is a reusable module for reading specific data file formats
- a `stream` is a combination of "data" + "reader", i.e. where the data is (the "pattern") and how to read it (the `Reader`).
We can discuss this further in a future meeting.
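The "stream = data + reader" definition could be sketched as below; the class and function names are hypothetical, not the actual aeon API:

```python
# Hypothetical sketch: a Reader is reusable across experiments; a
# "stream" binds one Reader to the concrete file pattern locating its data.
class SubjectReader:
    """Stand-in for a reusable aeon Reader for one file format."""
    def __init__(self, pattern):
        self.pattern = pattern

def stream(name, pattern):
    """Bind a Reader to the pattern that locates its data files."""
    return {name: SubjectReader(f"{pattern}_{name}_*")}

s = stream("SubjectState", "Environment")
# s["SubjectState"] now carries both the "where" (pattern) and the "how" (Reader)
```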
```diff
@@ -212,7 +212,7 @@ def read(self, file):
         specified unique identifier.
         """
         data = super().read(file)
-        data = data[data.event & self.value > 0]
+        data = data[(data.event & self.value) == self.value]
```
I don't remember anymore what this change was about, but I feel we should leave any functional changes outside of this PR, which will already be confusing enough with all the major breaking refactoring and reorganisation.
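For reference, the two expressions do differ in behavior: the old filter keeps rows where *any* bit of the mask is set, the new one only rows where *all* mask bits are set. A sketch with hypothetical flag values (not the actual aeon event codes):

```python
# Old test: (event & mask) > 0   -> any shared bit passes
# New test: (event & mask) == mask -> every mask bit must be set
event = 0b0110
mask = 0b0011

any_bit = (event & mask) > 0       # True: event shares bit 0b0010 with the mask
all_bits = (event & mask) == mask  # False: bit 0b0001 is not set in event
assert any_bit and not all_bits

# For a single-bit mask the two tests agree, which may be why the
# difference is easy to miss:
single = 0b0010
assert ((event & single) > 0) == ((event & single) == single)
```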
```diff
@@ -37,7 +37,7 @@ def subject_state(pattern):
     return {"SubjectState": _reader.Subject(f"{pattern}_SubjectState_*")}

-def messageLog(pattern):
+def message_log(pattern):
```
Can we focus this PR only on the major naming changes and terminology? I know we disagree on best practices for casing in python, but if we could leave this out of this PR so we can focus on the core discussions on naming, that would help a lot.
```diff
@@ -32,7 +32,7 @@ def read(
     ) -> pd.DataFrame:
         """Reads data from the Harp-binarized tracking file."""
         # Get config file from `file`, then bodyparts from config file.
-        model_dir = Path(file.stem.replace("_", "/")).parent
+        model_dir = Path(*Path(file.stem.replace("_", "/")).parent.parts[1:])
```
As above, I would remove all functional changes from this PR.
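For context, the behavioral difference is that the new expression drops the first component of the derived model directory. A sketch with a made-up file stem (not an actual aeon file name):

```python
from pathlib import Path

# A stem like "root_CameraTop_model1_config" maps to the path
# "root/CameraTop/model1/config"; .parent strips the last component.
stem = "root_CameraTop_model1_config"

full = Path(stem.replace("_", "/")).parent
# full == Path("root/CameraTop/model1")

# The new code additionally drops the first path component ("root")
trimmed = Path(*Path(stem.replace("_", "/")).parent.parts[1:])
# trimmed == Path("CameraTop/model1")
```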
Closing as the schema has been incorporated in #347
No description provided.