new readers and schemas for reduced data storage in db #437
Conversation
Walkthrough
The changes primarily focus on restructuring the data ingestion process in the Aeon project: new readers and ingestion schemas are introduced so that less data needs to be stored in the database.
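For orientation, a minimal sketch of how the new ingestion schema might be used alongside an existing experiment schema. The `aeon.io.api` import alias and the root path are assumptions here; the schema imports follow the diffs reviewed below.

```python
import aeon.io.api as aeon  # assumed import alias
from aeon.schema.ingestion_schemas import social03  # new downsampling ingestion schema
from aeon.schema.schemas import exp02               # existing full-rate experiment schema

root = "/path/to/acquisition/root"  # placeholder

# Downsampled encoder stream via the new ingestion schema.
downsampled = aeon.load(root, social03.Patch2.Encoder)

# Full-rate encoder stream via the existing experiment schema.
raw = aeon.load(root, exp02.Patch2.Encoder)
```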
Files selected (2)
- aeon/io/reader.py (2)
- aeon/schema/ingestion_schemas.py (1)
Review comments generated (7)
- Review: 5
- LGTM: 2
Tips
Chat with CodeRabbit Bot (@coderabbitai)
- Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
- Invite the bot into a review comment chain by tagging @coderabbitai in a reply.
Code suggestions
- The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
- You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.
Pausing incremental reviews
- Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.
Files selected (1)
- tests/io/test_api.py (3)
Review comments generated (3)
- Review: 2
- LGTM: 1
Files selected (1)
- tests/io/test_api.py (2)
Review comments generated (6)
- Review: 4
- LGTM: 2
from aeon.schema.ingestion_schemas import social03
from aeon.schema.schemas import exp02
    nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
)
In the function test_load_start_only(), there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

  data = aeon.load(
      nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
  )
+ assert data is not None, "Data loading failed"
  assert len(data) > 0
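For reference, a rough sketch of the complete test with the suggested guard in place. The `aeon.io.api` import alias and the `nonmonotonic_path` value are assumptions; the real test module defines its own data paths.

```python
import pandas as pd

import aeon.io.api as aeon  # assumed import alias
from aeon.schema.schemas import exp02

nonmonotonic_path = "tests/data/nonmonotonic"  # placeholder for the real test data directory


def test_load_start_only():
    data = aeon.load(
        nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
    )
    assert data is not None, "Data loading failed"
    assert len(data) > 0
```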
    nonmonotonic_path, exp02.Patch2.Encoder, end=pd.Timestamp("2022-06-06T13:00:49")
)
Similar to the previous comment, in the function test_load_end_only(), there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

  data = aeon.load(
      nonmonotonic_path, exp02.Patch2.Encoder, end=pd.Timestamp("2022-06-06T13:00:49")
  )
+ assert data is not None, "Data loading failed"
  assert len(data) > 0
DOWNSAMPLE_PERIOD = 0.02
data = aeon.load(monotonic_path, exp02.Patch2.Encoder, downsample=True)
raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder, downsample=None)
data = aeon.load(monotonic_path, social03.Patch2.Encoder)
raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder)
In the function test_load_encoder_with_downsampling(), the variable DOWNSAMPLE_PERIOD is defined but not used in the subsequent code. This can lead to confusion and should be removed if not necessary.

- DOWNSAMPLE_PERIOD = 0.02

Also, similar to previous comments, there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

  data = aeon.load(monotonic_path, social03.Patch2.Encoder)
+ assert data is not None, "Data loading failed"
  raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder)
+ assert raw_data is not None, "Raw data loading failed"
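Alternatively, if the constant is kept, one way to make it meaningful is to assert on the spacing of the downsampled index. A rough sketch, assuming the downsampled reader emits samples no closer together than the configured period; the helper name is hypothetical.

```python
import numpy as np

DOWNSAMPLE_PERIOD = 0.02  # seconds; 1 / 50 Hz, matching the sr_hz default in the new _Encoder reader


def assert_downsampled_spacing(data):
    """Check that consecutive timestamps are no closer than DOWNSAMPLE_PERIOD."""
    # np.diff on a DatetimeIndex's values yields timedelta64[ns]; convert to seconds.
    gaps_s = np.diff(data.index.values).astype(np.int64) / 1e9
    assert (gaps_s >= DOWNSAMPLE_PERIOD - 1e-9).all(), "downsampled samples are spaced too finely"
```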
Files selected (1)
- aeon/schema/ingestion_schemas.py (1)
Review comments generated (4)
- Review: 4
- LGTM: 0
def read(self, file: PathLike[str], sr_hz: int = 50) -> pd.DataFrame:
    """Reads encoder data from the specified Harp binary file."""
    data = super().read(file)
    data.index = aeon_time(data.index)
    first_index = data.first_valid_index()
    if first_index is not None:
        data = data.resample(f"{1/sr_hz}s").first()  # take first sample in each resampled bin
    return data
The read method in the _Encoder class is downsampling the data by taking the first sample in each resampled bin. This approach might lead to loss of important information if the first sample is not representative of the entire bin. Consider using a different resampling strategy, such as averaging the samples in each bin, to preserve more information.

- data = data.resample(f"{1/sr_hz}s").first()  # take first sample in each resampled bin
+ data = data.resample(f"{1/sr_hz}s").mean()  # take average of samples in each resampled bin
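For illustration, a self-contained toy comparison of the two strategies; the data below is synthetic and unrelated to the Aeon readers.

```python
import numpy as np
import pandas as pd

# Toy 1 kHz signal over 100 ms, binned into 20 ms windows (i.e. 50 Hz).
idx = pd.date_range("2022-06-06 13:00:49", periods=100, freq="1ms")
angle = pd.Series(np.sin(np.linspace(0, np.pi, 100)), index=idx, name="angle")

first = angle.resample("20ms").first()  # keep the first raw sample of each bin
mean = angle.resample("20ms").mean()    # average all raw samples of each bin

print(pd.concat({"first": first, "mean": mean}, axis=1))
```

Averaging smooths high-frequency variation within each bin, while taking the first sample preserves exact raw values at the cost of discarding the rest of the bin.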
def read(self, file: PathLike[str], drop_cols=None) -> pd.DataFrame:
    """Reads video metadata from the specified file."""
    drop_cols = ["hw_counter", "_frame", "_path", "_epoch"] if drop_cols is None else drop_cols
    data = pd.read_csv(file, header=0, names=self._rawcolumns)
    data.drop(columns=drop_cols, errors="ignore", inplace=True)
    data.set_index("time", inplace=True)
    return data
In the read method of the _Video class, you are dropping columns from the DataFrame without checking if they exist. This could lead to errors if one or more of the specified columns are not present in the DataFrame. Consider adding a check to ensure that only existing columns are dropped.

- data.drop(columns=drop_cols, errors="ignore", inplace=True)
+ drop_cols = [col for col in drop_cols if col in data.columns]
+ data.drop(columns=drop_cols, inplace=True)
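A small self-contained check of the two approaches on a toy frame (illustrative only; note that errors="ignore" in the current code already skips columns that are absent):

```python
import pandas as pd

# Toy frame containing only some of the columns slated for dropping.
df = pd.DataFrame({"time": [0, 1], "hw_counter": [10, 11], "value": [1.0, 2.0]})
drop_cols = ["hw_counter", "_frame", "_path", "_epoch"]  # "_frame", "_path", "_epoch" are absent here

# Current approach: errors="ignore" silently skips the missing columns.
out_ignore = df.drop(columns=drop_cols, errors="ignore")

# Suggested approach: pre-filter to the columns that actually exist, then drop strictly.
present = [col for col in drop_cols if col in df.columns]
out_filtered = df.drop(columns=present)

assert out_ignore.equals(out_filtered)  # both yield the same frame on this example
```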
    def __init__(self, pattern):
        super().__init__(_Video(f"{pattern}_*"))


class Encoder(Stream):
    """Wheel magnetic encoder data."""

    def __init__(self, pattern):
        super().__init__(_Encoder(f"{pattern}_90_*"))
The __init__ methods in the Video and Encoder classes are hardcoding the pattern for file matching. This reduces the flexibility of these classes and makes them less reusable. Consider passing the pattern as an argument to the __init__ method instead.

- def __init__(self, pattern):
-     super().__init__(_Video(f"{pattern}_*"))
+ def __init__(self, pattern, file_pattern):
+     super().__init__(_Video(f"{pattern}_{file_pattern}"))

- def __init__(self, pattern):
-     super().__init__(_Encoder(f"{pattern}_90_*"))
+ def __init__(self, pattern, file_pattern):
+     super().__init__(_Encoder(f"{pattern}_{file_pattern}"))
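One way to adopt this suggestion without breaking existing call sites is to give the new parameter a default that reproduces the current globs. A rough sketch, assumed to live in aeon/schema/ingestion_schemas.py next to the _Video and _Encoder readers shown above; the Stream import path and the Video docstring are assumptions.

```python
from aeon.schema.streams import Stream  # assumed location of the Stream base class


class Video(Stream):
    """Video frame metadata."""

    def __init__(self, pattern, file_pattern="*"):
        # Default reproduces today's "<pattern>_*" glob while letting callers override it.
        super().__init__(_Video(f"{pattern}_{file_pattern}"))


class Encoder(Stream):
    """Wheel magnetic encoder data."""

    def __init__(self, pattern, file_pattern="90_*"):
        # Default reproduces today's "<pattern>_90_*" glob.
        super().__init__(_Encoder(f"{pattern}_{file_pattern}"))
```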
)


__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"]
The __all__ variable is used to specify the public interface of a module. However, it's not clear from this code whether all the items listed in __all__ are intended to be part of the public interface. Please ensure that only those items that should be accessible when importing the module using from module import * are included in __all__.

-__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"]
+__all__ = ["intended_public_interface_item1", "intended_public_interface_item2", ...]
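For context, a minimal illustration of what __all__ controls under star-imports; the module and names below are hypothetical, not the project's.

```python
# hypothetical_schemas.py (toy module)
octagon01 = "schema placeholder"  # intended to be public


def _build_helper():
    """Internal helper; not exported by a star-import."""


__all__ = ["octagon01"]  # `from hypothetical_schemas import *` binds only octagon01
```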
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
+ Coverage   18.77%   19.41%   +0.63%
==========================================
  Files          37       38       +1
  Lines        3009     3039      +30
==========================================
+ Hits          565      590      +25
- Misses       2444     2449       +5

☔ View full report in Codecov by Sentry.
Closing this since this was done in #438
Summary by CodeRabbit
- Removed the read method from the Encoder class and adjusted function calls and imports for better code organization and maintainability.