Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new readers and schemas for reduced data storage in db #437

Closed
wants to merge 4 commits into from

Conversation

jkbhagatio
Copy link
Member

@jkbhagatio jkbhagatio commented Oct 10, 2024

Summary by CodeRabbit

  • New Feature: Introduced new readers for encoder and video data, enhancing the system's ability to process diverse data types.
  • New Feature: Defined new streams and stream groups, providing more granular control over data processing.
  • New Feature: Added schemas for different experimental setups, allowing for greater flexibility in data ingestion.
  • New Feature: Implemented new readers and schemas for optimized data storage, improving database performance.
  • Refactor: Removed read method from Encoder class and adjusted function calls and imports for better code organization and maintainability.
  • New Feature: Established new classes for reading encoder and video data, enhancing modularity and readability of the code.

Copy link

github-actions bot commented Oct 10, 2024

Image description CodeRabbit

Walkthrough

The changes primarily focus on restructuring the data ingestion process in the Aeon project. The read method from the Encoder class has been removed, potentially affecting how encoder data is read and processed. New readers for encoder and video data have been introduced, along with new streams and stream groups. Schemas for different experimental setups have been defined and exported for external use. The changes also include new readers and schemas for reduced data storage, adjustments in function calls and imports, and new classes for reading encoder and video data.

Changes

Files Summary
aeon/io/reader.py, aeon/schema/ingestion_schemas.py, tests/io/test_api.py These files have undergone significant structural changes. The read method from the Encoder class has been removed, which could impact the way encoder data is read and processed. New readers for encoder and video data have been introduced, along with new streams and stream groups. Schemas for different experimental setups have been defined and are now available for external use. Additionally, these changes introduce new readers and schemas for reduced data storage in the database. The code changes involve significant additions and modifications that may impact the overall behavior and structure of the system.

Uplevel your code reviews with CodeRabbit Pro

CodeRabbit Pro

If you like this project, please support us by purchasing the Pro version. The Pro version has advanced context, superior noise reduction and several proprietary improvements compared to the open source version. Moreover, CodeRabbit Pro is free for open source projects.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

Commits Files that changed from the base of the PR and between 83cd905 and b54e1c3 commits.
Files selected (2)
  • aeon/io/reader.py (2)
  • aeon/schema/ingestion_schemas.py (1)
Review comments generated (7)
  • Review: 5
  • LGTM: 2

Tips

Chat with Image description CodeRabbit Bot (@coderabbitai)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

  • Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

aeon/io/reader.py Show resolved Hide resolved
aeon/schema/ingestion_schemas.py Outdated Show resolved Hide resolved
aeon/schema/ingestion_schemas.py Show resolved Hide resolved
@jkbhagatio jkbhagatio linked an issue Oct 10, 2024 that may be closed by this pull request
1 task
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

Commits Files that changed from the base of the PR and between b54e1c3 and d6cf52f commits.
Files selected (1)
  • tests/io/test_api.py (3)
Review comments generated (3)
  • Review: 2
  • LGTM: 1

Tips

Chat with Image description CodeRabbit Bot (@coderabbitai)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

  • Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

tests/io/test_api.py Show resolved Hide resolved
tests/io/test_api.py Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

Commits Files that changed from the base of the PR and between d6cf52f and f12e359 commits.
Files selected (1)
  • tests/io/test_api.py (2)
Review comments generated (6)
  • Review: 4
  • LGTM: 2

Tips

Chat with Image description CodeRabbit Bot (@coderabbitai)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

  • Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

Comment on lines +8 to 9
from aeon.schema.ingestion_schemas import social03
from aeon.schema.schemas import exp02

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

The import of social03 from aeon.schema.ingestion_schemas is not used in the code, which can lead to unnecessary overhead and confusion. Please remove unused imports to improve code readability and performance.

- from aeon.schema.ingestion_schemas import social03

Comment on lines +18 to 19
nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

In the function test_load_start_only(), there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

    data = aeon.load(
       nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
    )
+    assert data is not None, "Data loading failed"
    assert len(data) > 0

Comment on lines +26 to 27
nonmonotonic_path, exp02.Patch2.Encoder, end=pd.Timestamp("2022-06-06T13:00:49")
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

Similar to the previous comment, in the function test_load_end_only(), there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

    data = aeon.load(
       nonmonotonic_path, exp02.Patch2.Encoder, end=pd.Timestamp("2022-06-06T13:00:49")
    )
+    assert data is not None, "Data loading failed"
    assert len(data) > 0

Comment on lines 52 to +54
DOWNSAMPLE_PERIOD = 0.02
data = aeon.load(monotonic_path, exp02.Patch2.Encoder, downsample=True)
raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder, downsample=None)
data = aeon.load(monotonic_path, social03.Patch2.Encoder)
raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

In the function test_load_encoder_with_downsampling(), the variable DOWNSAMPLE_PERIOD is defined but not used in the subsequent code. This can lead to confusion and should be removed if not necessary.

- DOWNSAMPLE_PERIOD = 0.02

Also, similar to previous comments, there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

    data = aeon.load(monotonic_path, social03.Patch2.Encoder)
+    assert data is not None, "Data loading failed"
    raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder)
+    assert raw_data is not None, "Raw data loading failed"

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

Commits Files that changed from the base of the PR and between f12e359 and daf6224 commits.
Files selected (1)
  • aeon/schema/ingestion_schemas.py (1)
Review comments generated (4)
  • Review: 4
  • LGTM: 0

Tips

Chat with Image description CodeRabbit Bot (@coderabbitai)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

  • Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

Comment on lines +20 to +27
def read(self, file: PathLike[str], sr_hz: int = 50) -> pd.DataFrame:
"""Reads encoder data from the specified Harp binary file."""
data = super().read(file)
data.index = aeon_time(data.index)
first_index = data.first_valid_index()
if first_index is not None:
data = data.resample(f"{1/sr_hz}s").first() # take first sample in each resampled bin
return data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

The read method in the _Encoder class is downsampling the data by taking the first sample in each resampled bin. This approach might lead to loss of important information if the first sample is not representative of the entire bin. Consider using a different resampling strategy, such as averaging the samples in each bin, to preserve more information.

-        data = data.resample(f"{1/sr_hz}s").first()  # take first sample in each resampled bin
+        data = data.resample(f"{1/sr_hz}s").mean()  # take average of samples in each resampled bin

Comment on lines +33 to +39
def read(self, file: PathLike[str], drop_cols=None) -> pd.DataFrame:
"""Reads video metadata from the specified file."""
drop_cols = ["hw_counter", "_frame", "_path", "_epoch"] if drop_cols is None else drop_cols
data = pd.read_csv(file, header=0, names=self._rawcolumns)
data.drop(columns=drop_cols, errors="ignore", inplace=True)
data.set_index("time", inplace=True)
return data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

In the read method of the _Video class, you are dropping columns from the DataFrame without checking if they exist. This could lead to errors if one or more of the specified columns are not present in the DataFrame. Consider adding a check to ensure that only existing columns are dropped.

-        data.drop(columns=drop_cols, errors="ignore", inplace=True)
+        drop_cols = [col for col in drop_cols if col in data.columns]
+        data.drop(columns=drop_cols, inplace=True)

Comment on lines +46 to +54
def __init__(self, pattern):
super().__init__(_Video(f"{pattern}_*"))


class Encoder(Stream):
"""Wheel magnetic encoder data."""

def __init__(self, pattern):
super().__init__(_Encoder(f"{pattern}_90_*"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

The __init__ methods in the Video and Encoder classes are hardcoding the pattern for file matching. This reduces the flexibility of these classes and makes them less reusable. Consider passing the pattern as an argument to the __init__ method instead.

-    def __init__(self, pattern):
-        super().__init__(_Video(f"{pattern}_*"))
+    def __init__(self, pattern, file_pattern):
+        super().__init__(_Video(f"{pattern}_{file_pattern}"))

-    def __init__(self, pattern):
-        super().__init__(_Encoder(f"{pattern}_90_*"))
+    def __init__(self, pattern, file_pattern):
+        super().__init__(_Encoder(f"{pattern}_{file_pattern}"))

)


__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image description CodeRabbit

The __all__ variable is used to specify the public interface of a module. However, it's not clear from this code whether all the items listed in __all__ are intended to be part of the public interface. Please ensure that only those items that should be accessible when importing the module using from module import * are included in __all__.

-__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"]
+__all__ = ["intended_public_interface_item1", "intended_public_interface_item2", ...]

Copy link

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 87.80488% with 5 lines in your changes missing coverage. Please review.

Project coverage is 19.41%. Comparing base (83cd905) to head (daf6224).

Files with missing lines Patch % Lines
aeon/schema/ingestion_schemas.py 87.50% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
+ Coverage   18.77%   19.41%   +0.63%     
==========================================
  Files          37       38       +1     
  Lines        3009     3039      +30     
==========================================
+ Hits          565      590      +25     
- Misses       2444     2449       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jkbhagatio
Copy link
Member Author

Closing this since this was done in #438

@jkbhagatio jkbhagatio closed this Nov 6, 2024
@jkbhagatio jkbhagatio deleted the ingestion_readers_schemas branch November 14, 2024 11:41
@jkbhagatio jkbhagatio mentioned this pull request Nov 14, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Database storage concerns
1 participant