new readers and schemas for reduced data storage in db #437

jkbhagatio · 2024-10-10T21:08:45Z

Summary by CodeRabbit

New Feature: Introduced new readers for encoder and video data, enhancing the system's ability to process diverse data types.
New Feature: Defined new streams and stream groups, providing more granular control over data processing.
New Feature: Added schemas for different experimental setups, allowing for greater flexibility in data ingestion.
New Feature: Implemented new readers and schemas for optimized data storage, improving database performance.
Refactor: Removed read method from Encoder class and adjusted function calls and imports for better code organization and maintainability.
New Feature: Established new classes for reading encoder and video data, enhancing modularity and readability of the code.

github-actions · 2024-10-10T21:08:59Z

CodeRabbit

Walkthrough

The changes primarily focus on restructuring the data ingestion process in the Aeon project. The read method from the Encoder class has been removed, potentially affecting how encoder data is read and processed. New readers for encoder and video data have been introduced, along with new streams and stream groups. Schemas for different experimental setups have been defined and exported for external use. The changes also include new readers and schemas for reduced data storage, adjustments in function calls and imports, and new classes for reading encoder and video data.

Changes

Files	Summary
aeon/io/reader.py, aeon/schema/ingestion_schemas.py, tests/io/test_api.py	These files have undergone significant structural changes. The `read` method from the `Encoder` class has been removed, which could impact the way encoder data is read and processed. New readers for encoder and video data have been introduced, along with new streams and stream groups. Schemas for different experimental setups have been defined and are now available for external use. Additionally, these changes introduce new readers and schemas for reduced data storage in the database. The code changes involve significant additions and modifications that may impact the overall behavior and structure of the system.

Uplevel your code reviews with CodeRabbit Pro

CodeRabbit Pro

If you like this project, please support us by purchasing the Pro version. The Pro version has advanced context, superior noise reduction and several proprietary improvements compared to the open source version. Moreover, CodeRabbit Pro is free for open source projects.

github-actions

CodeRabbit

Commits

Files that changed from the base of the PR and between 83cd905 and b54e1c3 commits.

Files selected (2)

aeon/io/reader.py (2)
aeon/schema/ingestion_schemas.py (1)

Review comments generated (7)

Review: 5
LGTM: 2

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

aeon/io/reader.py

aeon/schema/ingestion_schemas.py

github-actions

CodeRabbit

Commits

Files that changed from the base of the PR and between b54e1c3 and d6cf52f commits.

Files selected (1)

tests/io/test_api.py (3)

Review comments generated (3)

Review: 2
LGTM: 1

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

tests/io/test_api.py

github-actions

CodeRabbit

Commits

Files that changed from the base of the PR and between d6cf52f and f12e359 commits.

Files selected (1)

tests/io/test_api.py (2)

Review comments generated (6)

Review: 4
LGTM: 2

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

github-actions · 2024-10-10T23:13:58Z

tests/io/test_api.py

+from aeon.schema.ingestion_schemas import social03
 from aeon.schema.schemas import exp02


CodeRabbit

The import of social03 from aeon.schema.ingestion_schemas is not used in the code, which can lead to unnecessary overhead and confusion. Please remove unused imports to improve code readability and performance.

- from aeon.schema.ingestion_schemas import social03

github-actions · 2024-10-10T23:13:59Z

tests/io/test_api.py

+        nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
    )


CodeRabbit

In the function test_load_start_only(), there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

data = aeon.load( nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49") ) + assert data is not None, "Data loading failed" assert len(data) > 0

github-actions · 2024-10-10T23:13:59Z

tests/io/test_api.py

+        nonmonotonic_path, exp02.Patch2.Encoder, end=pd.Timestamp("2022-06-06T13:00:49")
    )


CodeRabbit

Similar to the previous comment, in the function test_load_end_only(), there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

data = aeon.load( nonmonotonic_path, exp02.Patch2.Encoder, end=pd.Timestamp("2022-06-06T13:00:49") ) + assert data is not None, "Data loading failed" assert len(data) > 0

github-actions · 2024-10-10T23:13:59Z

tests/io/test_api.py

    DOWNSAMPLE_PERIOD = 0.02
-    data = aeon.load(monotonic_path, exp02.Patch2.Encoder, downsample=True)
-    raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder, downsample=None)
+    data = aeon.load(monotonic_path, social03.Patch2.Encoder)
+    raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder)


CodeRabbit

In the function test_load_encoder_with_downsampling(), the variable DOWNSAMPLE_PERIOD is defined but not used in the subsequent code. This can lead to confusion and should be removed if not necessary.

- DOWNSAMPLE_PERIOD = 0.02

Also, similar to previous comments, there's no error handling for the case when the data loading fails and returns an empty dataset. This could lead to a misleading test result. Consider adding an explicit check for this scenario.

data = aeon.load(monotonic_path, social03.Patch2.Encoder) + assert data is not None, "Data loading failed" raw_data = aeon.load(monotonic_path, exp02.Patch2.Encoder) + assert raw_data is not None, "Raw data loading failed"

github-actions

CodeRabbit

Commits

Files that changed from the base of the PR and between f12e359 and daf6224 commits.

Files selected (1)

aeon/schema/ingestion_schemas.py (1)

Review comments generated (4)

Review: 4
LGTM: 0

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
Invite the bot into a review comment chain by tagging @coderabbitai in a reply.

Code suggestions

The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

Add @coderabbitai: ignore anywhere in the PR description to pause further reviews from the bot.

github-actions · 2024-10-10T23:35:36Z

aeon/schema/ingestion_schemas.py

+    def read(self, file: PathLike[str], sr_hz: int = 50) -> pd.DataFrame:
+        """Reads encoder data from the specified Harp binary file."""
+        data = super().read(file)
+        data.index = aeon_time(data.index)
+        first_index = data.first_valid_index()
+        if first_index is not None:
+            data = data.resample(f"{1/sr_hz}s").first()  # take first sample in each resampled bin
+        return data


CodeRabbit

The read method in the _Encoder class is downsampling the data by taking the first sample in each resampled bin. This approach might lead to loss of important information if the first sample is not representative of the entire bin. Consider using a different resampling strategy, such as averaging the samples in each bin, to preserve more information.

- data = data.resample(f"{1/sr_hz}s").first() # take first sample in each resampled bin + data = data.resample(f"{1/sr_hz}s").mean() # take average of samples in each resampled bin

github-actions · 2024-10-10T23:35:36Z

aeon/schema/ingestion_schemas.py

+    def read(self, file: PathLike[str], drop_cols=None) -> pd.DataFrame:
+        """Reads video metadata from the specified file."""
+        drop_cols = ["hw_counter", "_frame", "_path", "_epoch"] if drop_cols is None else drop_cols
+        data = pd.read_csv(file, header=0, names=self._rawcolumns)
+        data.drop(columns=drop_cols, errors="ignore", inplace=True)
+        data.set_index("time", inplace=True)
+        return data


CodeRabbit

In the read method of the _Video class, you are dropping columns from the DataFrame without checking if they exist. This could lead to errors if one or more of the specified columns are not present in the DataFrame. Consider adding a check to ensure that only existing columns are dropped.

- data.drop(columns=drop_cols, errors="ignore", inplace=True) + drop_cols = [col for col in drop_cols if col in data.columns] + data.drop(columns=drop_cols, inplace=True)

github-actions · 2024-10-10T23:35:36Z

aeon/schema/ingestion_schemas.py

+    def __init__(self, pattern):
+        super().__init__(_Video(f"{pattern}_*"))
+
+
+class Encoder(Stream):
+    """Wheel magnetic encoder data."""
+
+    def __init__(self, pattern):
+        super().__init__(_Encoder(f"{pattern}_90_*"))


CodeRabbit

The __init__ methods in the Video and Encoder classes are hardcoding the pattern for file matching. This reduces the flexibility of these classes and makes them less reusable. Consider passing the pattern as an argument to the __init__ method instead.

- def __init__(self, pattern): - super().__init__(_Video(f"{pattern}_*")) + def __init__(self, pattern, file_pattern): + super().__init__(_Video(f"{pattern}_{file_pattern}")) - def __init__(self, pattern): - super().__init__(_Encoder(f"{pattern}_90_*")) + def __init__(self, pattern, file_pattern): + super().__init__(_Encoder(f"{pattern}_{file_pattern}"))

github-actions · 2024-10-10T23:35:36Z

aeon/schema/ingestion_schemas.py

+)
+
+
+__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"]


CodeRabbit

The __all__ variable is used to specify the public interface of a module. However, it's not clear from this code whether all the items listed in __all__ are intended to be part of the public interface. Please ensure that only those items that should be accessible when importing the module using from module import * are included in __all__.

-__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"] +__all__ = ["intended_public_interface_item1", "intended_public_interface_item2", ...]

codecov · 2024-10-10T23:38:05Z

Codecov Report

Attention: Patch coverage is 87.80488% with 5 lines in your changes missing coverage. Please review.

Project coverage is 19.41%. Comparing base (83cd905) to head (daf6224).

Files with missing lines	Patch %	Lines
aeon/schema/ingestion_schemas.py	87.50%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
+ Coverage   18.77%   19.41%   +0.63%     
==========================================
  Files          37       38       +1     
  Lines        3009     3039      +30     
==========================================
+ Hits          565      590      +25     
- Misses       2444     2449       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jkbhagatio · 2024-11-06T15:25:16Z

Closing this since this was done in #438

new readers and schemas for reduced data storage in db

b54e1c3

jkbhagatio requested review from glopesdev and ttngu207 October 10, 2024 21:08

github-actions bot reviewed Oct 10, 2024

View reviewed changes

aeon/io/reader.py Show resolved Hide resolved

aeon/schema/ingestion_schemas.py Outdated Show resolved Hide resolved

aeon/schema/ingestion_schemas.py Show resolved Hide resolved

jkbhagatio linked an issue Oct 10, 2024 that may be closed by this pull request

Database storage concerns #415

Closed

1 task

updated tests

d6cf52f

github-actions bot reviewed Oct 10, 2024

View reviewed changes

tests/io/test_api.py Show resolved Hide resolved

tests/io/test_api.py Show resolved Hide resolved

cleaned up linting for ruff

f12e359

github-actions bot reviewed Oct 10, 2024

View reviewed changes

updated pandas and changed S to s lmao

daf6224

github-actions bot reviewed Oct 10, 2024

View reviewed changes

ttngu207 mentioned this pull request Oct 22, 2024

New ingestion_schemas + MANY minor fixes and improvements #438

Merged

jkbhagatio closed this Nov 6, 2024

jkbhagatio deleted the ingestion_readers_schemas branch November 14, 2024 11:41

jkbhagatio mentioned this pull request Nov 14, 2024

Database storage concerns #415

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new readers and schemas for reduced data storage in db #437

new readers and schemas for reduced data storage in db #437

jkbhagatio commented Oct 10, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Oct 10, 2024 •

edited

Loading

CodeRabbit Pro

github-actions bot left a comment

github-actions bot left a comment

github-actions bot left a comment

github-actions bot Oct 10, 2024

github-actions bot Oct 10, 2024

github-actions bot Oct 10, 2024

github-actions bot Oct 10, 2024

github-actions bot left a comment

github-actions bot Oct 10, 2024

github-actions bot Oct 10, 2024

github-actions bot Oct 10, 2024

github-actions bot Oct 10, 2024

codecov bot commented Oct 10, 2024

jkbhagatio commented Nov 6, 2024

		from aeon.schema.ingestion_schemas import social03
		from aeon.schema.schemas import exp02

		nonmonotonic_path, exp02.Patch2.Encoder, start=pd.Timestamp("2022-06-06T13:00:49")
		)

		)


		__all__ = ["octagon01", "exp01", "exp02", "social01", "social02", "social03"]

new readers and schemas for reduced data storage in db #437

new readers and schemas for reduced data storage in db #437

Conversation

jkbhagatio commented Oct 10, 2024 • edited by github-actions bot Loading

Summary by CodeRabbit

github-actions bot commented Oct 10, 2024 • edited Loading

Walkthrough

Changes

CodeRabbit Pro

github-actions bot left a comment

Choose a reason for hiding this comment

Chat with CodeRabbit Bot (@coderabbitai)

Code suggestions

Pausing incremental reviews

github-actions bot left a comment

Choose a reason for hiding this comment

Chat with CodeRabbit Bot (@coderabbitai)

Code suggestions

Pausing incremental reviews

github-actions bot left a comment

Choose a reason for hiding this comment

Chat with CodeRabbit Bot (@coderabbitai)

Code suggestions

Pausing incremental reviews

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot left a comment

Choose a reason for hiding this comment

Chat with CodeRabbit Bot (@coderabbitai)

Code suggestions

Pausing incremental reviews

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

github-actions bot Oct 10, 2024

Choose a reason for hiding this comment

codecov bot commented Oct 10, 2024

Codecov Report

jkbhagatio commented Nov 6, 2024

jkbhagatio commented Oct 10, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Oct 10, 2024 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)

Chat with CodeRabbit Bot (`@coderabbitai`)

Chat with CodeRabbit Bot (`@coderabbitai`)

Chat with CodeRabbit Bot (`@coderabbitai`)