New ingestion_schemas + MANY minor fixes and improvements #438
Conversation
…Centre/gl-issue-418 Allow reading pose model metadata from local folder
…o datajoint_pipeline
why? corrupted data in harp files? not sure
aeon/dj_pipeline/__init__.py (Outdated)
Args:
    query (datajoint.Query): A query object containing data from a Stream table
    drop_pk (bool, optional): Drop primary key columns. Defaults to True.
    round_microseconds (bool, optional): Round timestamps to microseconds. Defaults to False.
Defaults to True?
For me this is ok?
The mismatch is in the description of round_microseconds, which says "Defaults to False".
fixed
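As an aside on the microsecond rounding mentioned in this docstring: a minimal sketch of what such rounding could look like, assuming a pandas DatetimeIndex (the helper name `round_index_to_microseconds` is hypothetical, not the pipeline's actual API):

```python
import pandas as pd

def round_index_to_microseconds(df: pd.DataFrame) -> pd.DataFrame:
    """Round a DatetimeIndex to microseconds, since MySQL datetime
    precision is limited to microseconds (nanoseconds would be lost)."""
    df = df.copy()
    df.index = df.index.round("us")  # DatetimeIndex.round with microsecond frequency
    return df

idx = pd.to_datetime(["2024-01-01 00:00:00.123456789"])  # nanosecond-precision input
df = pd.DataFrame({"x": [1.0]}, index=idx)
out = round_index_to_microseconds(df)
```

This discards sub-microsecond precision up front, so a round-trip through the database returns identical timestamps.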
# Patch data - TriggerPellet, DepletionState, Encoder (distancetravelled)
# For wheel data, downsample to 10Hz
final_encoder_fs = 10
final_encoder_hz = 10
I think we actually want this to be 50 hz, not 10 hz
The Encoder data is ingested at 50Hz; this is at the BlockAnalysis step. Do we also want 50Hz here, or further downsample to 10?
yes we want 50 hz here too, thanks!
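The downsampling step being discussed could be sketched roughly as follows with time-based resampling in pandas; this is illustrative only (the sample data and the `first()` aggregation are assumptions, not the pipeline's actual method):

```python
import numpy as np
import pandas as pd

final_encoder_hz = 50  # target rate agreed in the thread above

# Fake raw encoder data at ~1 kHz for illustration
idx = pd.date_range("2024-01-01", periods=1000, freq="1ms")
encoder_df = pd.DataFrame({"angle": np.arange(1000, dtype=float)}, index=idx)

# Downsample by binning into fixed periods: 50 Hz -> 20 ms bins
period_ms = 1000 // final_encoder_hz
downsampled = encoder_df.resample(f"{period_ms}ms").first()
```

One second of 1 kHz samples yields exactly 50 bins at the 50 Hz target rate.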
if encoder_df.empty:
    encoder_df["distance_travelled"] = 0
else:
    encoder_df["distance_travelled"] = -1 * analysis_utils.distancetravelled(encoder_df.angle)
maybe add a comment saying something like: the -1 is for the placement of the magnetic encoder, where wheel movement actually decreases the encoder value?
done
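For context, a hypothetical sketch of what a distance-travelled computation from wheel angle might look like; the real `analysis_utils.distancetravelled` may differ, and the wheel radius here is an assumed example value:

```python
import numpy as np

def distance_travelled(angle_rad: np.ndarray, wheel_radius_cm: float = 4.0) -> np.ndarray:
    """Illustrative only: unwrap the angle to remove 2*pi discontinuities,
    then convert the cumulative angle change to arc length in cm."""
    unwrapped = np.unwrap(angle_rad)
    return (unwrapped - unwrapped[0]) * wheel_radius_cm

# Due to magnet placement, wheel movement decreases the encoder value,
# hence the -1 factor applied in the pipeline snippet above.
angle = np.array([0.0, -np.pi / 2, -np.pi])  # half a revolution "backwards"
dist = -1 * distance_travelled(angle)
```

The sign flip makes forward wheel motion yield positive distance despite the decreasing raw encoder angle.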
patch_rate = depletion_state_df.rate.iloc[0]
patch_offset = depletion_state_df.offset.iloc[0]
# handles patch rate value being INF
patch_rate = 999999999 if np.isinf(patch_rate) else patch_rate
is it actually an issue if patch rate is inf? Does it cause some downstream issue? We do this as default when no env is loaded.
Yes, it's because the MySQL float type doesn't handle INF well. We could convert to NaN, but that would lose the intended meaning of INF here.
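The workaround from the snippet above, isolated as a small helper (the function name is for illustration only):

```python
import numpy as np

def sanitize_rate(patch_rate: float) -> float:
    """MySQL FLOAT columns cannot store INF, so substitute a large finite
    sentinel, as in the pipeline snippet above. Converting to NaN would
    instead lose the intended meaning of INF (a patch that never depletes)."""
    return 999999999 if np.isinf(patch_rate) else patch_rate
```

Finite rates pass through unchanged; only INF is replaced by the sentinel.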
@@ -288,27 +299,50 @@ def make(self, key):
    & f'chunk_start <= "{chunk_keys[-1]["chunk_start"]}"'
)[:block_start]
subject_visits_df = subject_visits_df[subject_visits_df.region == "Environment"]
subject_visits_df = subject_visits_df[~subject_visits_df.id.str.contains("Test", case=False)]
sometimes we use other, non-"Test" subjects as test subjects. Maybe the check should be: keep the subject only if its id begins with "baa" (can use str.lower to check regardless of case)?
I see. Is startswith("baa") reliable and future-proof? Perhaps better logic is to also cross-check against the Subjects manually specified by users for a particular experiment.
Yes, your suggestion would be a better check.
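Combining the two ideas from this thread, the check could be sketched as below; the helper name, the allow-list variable, and the example subject ids are all hypothetical:

```python
def is_real_subject(subject_id: str, experiment_subjects: set) -> bool:
    """Keep a subject only if its id matches the naming heuristic
    (starts with "baa", case-insensitive) AND it appears in an
    experimenter-maintained allow-list for the experiment."""
    return subject_id.lower().startswith("baa") and subject_id in experiment_subjects

allowed = {"BAA-1104045", "BAA-1104047"}  # assumed example ids
```

The allow-list guards against non-"Test"-named subjects being used as test subjects, while the prefix check still filters obvious test entries.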
)
pos_df = fetch_stream(pos_query)[block_start:block_end]
pos_df["likelihood"] = np.nan
# keep only rows with area between 0 and 1000
is this because areas > 1000 are likely an experimenter, or some other artifact? Maybe specify that in the comment?
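The area filter under discussion, written out as a minimal sketch with made-up sample data (the thresholds mirror the comment above; the interpretation of out-of-range areas as artifacts is the reviewer's conjecture):

```python
import pandas as pd

# Fake blob-position rows for illustration; only "area" matters here
pos_df = pd.DataFrame({"area": [10.0, 500.0, 2500.0, 0.0]})

# keep only rows with area between 0 and 1000 (exclusive); rows outside
# this range are suspected artifacts (e.g. an experimenter in the arena)
filtered = pos_df[(pos_df.area > 0) & (pos_df.area < 1000)]
```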
data = pd.DataFrame(payload, index=seconds, columns=self.columns)
# remove rows where the index is zero (why? corrupted data in harp files?)
data = data[data.index != 0]
This I guess will be fixed in the next PR targeting a new Encoder reader in ingestion_schemas.py ?
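The zero-index drop from the snippet above, isolated on toy data (the sample index values are invented; the "corrupted harp files" explanation is the open question in this thread, not established fact):

```python
import pandas as pd

# Fake timestamped rows; a zero timestamp is suspected to come from
# corrupted entries in harp files
data = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[0.0, 1.5, 2.5])

# remove rows where the index is zero
data = data[data.index != 0]
```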
@@ -48,6 +48,12 @@ def __init__(self, path):
        super().__init__(_reader.Pose(f"{path}_test-node1*"))


class Pose03(Stream):
maybe add a comment that this is necessary due to changing registers for the pose streams for social02 in particular? And that the 03 corresponds to the fact that this pattern is what we're going with for social03 and moving forward? Or call this class something else?
Or just import Pose from social_03.py as Pose03? I think I like something like this, actually.
This is primarily because the full-pose data for social0.2 has a different pattern than that of the non-fullpose SLEAP data.
I don't think you can do from social_03 import Pose as Pose03; the way the Devices/Streams are instantiated uses the class name, so aliasing wouldn't work.
ah I see, then just some documentation is ok
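Why the aliasing idea fails can be demonstrated in isolation, assuming (per the discussion above) that stream registration keys off the class `__name__`; the classes below are stand-ins, not the pipeline's real ones:

```python
class Stream:  # stand-in for the pipeline's Stream base class
    pass

class Pose(Stream):
    pass

# Aliasing binds a new variable to the SAME class object...
Pose03 = Pose
# ...so a registry reading __name__ would still see "Pose", not "Pose03".

class Pose03Defined(Stream):  # a real class definition gets its own __name__
    pass
```

Hence a genuine class definition (or subclass) is needed wherever the registered name must differ.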
I've left a bunch of small comments here.
A general point - maybe add top-level module docstrings for all these modules? Unless this will be done by @MilagrosMarin in #443 ?
Yes, this will be addressed when we apply Ruff checks and fixes.
This PR includes a few major new logic/features and many minor fixes (mostly for BlockAnalysis):

- New ingestion_schemas - i.e. a separate set of schemas for DJ ingestion (specialized Encoder reader and Video reader)
- New hw_timestamp column; the streams.py DJ schema updated accordingly (mainly to drop extra columns when reading Video data)
- For social0.2: new BlobPosition to read and store blob position tracking when SLEAP data is not available
- Improved subject_in_patch times
- fetch_stream function rounds times to microseconds (mysql precision is to microseconds only)
- Use BlobPosition when SLEAPTracking is not available
- BlockDetection - when double 0s are found, use the first 0s (instead of the 2nd one)
- New BlockForaging computed table
- Fix #427
- Replace #437