Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior with event reader for neuralynx files #1597

Closed
eduardosand opened this issue Nov 11, 2024 · 15 comments
Closed

Unexpected behavior with event reader for neuralynx files #1597

eduardosand opened this issue Nov 11, 2024 · 15 comments
Labels

Comments

@eduardosand
Copy link

eduardosand commented Nov 11, 2024

Describe the bug
When loading in a neuralynx file that had several starts and stops in recording (therefore missing data), the events in Events.nev and global start are relative to the most recent starting recording as opposed to the first sample of the file. Then when using event_reader.rescale_event_timestamp(timestamps), inappropriate timestamps are returned as a result.

To Reproduce

event_reader = NeuralynxRawIO(dirname=stems[0], include_filenames=event_path)
event_reader.parse_header()
ph_reader = NeuralynxRawIO(dirname=stems[0], include_filenames=ph_path)
ph_reader.parse_header()
global_start = ph_reader.global_t_start
global_start_ev = event_reader.global_t_start
print(global_start == global_start_ev) #Returns False if the recording itself had many stops and missing data
event_timestamps, _, event_labels = event_reader.get_event_timestamps()
event_times = event_reader.rescale_event_timestamp(event_timestamps)
event_times_manual = event_timestamps*10**-6 - global_start
print(event_times==event_times_manual) # prints False if the recording has many stops and it's not reflected as an event

Expected behaviour
I expect the rescale event timestamp method on event reader to return the timestamp in seconds relative to the first sample of the entire fire. Instead, the timestamps are returned relative to the most recent 'starting recording' event which doesn't always pop up in the event_reader events, despite in showing up when I open the file in other contexts. Essentially, the global_start for events.nev doesn't match the global start for photo.ncs or other .ncs files that were recorded simultaneously.

Environment:

  • OS: Windows, and tested on Linux
  • Python version: 3.11.3
  • Neo version: 0.13.4

Additional context
I'm only writing this so that others who are analyzing neuralynx data are aware of this potential issue. For now, I recommend avoiding the use of event_reader.rescale_event_timestamp(event_timestamps) and loading in a recording from the dataset to get global_start if needed. It'll take slightly longer but worth it. After getting global_start from a file, you can manually convert the event_timestamps with

rescaled_event_timestamps = event_timestamps*10**-6 - global_start

@zm711
Copy link
Contributor

zm711 commented Nov 11, 2024

Thanks @eduardosand. This has been a big struggle with neuralynx (and other ios) that have pauses and skips. Maybe @samuelgarcia or @PeterNSteinmetz have opinions on this since they know this IO much more than I do.

@PeterNSteinmetz
Copy link
Contributor

@eduardosand Can you try it using the keep_original_times=True argument in the constructor?

If that doesn't work, it would be useful to have a set of test files to see exactly what is happening.

I notice that the two constructor calls appear to have different include_filenames. Without the keep_original_times, the first time in the file(s) provided is subtracted from all times being returned.

@eduardosand
Copy link
Author

eduardosand commented Nov 12, 2024

Yes, so that's what I mean. From the same dataset, but from different files (events.nev and photo.ncs(or any other file)), the global_t_start are different. The event labels then, rescaled or not, are in this case about 9000 seconds off, I think because of the start and stop. The global_t_start are correct in the other signals in this dataset to the photo.ncs, but not events.nev. Event reader doesn't show starting and stopping recording, but I saw it when looking at the events.nev file in a text editor.

I know the event label times are inappropriate because I'm using ephy_viewer to comb through my photodiode signal to separate out different tasks in the same file and noticed the event labels were offset by 9000, but relative to each other are correct (event 1 and event 2 are roughly 1000 seconds apart, the labels were correct when recording).

Here are two files that should demonstrate the mismatch, but I found this in a few more cases.
https://drive.google.com/drive/folders/17JAKAyIKZUhpi08s0Ztg_L-HM4nGvVeA?usp=sharing

With keep_original_times, the first time is 0, but the events are still in machine time. Then when rescaling the event times, they don't change. I need them in seconds from start of recording, so unfortunately, this doesn't work.

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Nov 12, 2024

@eduardosand Those files are in a protected DropBox. I just requested access so you should see that.

Please also note that the timestamps between two instances of NeuralynxRawIO will not agree unless both use the keep_original_timestamps=True option. For this RawIO, timestamps are in microseconds until the rescale_ method is called, then they should be in seconds and in floating point format.

When keep_original_timestamps=True, the global_t_start should always be 0.

In any case, once I have access to the test files, I will have a further look.

@eduardosand
Copy link
Author

eduardosand commented Nov 12, 2024

@PeterNSteinmetz So I noticed that in general the global_t_start agrees across NeuralynxRawIO if they're streams (.ncs files), but occasionally the global_t_start of a .nev file is the one that disagrees(hence the issue here), when using keep_original_times=False.

Might be a bug, but after using keep_original_timestamps=True, and running

event_timestamps, _, event_labels = event_reader.get_event_timestamps()
event_times = event_reader.rescale_event_timestamp(event_timestamps)
print(event_timestamps == event_times)

I found that this printed out true, suggesting rescale_event_timestamp didn't rescale my times. My understanding of the rescale_event_timestamp method, is it does the computation I suggested in the original post, namely
rescaled_event_timestamps = event_timestamps*10**-6 - global_start

But, since global_start would be 0 if keep_original_times=True, then rescaling doesn't change anything. As a result, the event_times fail to be meaningful.

I granted you access to the files.

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Nov 13, 2024

@eduardosand Please see the code gist at https://gist.github.com/PeterNSteinmetz/0472b4a7fd75749922a5c76d2b0d4c6f. It should run without a failed assertion for your test files.

I believe the trouble you are experiencing may be related to not having specified a event_channel_index in the get_event_timestamps method call which then defaults to 0. If this does not resolve your issue, can you please add some assertions to the code snippet which are failing and which illustrate the problem you are seeing?

This has unearthed another bug, which I think is unrelated to you issue. Thus can we have your permission to use this .nev file in the testing data repository? Is it human subjects data by any chance? If so, do we need to anonymize it in some way.

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Nov 13, 2024

@eduardosand On further thought, I wonder if the apparent discrepancy may be due to the use of ephyviewer when looking at the signal in photo1_0012.ncs. I believe it may ignore the gaps and breaks in recording and present all samples as one contiguous time block, even when that is not the case.

There are 4942 segments in this file and apparently some very large gaps. Also see the strict_gap_mode setting which is True by default. Though I think even with that set false you will still have more than one segment with large gaps and that may cause this apparent discrepancy.

@eduardosand
Copy link
Author

eduardosand commented Nov 13, 2024

@PeterNSteinmetz Thanks for looking into this. I actually process the data into a numpy array ahead of using ephyviewer. I deal with gaps by interpolating across them (in between segments I compute the difference between the timestamp of the last sample of the first segment and the first sample of the first segment). That's actually part of the reason I noticed this discrepancy. I have it print out when there are large sections without samples and noticed that if I took that time to be 0, then the events labels did have the correct time, suggesting that that time was being mistaken as global_t_start. I understand that in general the global_t_starts won't match (in cases where this issue doesn't pop up it seems like the difference is less than a millisecond) but here the difference is on the order of 9000 seconds.

@PeterNSteinmetz
Copy link
Contributor

@eduardosand That seems like a good way to handle the issue of multiple segments. If you try the strict_gap_mode=False flag it might reduce the number of segments you are dealing with.

So what you are noticing may be related to what I noted in Issue #1598 . Could you add some assertions to the gist I linked above that show exactly what you are concerned about? That will help me determine if this is the same issue and then try and resolve it.

I understand you have a workaround presently, is that correct?

@eduardosand
Copy link
Author

eduardosand commented Nov 19, 2024

@eduardosand Please see the code gist at https://gist.github.com/PeterNSteinmetz/0472b4a7fd75749922a5c76d2b0d4c6f. It should run without a failed assertion for your test files.

I believe the trouble you are experiencing may be related to not having specified a event_channel_index in the get_event_timestamps method call which then defaults to 0. If this does not resolve your issue, can you please add some assertions to the code snippet which are failing and which illustrate the problem you are seeing?

@PeterNSteinmetz
I didn't catch this until now. I just ran the code gist. Runs so that's good. I still think there's an additional bug on top of the one you pointed out.

event_timestamps, _, event_labels = nevRawIO.get_event_timestamps()
# value in microseconds from binary inspection of file
print(event_timestamps)
print(event_labels)
# assert(event_timestamps[0]==1597946624394866)
event_timestamps_rescaled = nevRawIO.rescale_event_timestamp(event_timestamps)
assert(event_timestamps_rescaled[0]==1597946624.394866) # in seconds

Note that this last line seems to imply the first event was 50 YEARS after the first sample. My guess is that when nevRawIO.get_event_timestamps() runs with keep_original_times=True, the global_t_start isn't subtracted from all the event times, leading to this issue.

Additionally,

assert(str0tStart==1597936900.445226)
# 1597946282.718943 this is the global_tstart of the .nev file

It's possible I'm mistaken but in my hands the global_tstart of the .nev file is a whole 9382 seconds after the global_tstart of the .ncs file.

Not sure how to edit your gist. Here's some extra code that highlights the issue. In short, I think that the neuralynx machine was unable to record a 'Starting Recording' event at the start of the file stream but managed to get the 'Stopping Recording'. The 'Stopping Recording' event is being used as t=0 for Events_0012.nev because I guess it's assumed that the first event is t=0. Not necessarily a bug, but maybe there could be a warning that the first event label isn't the 'Starting Recording' label so global_t_start is likely to be inaccurate?

nevRawIO_orig = NeuralynxRawIO(dirname=".", include_filenames=['Events_0012.nev'], keep_original_times=True)
nevRawIO = NeuralynxRawIO(dirname=".", include_filenames=['Events_0012.nev'], keep_original_times=False)
nevRawIO_orig.parse_header()
nevRawIO.parse_header()

raw_event_timestamps, _, raw_event_labels = nevRawIO.get_event_timestamps()
assert(raw_event_labels[0]=='WCST3')
raw_event_timestamps_rescaled = nevRawIO.rescale_event_timestamp(raw_event_timestamps)
missing_event_timestamps, _, missing_event_labels = nevRawIO.get_event_timestamps(0, 0, 1)
assert(missing_event_labels[0]=='Stopping Recording')
missing_event_timestamps_rescaled = nevRawIO.rescale_event_timestamp(missing_event_timestamps)

# This passes but given that first label is 'stopping recording' and not actually at start of recording, maybe rescaling should at least throw a warning
assert(round(raw_event_timestamps[0]*10**-6-missing_event_timestamps[0]*10**-6,5)==round(raw_event_timestamps_rescaled[0],5))

# getting correct times
ncsRawIO = NeuralynxRawIO(dirname='.', include_filenames=['photo1_0012.ncs'], keep_original_times=False)
ncsRawIO.parse_header()
# these event labels make more sense given the signal
assert(round(raw_event_timestamps[0]*10**-6-ncsRawIO.global_t_start,5)==9723.94964)

This has unearthed another bug, which I think is unrelated to you issue. Thus can we have your permission to use this .nev file in the testing data repository? Is it human subjects data by any chance? If so, do we need to anonymize it in some way.

The .nev file and .ncs are from a session with a patient but I didn't share any files that have neural signals present (photodiode file is just the triggers). Am I right to assume that the name in the gist you sent, is something you found in the data? If so, that is an issue, as the data does need to be anonymized. I had assumed this data didn't have Personally Identifiable Information (PII), but if it does, I'll work to scrub it before giving you permission to use the files.

I understand you have a workaround presently, is that correct?

My work around is the following. Global_t_start is correctly recorded from other files, so as long as there is both an .ncs file and .nev file, the expected behavior that i want can be attained in the following way (just rescaling the event labels manually). Note here the use of different readers(one to get the events and another to get global_start). This is done so that I can rescale the event labels appropriately and they make sense in the actual data stream.

event_reader = NeuralynxRawIO(dirname=stems[0], include_filenames=event_path, keep_original_times=False)
event_reader.parse_header()
ph_reader = NeuralynxRawIO(dirname=stems[0], include_filenames=ph_path, keep_original_times=False)
ph_reader.parse_header()
global_start_ph = ph_reader.global_t_start
event_timestamps, _, event_labels = event_reader.get_event_timestamps()
rescaled_event_timestamps = event_timestamps*10**-6 - global_start_ph

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Nov 20, 2024

@eduardosand Thanks for that code. I have had a look at this.

Here is one issue that causes the mismatch which you are testing with the line

assert(round(raw_event_timestamps[0]*10**-6-ncsRawIO.global_t_start,5)==9723.94964)

The raw_event_timestamps are coming from a call on nevRawIO which is created with keep_original_times=False. Therefore those timestamps are relative to the first item in that .nev file. The ncsRawIO.global_t_start is from a RawIO which is created with keep_original_times=False also (the default) and so the times returned there are relative to the first block of samples in the .ncs file. Since these are separate rawIOs, there is no necessary guarantee that the relative timestamps are based on a common starting time. Each RawIO has its own.

If you were to place these in a directory and just give a dirname to the NeuralynxRawIO constructor, then all the timestamps returned should be relative to a common starting timebase.

Since there is no requirement for particular record types to be included in a .Nev file, I am not in favor of starting to add checks for specific record types in a general class like NeuralyxRawIO. Clearly this can be checked for separately or even implemented in a subclass if desired, but this is supposed to be a general RawIO in the library.

I think this explains the issue you raised here, does it not? If so, I will close this issue.

Of course there is still issue #1598, which I think made this more confusing, but I don't think fixing that will change what you observe here.

@eduardosand
Copy link
Author

eduardosand commented Nov 26, 2024

If you were to place these in a directory and just give a dirname to the NeuralynxRawIO constructor, then all the timestamps returned should be relative to a common starting timebase.

I tried this, and then I'm unable to return any event_timestamps or labels. Is this another bug or expected behavior if events_file global_t_start doesn't match with photodiodes global_t_start?

test_dir = "../test_case/"
neoRawIO_fulldir = NeuralynxRawIO(dirname=test_dir)
neoRawIO_fulldir.parse_header()
print(neoRawIO_fulldir.global_t_start)
full_dir_timestamps, _, full_dir_labels = neoRawIO_fulldir.get_event_timestamps()

print(full_dir_labels)
full_event_timestamps_rescaled = neoRawIO_fulldir.rescale_event_timestamp(full_dir_timestamps)
# nev file
nevRawIO_orig = NeuralynxRawIO(dirname=".", include_filenames=['Events_0012.nev'], keep_original_times=True)
nevRawIO_orig.parse_header()
# print(nevRawIO.global_t_stop-nevRawIO.global_t_start)

raw_event_timestamps, _, raw_event_labels = nevRawIO.get_event_timestamps()
print(raw_event_labels)

I think the check is not ideal, but it'd be nice to have event labels with timestamps that are accurate with respect to .ncs files.

@PeterNSteinmetz
Copy link
Contributor

I am starting to work on fixing this in general and addressing Issue #1598. However, g-node.org/NeuralEnsemble/ephy_testing_data is presently down which presents tests from running. Hopefully it will be back up soon.

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Nov 27, 2024

@eduardosand Please see the new test_keep_original_times at PR# 1600 . File test_neuralynxrawio lines 180-238.

This demonstrates how keep_original_times and the original timestamps and rescaling are supposed to work, per BaseRawIO. This uses the extant test data which includes .ncs, .nse, and .nev files all in one directory. When you do that, there is one global_t_start that properly references them all, whether keep_original_times is set to True or False (the default).

The documentation of this operation beyond the comments on the BaseRawIO methods could probably use some expansion.

I think this closes this issue, but if not, please feel free to let me know what other question you may have.

@PeterNSteinmetz
Copy link
Contributor

@zm711 Zach, I think this can now be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants