Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi-run workflow synchronized with BIDS dataset #219

Closed
wants to merge 33 commits into from

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented May 8, 2020

Closes #217. This enhancement adds several functions to do the following:

  1. Load relevant (currently just BOLD) scans from BIDS dataset and determine acquisition time, duration, and generalized name for each.
  2. Extract trigger onsets from BlueprintInput object.
  3. Attempt to match physio trigger onsets to scan onsets.
  4. Interpolate any missing physio trigger periods from available scans.
  5. Identify onsets and offsets for physio trigger periods for each frequency in the physio object.
  6. Slice the physio object into acquisition-specific BlueprintOutputs.
  7. Save BlueprintOutput objects to BIDS files.
  8. Generate a figure showing timing of scans and physio trigger periods before and after synchronization, for QC purposes.

Current Development/Testing Strategy

  1. Requires a BIDS dataset and an associated physio file.
  2. Do the following:
from phys2bids.conversion import workflow
from phys2bids.interfaces.acq import populate_physio_obj

bids_dir = '/path/to/bids/'
subject = '01'
session = '01'
physio_file = '/path/to/physio/file.acq'
physio = populate_physio_obj(physio_file, chtrig=3)
res = workflow(physio, bids_dir, sub=subject, ses=session)
  1. Review output filenames and BlueprintOutput objects.

To Do

  • In dicoms, SeriesTime, InstanceCreationTime, and ContentTime are very similar, but AcquisitionTime is generally ~20 seconds earlier than the rest. We need to determine which time is closest to the scanner's trigger pulse. AcquisitionTime is when the trigger pulse is sent, but heudiconv uses ContentTime.
  • Heudiconv-generated BIDS datasets retain the sub-second resolution for acquisition time in the sidecar json files, but the scans.tsv file only has second-level resolution. We need sub-second resolution, so we should raise an issue about retaining that resolution in the scans.tsv files to make it easier to extract accurate onsets.
  • There is some variability in the delay between any of the possible scan onset times and the physio trigger periods. I assume this is lag from whatever task is running, and the physio triggers are the ones that are mistimed.
  • Missing scans or physio triggers make it harder to find the correct time offset. We will need to loop through both scans (in case first scan's physio trigger is missing) and physio triggers (in case first physio trigger's scan was dropped during conversion), and somehow select the best match to determine the correct delay.
  • Use phys2bids classes and methods for combining and splitting physio data.
    • I have given up on combining discrete segments from physio data for this PR, and will assume that each physio object represents one continuous recording.
  • Merge onsets of multi-contrast runs in scans file, either using the acquisition times (which are very close) or the entities in the filenames.
  • Infer output filenames from scans file, accounting for multi-contrast entities like echo.
  • Conceptual review
  • Integrate figure into reports
  • Tests

Proposed Changes

  • Add a completely (hopefully) unsupervised method for splitting multi-run physio files using onset-time and duration information from a converted BIDS dataset.

@pvelasco
Copy link

  1. In dicoms, SeriesTime, InstanceCreationTime, and ContentTime are very similar, but AcquisitionTime is generally ~20 seconds earlier than the rest. We need to determine which time is closest to the scanner's trigger pulse.

From some documentation I found from Siemens on image reconstruction:

AcquisitionTime: The Time the acquisition of data started.

From an answer given by Chris Roden (dcm2niix):

Instance Creation Time 0008,0013 initially reflects the time an image leaves a parallel processing reconstructor, so it has a variable delay with regards to the acquisition time. Acquisition Time is more meaningful for data.

So, if the reconstruction involves something more than a straight FFT (i.e., GRAPPA, SENSE, etc.), that will explain the ~20 sec delay you find between the AcquisitionTime and the other times.

If I remember correctly, for Siemens scanners, the "scanner trigger" is sent at the time of the excitation pulse of the first slice of each (collected) volume. So the very first trigger will be sent just a few milliseconds before the first chunk of data for the first slice for the first volume is acquired (AcquisitionTime).

Regarding the time resolution (seconds vs. sub-second): I think the purpose of func_times.tsv and physio_times.tsv is to know which physio file (or chunk of a session-long one) corresponds to which functional file, isn't it?. For that, second-level resolution should be enough. Then, the fine alignment (the one requiring sub-second resolution) should be handled by looking at the first trigger signal in each physio file. That should be your "Time zero", according to BIDs.

@tsalo
Copy link
Member Author

tsalo commented May 11, 2020

Thanks @pvelasco! I'm glad to know that AcquisitionTime is the right one, although it looks like heudiconv grabs acq_time from ContentTime instead of AcquisitionTime. It might be best to use AcquisitionTime instead, although I can follow up about that in nipy/heudiconv#447.

Regarding the time resolution (seconds vs. sub-second): I think the purpose of func_times.tsv and physio_times.tsv is to know which physio file (or chunk of a session-long one) corresponds to which functional file, isn't it?. For that, second-level resolution should be enough. Then, the fine alignment (the one requiring sub-second resolution) should be handled by looking at the first trigger signal in each physio file. That should be your "Time zero", according to BIDs.

Yes, the scan times are used to find the trigger periods that correspond to each functional file, but this approach won't work for single-scan physio files, because it relies on the exact times between physio trigger periods as well as those between scan onsets.

At the moment, my code finds the trigger signal that would have the shortest delay from the scan acquisition time, rather than the first trigger signal. My logic is that, since the triggers occur at variable delays, the one with the shortest delay is the one that is closest to the actual start of acquisition, and is thus the one we should build our timing information from. To find that time, we need scan times that are as accurate as possible. Well, actually we need relative onsets to be accurate. The absolute scan times are not used, because I didn't want to assume that the physio and scan computers' clocks were perfectly synchronized, and because the BioPac segment timestamps seem to only have second-level resolution.

tsalo added 3 commits May 11, 2020 20:34
Will need to drop before general use, but there is a plan in place.
Just not sure how to implement it yet.
@tsalo
Copy link
Member Author

tsalo commented May 12, 2020

The functions are now in a state that I can call them sequentially in a notebook (see here). I have also made some time series figures of the physio and functional timing from the results (see here for the notebook and below for the figures).

You might notice a few odd things below:

  • Some of the blue (physio) blocks extend past the red (scan) blocks. This is because my task failed to reset the trigger when it ended, so the trigger is only reset right before the next task starts, by that next task.
  • Some blue blocks that don't have corresponding red blocks. That's where scans were dropped. In this case, the first task was stopped and restarted, as was the second one.
  • It's hard to see here but the last block has a slight mismatch between the red and blue blocks. Our last task actually sends the trigger about 6 seconds late, which is something I was not aware of until I started working on this code!

timeseries_unsynchronized
timeseries_synchronized

from operator import itemgetter
from itertools import groupby
physio_data = bioread.read_file(physio_file)
trigger_channel = physio_data.channels[-1]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: would you consider making your code more generic by allowing more flexibility in the selection of the trigger_channel?
Something like:

trigger_idx = [c.name.lower() for c in physio_data.channels].index('trigger')
trigger_channel = physio_data.channels[trigger_idx]

or:

trigger_channel = [c for c in physio_data.channels if c.name.lower()=='trigger'][0]

(or maybe just 'trig' in c.name.lower(), in case people use abbreviations...)

Then, you could use the last channel as a fallback option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things on my to-do list is to use the native phys2bids classes. I don't know much about them at the moment, so that may take some time, but I believe that selecting the trigger channel is one of the possible parameters for the main physio container object.

That said, if your interest is more in making it easier to test/debug for the near future, I'm happy to make the trigger channel selection more flexible.

@smoia smoia added the Minormod This PR generally closes an `Enhancement` issue. It increments the minor version (0.+1.0) label Jun 16, 2020
@smoia smoia self-assigned this Jun 18, 2020
@RayStick
Copy link
Member

RayStick commented Jul 20, 2020

Hi all, I have read through what people have written but I have done so rather quickly (so as to catch up), so sorry if I misunderstand things.



In general, I agree with @smoia - I think I will need to see this code in action to understand it properly. I also lean towards have an “old triggers” column and a new “corrected” triggers column, as it probably helps for troubleshooting. 

In general, this PR looks like its adding some great functionality, but it may only be relevant for certain experimental and equipment set-ups? I will explain a bit about my experimental set-up, below, and you can let me know if you agree!

Rationale/Concept - I don’t think I can remember an occasion where I have had just a few missing triggers in my physiological recordings, however I have had occasions where I have not recorded any trigger information at all, alongside the physiological data (due to forgetting to plug the trigger connection in ... ) which is very annoying. From what I have picked up, I think I have quite a different set-up to you @tsalo. I don’t have any synchronization between task (stimuli presentation) and trigger recording alongside the physiological data. I just record all the volume triggers that the scanner sends during fMRI sequences, but this does not trigger anything else to happen. (Sometimes I get the the stimuli presentation to be triggered by the first volume trigger of a scan, but that is separate from the physiological recording equipment).

Implementation - I am struggling to work out how this would exactly apply for the type of data I collect, so I will explain what we do. We first turn on our physiological recording software a while before we start any scanning, as it takes time to attach all the peripheral devices to a person, and check the signals look reasonable. We then record continuously throughout the whole MRI session i.e. all physiological data from one MRI session is contained within one file. Within this recording of physiological data, we record all MRI volume triggers. Therefore the start time of the physiological recording file is not synched at all with any other start time (task, MRI scan). However, I understand that this code is going to look at the relative timings between MRI scans to work out the relative physiological timing blocks? That makes some sense to me, but does it therefore always assume you have the first trigger of an fMRI scan, to get the onset of this block correct? And then if there are any triggers missing in the middle it can fill them in just fine? Otherwise, I am struggling to work out how this works without having a trigger recorded at the onset of an fMRI scan.

@tsalo
Copy link
Member Author

tsalo commented Jul 20, 2020

In general, this PR looks like its adding some great functionality, but it may only be relevant for certain experimental and equipment set-ups?

The ability of the workflow to correct trigger time series is only relevant for some setups, and the workflow will most definitely not work with single-run setups, but the general ability of the workflow to identify and label physio runs associated with specific imaging runs is broadly useful. Basically, you shouldn't have to feed in any information beyond a path to the BIDS dataset and the subject/session to get your physio data.

I am struggling to work out how this would exactly apply for the type of data I collect

Your trigger time series is going to be accurate (unlike mine), but it's also completely unlabeled. You still need to use a heuristic or something to identify which block of physio data corresponds to which imaging file in the BIDS dataset. This should automate that.

Therefore the start time of the physiological recording file is not synched at all with any other start time (task, MRI scan). However, I understand that this code is going to look at the relative timings between MRI scans to work out the relative physiological timing blocks? That makes some sense to me, but does it therefore always assume you have the first trigger of an fMRI scan, to get the onset of this block correct? And then if there are any triggers missing in the middle it can fill them in just fine? Otherwise, I am struggling to work out how this works without having a trigger recorded at the onset of an fMRI scan.

You need starts of physio trigger periods to generally match up with starts of fMRI scans for this to work, but a lot of the extra stuff I put into the workflow is to make sure it will work in cases where the first scan trigger or imaging data are missing.

Think of it like this. You have two arrays of "onsets" with the same scale (seconds) but different start points. You know the durations and "names" for the different onsets for one of the arrays, but not the other. Here's some pseudocode to illustrate:

physio_onsets = [1.5, 7, 12.2, 22, 27]  # from physio file
scan_onsets = [0.8, 6.3, 10.7, 20.5, 25.5]  # from BIDS dataset
scan_durations = [1, 2, 3, 4, 5]  # from BIDS dataset
scan_names = ['a', 'b', 'c', 'd', 'e']  # from BIDS dataset
# the times between the onsets at the same
delta_physio = np.diff(physio_onsets)  # [5.5., 5.2, 9.8, 5]
delta_scans = np.diff(scan_onsets)  # [5.5., 5.2, 9.8, 5]
delta_physio == delta_scans  # True
# So now we find what the scan onsets (and durations) would be
# in terms of the physio signal.

# I'll estimate the onset from just the first scan and physio block
# because we *know* neither is missing in this case
offset = scan_onsets[0] - physio_onsets[0]  # -0.7
new_scan_onsets = scan_onsets + offset
# now we loop through, take each new scan (onset : onset + duration)
# and name the file based on the scan name.

There were three things I noticed in my own data that complicated matters:

  1. Physio trigger onsets happens after a delay, at variable amounts. The delay is a natural consequence of using the task to send the signal, and is generally quite small (except for a bug in one of my tasks). The variability is also quite small and also a consequence of using an intermediary.
    • This does not impact your data, based on your setup.
    • It's nice to be able to "correct" the trigger signal in these cases, since the task-relayed trigger signal is inaccurate.
  2. Some physio trigger periods were missing, and thus the scans had no corresponding blocks to compare against. This happened sometimes when my tasks failed to send the signal to BioPac.
    • This does not impact your data, based on your setup.
    • It's nice to be able to "correct" the trigger signal in these cases as well, since the task-relayed trigger signal is, well, completely missing.
  3. Some scans were "missing". Sometimes, you stop a scan early or don't convert it over to the BIDS dataset because it's just that terrible. Whatever the reason, the BIDS dataset won't always reflect all of the physio trigger periods.
    • This impacts everyone's data, and is the main reason why we don't want to assume that the first physio trigger matches the first scan in the dataset.

To deal with these three things, my method is somewhat more complicated than the pseudocode above. Basically, I take the onset delta arrays for the physio and scan data and find the "offset" that minimizes the overall difference between the two arrays. In my data, that overall difference will be > 0. In your data (limited to cases where all scans were retained in the BIDS dataset), that difference should be 0.

Of course, the fewer scans there are and the more missing runs there are (both in physio and BIDS), the more likely the method is to be wrong, so that's why it outputs a figure as well.

@RayStick
Copy link
Member

Thanks for your detailed reply.
Before I reply properly, I have a quick question - how do you define the physio_onsets from the physio file?

@tsalo
Copy link
Member Author

tsalo commented Jul 21, 2020

I extract contiguous blocks of non-zero values in the trigger signal and identify the time of the first point in each block.

This step can be divorced from the main workflow code, though, in order to support your type of trigger signal. Per #219 (comment), one good route might be to binarize and "blockify" the trigger signal before the workflow, and retain that simplified trigger in a separate channel. Then the workflow could use the block-form trigger without touching the original trigger.

@RayStick
Copy link
Member

RayStick commented Jul 22, 2020

I extract contiguous blocks of non-zero values in the trigger signal and identify the time of the first point in each block.

This step can be divorced from the main workflow code, though, in order to support your type of trigger signal. Per #219 (comment), one good route might be to binarize and "blockify" the trigger signal before the workflow, and retain that simplified trigger in a separate channel. Then the workflow could use the block-form trigger without touching the original trigger.

From re-reading your last comment, and some of the in-line comments, I think I follow the context and scope of this PR now. Thanks!

@RayStick
Copy link
Member

RayStick commented Oct 8, 2020

Shall I wait for conflicts to be resolved before properly reviewing?

@tsalo
Copy link
Member Author

tsalo commented Oct 8, 2020

@RayStick I've resolved the conflicts so it's good to go. Thanks!

@smoia
Copy link
Member

smoia commented Oct 8, 2020

@tsalo I'm sorry but I really can't see if there was something that I missed and I should have done for this PR. Is it "only" missing new reviews? If so, Maybe someone else could take the lead on this.

What is the final solution on the trigger channel? Are you rewriting it or adding another "corrected" or "binarised" trigger channel?

@tsalo
Copy link
Member Author

tsalo commented Oct 8, 2020

@smoia See this thread: https://github.com/physiopy/phys2bids/pull/219/files#r455898174. Basically, the method will only work on trigger signals that are already block-ish. I didn't implement a solution for that in this PR, but I did open #278.

@RayStick
Copy link
Member

RayStick commented Oct 9, 2020

I will take a look on Monday, though bear in mind I have not looked at the phys2bids code for a while now ... so I might be a bit slow to review this! If it needs to be merged sooner, I'd suggest someone else jump in as second reviewer, sorry!

@smoia
Copy link
Member

smoia commented Oct 10, 2020

Hey @RayStick ! Thank you, but I would wait a tad longer. This PR is not passing any test, and it depends on #278 .
In fact, I'd revert it to draft @tsalo .
There are other reviewes needed if you're eager to do one ;)

Docstring rules are way too aggressive.

Also, E203 should be ignored. It's wrong.
@codecov
Copy link

codecov bot commented Oct 10, 2020

Codecov Report

Merging #219 (bc7698a) into master (f587559) will decrease coverage by 18.34%.
The diff coverage is 37.57%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #219       +/-   ##
===========================================
- Coverage   94.61%   76.26%   -18.35%     
===========================================
  Files           8       10        +2     
  Lines         836     1083      +247     
===========================================
+ Hits          791      826       +35     
- Misses         45      257      +212     
Impacted Files Coverage Δ
phys2bids/slice4phys.py 42.04% <5.66%> (-55.10%) ⬇️
phys2bids/conversion.py 13.38% <13.38%> (ø)
phys2bids/bids.py 72.02% <57.60%> (-26.04%) ⬇️
phys2bids/physio_obj.py 91.75% <92.59%> (-1.14%) ⬇️
phys2bids/phys2bids.py 89.28% <100.00%> (+1.21%) ⬆️
phys2bids/utils.py 97.50% <100.00%> (-0.94%) ⬇️
phys2bids/io.py
... and 3 more

@tsalo tsalo changed the title [ENH] Add multi-run workflow synchronized with BIDS dataset Add multi-run workflow synchronized with BIDS dataset Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Minormod This PR generally closes an `Enhancement` issue. It increments the minor version (0.+1.0)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Method to split multi-run physio files using BIDS dataset or dicom directory
5 participants