Reconversion of phase1 data from raw into bids #29
Comments
I think it depends on what's most important. I can see the argument for the first approach. However, I'd argue that there is an additional pro for the second approach: we'd have a raw dataset plus specifications, to which we can much more easily apply any different conversion in the future. Think of significant changes in BIDS, or yet another standard we'd want to represent the data in. It may also be easier to adopt new metadata standards/formats in the future. So, I'm not exactly sure. I need to have a closer look into the existing repo to see what we may lose that way. One more thing: if we actually can have time-traveling containers that reproduce what was done back then, it seems to me that we have a third approach: a merger of both. Nothing forces us to use hirni with the current toolbox. However, if we go for 1) or 3), I'll need help figuring out what was done and how, and therefore how to build the container(s), and maybe break things down into a few procedures. Inspecting that on my own sounds like it'll take too long.
I'd say we go for (2) hirni in this case.
@bpoldrack here is the BIDS spec for the diffusion data: https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#diffusion-imaging-data
Susceptibility Weighted Imaging (SWI) is still a BEP (https://docs.google.com/document/d/1kyw9mGgacNqeMbp4xZet3RnDhcMmf4_BmRgKaOkO2Sc/edit).
Referencing #34 (comment)
Note: Instead of …, comparison of the conversion outcome is to be made against …
Starting to look into conversion issues. Task labels (see #35 (comment)):
What labels do we settle for, @mih? Edit:
Here is a potential structure for the events files of the pandora data. They should be derived from the
The corresponding JSON sidecar file should look something like this:

```json
{
  "trial_type": {
    "LongName": "Event category",
    "Description": "Indicator of the genre of the musical stimulus",
    "Levels": {
      "country": "Country music",
      "symphonic": "Symphonic music",
      "metal": "Metal music",
      "ambient": "Ambient music",
      "rocknroll": "Rock'n'roll music"
    }
  },
  "sound_soa": {
    "LongName": "Sound onset asynchrony",
    "Description": "Asynchrony between MRI trigger and sound onset"
  },
  "catch": {
    "LongName": "Control question",
    "Description": "Flag whether a control question was presented"
  },
  "volume": {
    "LongName": "fMRI volume total",
    "Description": "fMRI volume corresponding to stimulation start"
  },
  "run_volume": {
    "LongName": "fMRI volume run",
    "Description": "fMRI volume corresponding to stimulation start in the current run"
  },
  "run": {
    "LongName": "Run in sequence",
    "Description": "Order of run in sequence"
  },
  "run_id": {
    "LongName": "Trial ID in run",
    "Description": "ID of trial sequence for this run"
  },
  "stim": {
    "LongName": "Stimulation file",
    "Description": "Stimulus file name"
  },
  "delay": {
    "LongName": "Inter-stimulus interval",
    "Description": "Inter-stimulus interval",
    "Units": "seconds"
  },
  "trigger_ts": {
    "LongName": "Trigger time stamp",
    "Description": "Time stamp of the corresponding MRI trigger with respect to the start of the experiment",
    "Units": "seconds"
  },
  "genre": {
    "LongName": "Genre",
    "Description": "Indicator of the genre of the musical stimulus",
    "Levels": {
      "country": "Country music",
      "symphonic": "Symphonic music",
      "metal": "Metal music",
      "ambient": "Ambient music",
      "rocknroll": "Rock'n'roll music"
    }
  }
}
```
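A quick consistency check for the generated files could assert that each events.tsv header matches this sidecar. A minimal shunit2-style sketch, assuming the columns above plus the mandatory BIDS `onset` and `duration` columns, and a hypothetical file name pattern:

```sh
# every pandora events.tsv should carry exactly the expected column set
test_events_header()
{
    expected="$(printf 'onset\tduration\ttrial_type\tsound_soa\tcatch\tvolume\trun_volume\trun\trun_id\tstim\tdelay\ttrigger_ts\tgenre')"
    for f in sub-*/func/*_task-pandora_*_events.tsv; do
        assertEquals "Unexpected columns in $f" "$expected" "$(head -n 1 "$f")"
    done
}
```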
For anatomy I have the following image series (SeriesNumber, Protocol, currently assigned modality, whether currently converted or ignored for conversion): (101, 'SmartBrain_32channel', None, 'ignored'), … Questions: Is there something that is ignored but should be converted? Which ones should be assigned the modalities veno and angio?
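For cross-checking such listings, the (SeriesNumber, ProtocolName) pairs can be re-read straight from the DICOMs; a sketch assuming dcmtk's dcmdump is available and that one representative file per series sits under a hypothetical dicoms/ layout:

```sh
# dump series number and protocol name for one representative DICOM per series
for f in dicoms/*/00000001.dcm; do
    echo "== $f"
    dcmdump +P SeriesNumber +P ProtocolName "$f"
done
```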
Suggested content for the
TODOs:
Note: This is different from the OpenNeuro |
Current image series for 7T_ad: (1, 'AAHScout_32ch', None, 'ignored'), (2, 'AAHScout_32ch_MPR', None, 'ignored'), …
And pandora, @mih: (2, 'AAHScout_32ch_MPR', None, 'ignored'), …
I found a trigger mismatch in the converted physio files from the audiomovie in your test repo (/data/project/studyforrest_phase1/testing/scientific-data-2014-bids/sub-002/func), @bpoldrack. The test checks whether it finds the same number of triggers as the run had TRs:

```sh
# count the trigger events (lines starting with '1') in each physio file
# and compare against the expected number of volumes for that run
check_physio()
{
    nvols=$1
    shift
    for f in "$@"; do
        found_trigger="$(zgrep -c '^1' "$f")"
        assertEquals "Need to find each trigger in the log" "$nvols" "$found_trigger"
    done
}

# expected volume counts for the eight audiomovie runs
test_physio_movie_runs()
{
    count=1
    for nvols in 451 441 438 488 462 439 542 338; do
        check_physio $nvols *_task-forrestgump_run-0${count}_physio.tsv.gz
        count=$(( count + 1 ))
    done
}
```
It fails for a few subjects in the conversion. Subject 1:
Subject 2:
Subjects 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17 are good! Subject 18 (only fails by one trigger):
Subjects 19 and 20 are good. It does not fail for the OpenNeuro dataset.
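For reference, a one-liner to survey the trigger counts across all subjects and runs at once (a sketch; the glob assumes the converted dataset layout):

```sh
# print the trigger count per physio file for every subject and run
for f in sub-*/func/*_task-forrestgump_run-*_physio.tsv.gz; do
    printf '%s\t%s\n' "$f" "$(zgrep -c '^1' "$f")"
done
```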
Reminder to add the scanner acquisition protocols to |
Doing the same assertion for the pandora data yields mismatches:
Subject 18:
Here is the BIDS convention for the
Re physio trigger mismatches: noting observations for now. At least for the audiomovie, the failing subjects are exactly the ones with a sampling frequency of 100, while the passing ones have 200. Didn't check pandora yet.
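To correlate this with the failures, the sampling frequency can be pulled from each physio sidecar; a sketch, assuming the sidecars carry the standard BIDS SamplingFrequency field and that jq is available:

```sh
# list the sampling frequency recorded in each physio sidecar
for f in sub-*/func/*_task-forrestgump_*_physio.json; do
    printf '%s\t%s\n' "$f" "$(jq '.SamplingFrequency' "$f")"
done
```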
Scripts for converting the log files into events.tsv and events.json files are in …

```sh
# for the trial file
./reconvert_behavtrials_pandora \
    /data/project/studyforrest/pandora/logs/xy22.trials \
    'sub-01_task-avmovie_run-0_events'

# for the log files
./reconvert_behavlog_pandora \
    /data/project/studyforrest/pandora/logs/ap75.log \
    'sub-01_task-avmovie_run-0_events'
```

i.e. …
I have a README put together, but it requires updated file paths. Once we have a sample single-subject converted dataset, I can start updating them.
Update from the Matrix channel: First of all, it ran through, yay! I have not yet looked closely into it. An initial glimpse says there are some (somewhat minor) issues:
Those are all "technical" issues that should be relatively easy to fix, and the fixes easy to apply. Here it is: /data/project/studyforrest_phase1/phase1-bids. ping @mih
It would be good if this issue got a checklist to capture what has been looked at, and for which conversion attempt. Otherwise it will be rather hard to come to an end.
Edited first post. |
Here are the issues I've gathered so far for the recent conversion. Re missing data types: these are files that were described in the old README that I haven't managed to find in the newly converted dataset. Some of them likely live elsewhere and don't belong in this dataset, but I wanted to list them just in case.
Re BIDS compliance:
Thanks, @loj!
We decided to not include them. Still the case, @mih?
== dico
Thx, need to investigate. Not intended.
True. Simply forgot to convert the toplevel specs of the raw datasets ;-)
@mih: Does that stuff exist in any other place and/or shape other than …
Same here, @mih - no idea where that even comes from.
Yes, there's something wrong with the created name.
We decided to go with
Yes, but I have no clue.
Good catch. That's a hint that I seem to have screwed up the versions of the raw dataset. That was an (already fixed) bug, which explains the toplevel stimuli, too.
Will do.
Ah - need to fix the deface procedure then.
Hard to say; this issue lumps together so many aspects. If these are the fieldmaps that were acquired together with the DWI data at 3T, then yes, they are invalid.
moco = motion corrected; dico = distortion corrected. So depending on which specific data we are talking about, that statement is either true (moco is a precondition for dico) or not (dico is optional).
As in "not intended to be there for now"?
demographics.csv is an original file that contains data only available on paper. The two other CSVs are outdated; their old versions are here: https://github.com/psychoinformatics-de/studyforrest-data-annotations/blob/master/old/structure/scenes.csv and https://github.com/psychoinformatics-de/studyforrest-data-annotations/blob/master/old/speech/german_audio_description.csv
This is described in the data paper under Technical Validation. These can all be ignored for the raw dataset, because they are the outcome of a computational pipeline (i.e., derivatives). Everything labeled "alignment/volumes" in the list above is in https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms. The aggregate timeseries are in https://github.com/psychoinformatics-de/studyforrest-data-aggregate. From my POV, they can stay there (hosted on GIN).
Either way is fine with me: one is an established non-standard, the other is the anticipation of a standard.
As mentioned elsewhere, the task descriptions are in … I don't think anyone has looked up the IDs for these tasks on http://www.cognitiveatlas.org/ yet.
Thx, looks OK to me.
There are two possible approaches that I can see:
Make the code from 2013 run reproducibly
This should be doable; this was all standard Debian packages plus the custom code in the datasets. We could create a Singularity image that travels back in time. This would be attractive from a forensic data management perspective, and might get us to a state that matches the OpenNeuro one (#28), but with provenance.
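A minimal sketch of such a time-traveling container, assuming a Debian release contemporary with the original conversion and using snapshot.debian.org to pin the package state (the snapshot date and package names are placeholders):

```
Bootstrap: docker
From: debian/eol:wheezy

%post
    # point APT at the snapshot archive for a fixed date (placeholder date)
    echo 'deb http://snapshot.debian.org/archive/debian/20131201T000000Z/ wheezy main' > /etc/apt/sources.list
    # the snapshot serves archived metadata; relax the validity checks
    apt-get -o Acquire::Check-Valid-Until=false update
    apt-get install -y --force-yes python-numpy python-nibabel
```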
Redo the conversion with modern-day tooling
This comes with the danger that files come out differently. One would then need to figure out how they differ, and maybe even why. Pro: this will give us much better metadata automatically, and we can showcase hirni. Con: lots of work, possibly leading to lots more work.
Waddayathink, @bpoldrack?
Checklist for the BIDS dataset:
Version:
01fe519fb76c92dd323c4876b57554ce010928f0