Establish studyforrest-data-raw #34

mih · 2021-04-26T09:07:07Z

This aims to be a superdataset for targeted subdatasets, one for each "study". These studies were internally called:

7T_ad
pandora
anatomy
fg_eyegaze_raw
3T_av_et
3T_visloc

These names correspond to folders in the original data structure on the cluster. They contain the pristine data artifacts and can never be made public due to data protection regulations.

There are at least two more "raw" datasets (multires3T and multires7T), but their DICOM data are not readily accessible ATM.

bpoldrack · 2021-04-26T09:56:01Z

I'm working on building this dataset with the subdatasets 7T_ad, pandora and anatomy for now.
It is not entirely clear whether and how we want to reflect the notion of phase1. Three options:

1) within this dataset (just a directory?), so it's clear by hierarchy that 7T_ad, pandora and anatomy are its parts
2) at the level of converted, anonymized BIDS datasets only, as a partial conversion of studyforrest-data-raw (the BIDS dataset would then have those three subdatasets under sourcedata)
3) an intermediate dataset that would then be converted and would be at the "same level" as studyforrest-data-raw, referencing a subset of its subdatasets
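To make the difference between the three options easier to compare, here is a rough sketch that models each one as a mapping from a dataset to the subdataset paths it would reference. The names `phase1`, `phase1-raw` and `phase1-bids` are placeholders invented for illustration; only studyforrest-data-raw, sourcedata and the three study names come from this thread.

```python
# Illustrative model only: "phase1", "phase1-raw" and "phase1-bids" are
# hypothetical names, not datasets that exist anywhere.
PHASE1_STUDIES = ["7T_ad", "pandora", "anatomy"]


def layout(option):
    """Return {dataset name: [relative subdataset paths]} for one option."""
    if option == 1:
        # phase1 is just a directory inside the raw superdataset
        return {
            "studyforrest-data-raw": [f"phase1/{s}" for s in PHASE1_STUDIES],
        }
    if option == 2:
        # phase1 exists only at the BIDS level, raw studies under sourcedata
        return {
            "studyforrest-data-raw": list(PHASE1_STUDIES),
            "phase1-bids": [f"sourcedata/{s}" for s in PHASE1_STUDIES],
        }
    if option == 3:
        # a sibling raw dataset references a subset of the same subdatasets
        return {
            "studyforrest-data-raw": list(PHASE1_STUDIES),
            "phase1-raw": list(PHASE1_STUDIES),
        }
    raise ValueError(f"unknown option: {option}")
```

In this model, only options 1 and 3 give phase1 a location at the raw level, and only option 3 does so without changing the hierarchy of studyforrest-data-raw itself.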

loj · 2021-04-26T10:57:19Z

I would lean towards option 2

at the level of converted, anonymized BIDS datasets only as a partial conversion of studyforrest-data-raw (BIDS dataset would then have those three subdatasets under sourcedata)

At this level, then, we could also maintain the data representation as described in papers in a separate branch. #5

bpoldrack · 2021-04-26T11:19:47Z

At this level, then, we could also maintain the data representation as described in papers in a separate branch. #5

True, but that is independent of how we reference the raw data at the level of a notion like phase1.

The "issue" with 2) would be dataset-level files like README, dataset_description.json and so on. The current approach would be to have them in the raw dataset and use a "copy-converter" for the respective BIDS dataset. If we don't have a phase1-raw location (1 or 3), where would those things live? They could, of course, be created/added at the BIDS level only. Not sure whether there are things at the phase1 abstraction where this wouldn't work (b/c anonymization or whatever), though.
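For readers unfamiliar with the idea, a "copy-converter" can be pictured roughly as below. This is a stand-in written with the Python standard library, not the converter actually used here; the function name and the default file list are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Assumed set of dataset-level files; the thread names only these two.
DATASET_LEVEL_FILES = ("README", "dataset_description.json")


def copy_dataset_level_files(raw_dir, bids_dir, names=DATASET_LEVEL_FILES):
    """Copy dataset-level files verbatim from a raw dataset to a BIDS one.

    Returns the names that were actually copied; missing files are skipped.
    """
    raw_dir, bids_dir = Path(raw_dir), Path(bids_dir)
    bids_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = raw_dir / name
        if src.is_file():
            shutil.copy2(src, bids_dir / name)  # keeps timestamps as well
            copied.append(name)
    return copied
```

The sketch makes the dependency visible: without a raw-level location for phase1, there is no `raw_dir` to copy from.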

Approach 1 would be a special case for phase1, since other, possibly overlapping superdatasets can't be addressed the same way. So, I lean towards 3) as the most flexible thing that seems likely to generalize as an approach for other subsamples of studyforrest-data-raw. WDYT, @mih ?

bpoldrack · 2021-04-26T14:55:55Z

Adapted the scripts/approach to build this.

First trial of building the (sub)datasets finished:
/data/project/studyforrest_phase1/pandora
/data/project/studyforrest_phase1/anatomy
/data/project/studyforrest_phase1/7T_ad

Initial setup of them was done by /data/project/studyforrest_phase1/build-forrest/studyforrest-data-raw-sh.
Actual data import + spec editing was done by their respective build script in each dataset's code/creation.

bpoldrack · 2021-04-27T08:19:06Z

The three datasets pandora, 7T_ad and anatomy need to be verified as being what we want them to be. That is, they are supposed to capture all relevant raw data of those "studies" (independent of what should be converted in what context). This requires knowledge of what exactly that means. How do we approach this, @mih?

bpoldrack · 2021-04-27T08:21:23Z

Additionally, I have now created /data/project/studyforrest_phase1/scientific-data-2014-raw, which contains those three as subdatasets, since we wanted to aim for publications being the targets for converted datasets. The first conversion run based on this dataset is currently running in /data/project/studyforrest_phase1/scientific-data-2014-bids.

Adjusting the specs and checking what may be missing from the converted dataset will require some kind of target definition to compare against. Is this supposed to be the release_openfmri1 tag in anondata, or is there something else to base the adjustments on, @mih?

bpoldrack · 2021-04-27T12:24:01Z

Re raw data capturing:

anatomy:
- Looks good as far as I can tell, except for two directories: under /data/project/studyforrest/anatomy/data, two subjects have an orig folder in addition to raw/dicom. The content looks like a conversion result, but I'm not sure. Does this need to be captured, @mih?
pandora:
- /data/project/studyforrest/pandora contains logs, pmc.tar.gz and swaroop, none of which are currently captured. What are those, @mih, and are they in any way associated with particular acquisitions?
- I have an old TODO note claiming I need logs and logs/raw somehow. Not sure what to make of this distinction.
7T_ad:
- The data folder in /data/project/studyforrest/7T_ad has behav subdirectories. I guess they need to be imported.
  Do they require some kind of conversion? Are they just copied into the converted dataset? If so, where?
  Old note on the issue, that I can't fully decode ATM:
  
  import behav data into first acq per subject
  from /data/project/studyforrest/7T_ad/ad_data/${sub}*
  => the same as behav/; Two files are copied to behav/ + two more per subject.
- Additionally there's ad_data. What about that?
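A check like the one above can be scripted. The following is a minimal sketch, assuming that "captured" can be expressed as a list of top-level entry names per study folder; the function name is made up, and the real import logic lives in each dataset's build script, which is not shown in this thread.

```python
from pathlib import Path


def uncaptured_entries(source_dir, captured):
    """List top-level entries of source_dir whose names are not in `captured`."""
    captured = set(captured)
    return sorted(
        entry.name
        for entry in Path(source_dir).iterdir()
        if entry.name not in captured
    )
```

Calling it as `uncaptured_entries("/data/project/studyforrest/pandora", captured_names)`, with `captured_names` being whatever the build script imports, would list leftovers such as logs, pmc.tar.gz and swaroop.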

mih · 2021-04-28T19:41:55Z

OK, I made a first push into this project. It contains the majority of the pieces that are needed to build studyforrest-data-raw or hirni or whatever the name will be -- in the artifact/ directory.

mih · 2021-05-06T08:52:37Z

@bpoldrack can you please post the link to the generated raw datasets?

bpoldrack · 2021-05-06T09:03:30Z

@mih

/data/project/studyforrest_phase1/pandora
/data/project/studyforrest_phase1/anatomy
/data/project/studyforrest_phase1/7T_ad

adswa added the data label Apr 26, 2021

bpoldrack mentioned this issue Apr 27, 2021

Reconversion of phase1 data from raw into bids #29
