From efdd1cfc4a0c280258468d1ba6cac419a0529de0 Mon Sep 17 00:00:00 2001
From: Gregory Conan
Date: Wed, 6 Nov 2019 10:14:48 -0800
Subject: [PATCH] Bug fixes and README update

- abcd2bids.py does not need image03 anymore
- Fixed bug where commas in the ftq_notes column of abcd_fastqc01.txt caused
  parsing problems
- good_bad_series_parser.py does not use the Manufacturer column anymore
- Fixed problem of not making the fmap folder properly
- Updated README to reflect recent changes
---
 README.md                        |  39 +----
 abcd2bids.py                     | 262 ++++++++++++++++++-------------
 src/good_bad_series_parser.py    |  22 +--
 src/sefm_eval_and_json_editor.py |  22 ++-
 src/unpack_and_setup.sh          |   6 +-
 5 files changed, 191 insertions(+), 160 deletions(-)

diff --git a/README.md b/README.md
index 73a01ba..00db0ff 100755
--- a/README.md
+++ b/README.md
@@ -19,39 +19,16 @@ Clone this repository and save it somewhere on the Linux system you want to do A
 1. Python [`pandas` package](https://pandas.pydata.org)
 1. [AWS CLI (Amazon Web Services Command Line Interface) v19.0.0](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html)
 
-## Spreadsheets (not included)
+## Spreadsheet (not included)
 
-To download images for ABCD you must have two spreadsheets downloaded to this repository's `spreadsheets` folder:
-
-1. `abcd_fastqc01.csv`
-1. `image03.txt`
-
-Both spreadsheets can be downloaded from the [NIMH Data Archive (NDA)](https://nda.nih.gov/) with an ABCD Study Data Use Certification in place. `abcd_fastqc01.csv` contains operator QC information for each MRI series. If the image fails operator QC (a score of 0) the image will not be downloaded. `image03.txt` contains paths to the TGZ files on the NDA's Amazon AWS S3 buckets where the images can be downloaded from per series.
-
-### How to Download `image03.txt`
-
-1. Login to the [NIMH Data Archive](https://nda.nih.gov/)
-1. Go to **Data Dictionary** under **Tools**
-1. Select **ABCD Release 2.0 (or whatever release is out)** under **Source Dropdown Menu**
-1. Click **Filter**
-1. Click the box under **Select** for just **Image/image03**
-1. Click **Add to Filter Cart** at the bottom left of the page.
-1. Wait for your cart filter to update.
-1. In the upper right hand corner in the **Filter Cart Menu** click **Package/Add to Study**
-    - Under **Collections** by **Permission Group** click **Deselect All**
-    - Collapse any other studies you have access to and re-select **Adolescent Brain Cognitive Development**
-1. Click **Create Package**
-    - Name the package something like **Image03**
-    - Select Only **Include documentation**
-    - Click **Create Package**
-1. Download and use the **Package Manager** to download your package
+To download images for ABCD you must have the `abcd_fastqc01.csv` spreadsheet downloaded to this repository's `spreadsheets` folder. It can be downloaded from the [NIMH Data Archive (NDA)](https://nda.nih.gov/) with an ABCD Study Data Use Certification in place. `abcd_fastqc01.csv` contains operator QC information for each MRI series. If the image fails operator QC (a score of 0), then the image will not be downloaded.
 
 ### How to Download `abcd_fastqc01.csv`
 
 1. Login to the [NIMH Data Archive](https://nda.nih.gov/).
 1. From the homepage, click the button labeled `GET DATA` to go to `Featured Datasets`.
 1. Under the `Data Dictionary` heading in the sidebar, click `Data Structures`.
-1. Add both spreadsheets (`image03.txt` and `abcd_fastqc01.csv`) to the Filter Cart.
+1. Enter the spreadsheet file name into the `Text Search` box to find `ABCD Fasttrack QC Instrument`, then click its checkbox to select it.
 1. At the bottom of the page, click the `Add to Workspace` button.
 1. At the top-right corner of the page under `logout` is a small icon. Click on it to open the `Selected Filters Workspace`.
@@ -92,7 +69,7 @@ This wrapper will create a temporary folder (`temp/` by default) with hundreds o
 
 ### Optional Arguments
 
-`--username` and `--password`: Include one of these to pass the user's NDA credentials from the command line into a `config.ini` file. This will create a new config file if one does not already exist, or overwrite the existing file. If only one of these flags is included, the user will be prompted for the other. They can be passed into the wrapper from the command line like so: `--username my_nda_username --password my_nda_password`.
+`--username` and `--password`: Include one of these to pass the user's NDA credentials from the command line into a `config.ini` file. This will create a new config file if one does not already exist, or overwrite the existing file. If only one of these flags is included, the user will be prompted for the other. They can be passed into the wrapper from the command line like so: `--username <username> --password <password>`.
 
 `--config`: By default, the wrapper will look for a `config.ini` file in a hidden subdirectory of the user's home directory (`~/.abcd2bids/`). Use `--config` to enter a different (non-default) path to the config file, e.g. `--config ~/Documents/config.ini`.
 
@@ -100,14 +77,14 @@ This wrapper will create a temporary folder (`temp/` by default) with hundreds o
 
 `--download`: By default, the wrapper will download the ABCD data to the `raw/` subdirectory of the cloned folder. If the user wants to download the ABCD data to a different directory, they can use the `--download` flag, e.g. `--download ~/abcd-dicom2bids/ABCD-Data-Download`. A folder will be created at the given path if one does not already exist.
 
+`--qc`: Path to the Quality Control (QC) spreadsheet file downloaded from the NDA. By default, the wrapper will use the `abcd_fastqc01.txt` file in the `spreadsheets` directory.
+
 `--remove`: By default, the wrapper will download the ABCD data to the `raw/` subdirectory of the cloned folder. If the user wants to delete the raw downloaded data for each subject after that subject's data is finished converting, the user can use the `--remove` flag without any additional parameters.
 
 `--output`: By default, the wrapper will place the finished/converted data into the `data/` subdirectory of the cloned folder. If the user wants to put the finished data anywhere else, they can do so using the optional `--output` flag followed by the path at which to create the directory, e.g. `--output ~/abcd-dicom2bids/Finished-Data`. A folder will be created at the given path if one does not already exist.
 
 `--start_at`: By default, this wrapper will run every step listed under "Explanation of Process" below. Use this flag to start at one step and skip all of the previous ones. To do so, enter the name of the step, e.g. `--start_at correct_jsons` to skip every step before JSON correction.
 
-`--image03` By default, this wrapper will use `./spreadsheets/image03.txt` as its neuroimaging data spreadsheet. To use a different spreadsheet, include this flag with a path to a readable file with neuroimaging data, e.g. `--image03 ./data.csv`.
-
 
 For more information including the shorthand flags of each option, use the `--help` command: `python3 abcd2bids.py --help`. Here is the format for a call to the wrapper with more options added:
 
@@ -178,7 +155,7 @@ The following files belong in the `data` subdirectory to run `abcd2bids.py`:
 
 Without these files, the output of `abcd2bids.py` will fail BIDS validation. They should be downloaded from the GitHub repo by cloning it.
 
-This folder is where the output of `abcd2bids.py` will be placed by default. So, after running `abcd2bids.py`, this folder will have subdirectories for each subject session. Those subdirectories will be correctly formatted according to the [official BIDS specification standard v1.2.0](https://github.com/bids-standard/bids-specification/releases/tag/v1.2.0).
+`data` is where the output of `abcd2bids.py` will be placed by default. So, after running `abcd2bids.py`, this folder will have subdirectories for each subject session. Those subdirectories will be correctly formatted according to the [official BIDS specification standard v1.2.0](https://github.com/bids-standard/bids-specification/releases/tag/v1.2.0).
 
 The resulting ABCD Study dataset here is made up of all the ABCD Study participants' imaging data that passed initial acquisition quality control (MRI QC).
 
@@ -193,4 +170,4 @@ This wrapper relies on the following other projects:
 
 ## Meta
 
-Documentation last updated by Greg Conan on 2019-10-17.
\ No newline at end of file
+Documentation last updated by Greg Conan on 2019-11-06.
diff --git a/abcd2bids.py b/abcd2bids.py
index aa9ab2c..1c8f900 100644
--- a/abcd2bids.py
+++ b/abcd2bids.py
@@ -4,16 +4,16 @@ ABCD to BIDS CLI Wrapper
 Greg Conan: conan@ohsu.edu
 Created 2019-05-29
-Last Updated 2019-10-18
+Last Updated 2019-11-06
 """
 
 ##################################
 #
 # Wrapper for ABCD DICOM to BIDS pipeline that can be run from the command line
-# 1. Runs data_gatherer to create ABCD_good_and_bad_series_table.csv
-# 2. Runs good_bad_series_parser to download ABCD data using that .csv table
-# 3. Runs unpack_and_setup to unpack/setup the downloaded ABCD data
-# 4. Runs correct_jsons to conform data to official BIDS standards
+# 1. Imports data, QC's it, and exports ABCD_good_and_bad_series_table.csv
+# 2. Runs good_bad_series_parser.py to download ABCD data using .csv table
+# 3. Runs unpack_and_setup.sh to unpack/setup the downloaded ABCD data
+# 4. Runs correct_jsons.py to conform data to official BIDS standards
+# 5. Runs BIDS validator on unpacked/setup data using Docker
 #
 ##################################
 
@@ -23,9 +23,9 @@
 from cryptography.fernet import Fernet
 from datetime import datetime
 from getpass import getpass
+from glob import iglob
 import os
 import pandas as pd
-from pathlib import Path
 import shutil
 import signal
 import subprocess
@@ -35,20 +35,21 @@ STEP_NAMES = ["create_good_and_bad_series_table", "download_nda_data",
               "unpack_and_setup", "correct_jsons", "validate_bids"]
 
+PWD = os.getcwd()
+
 # Constants: Default paths to scripts to call from this wrapper, and default
 # paths to folders in which to manipulate data
-CONFIG_FILEPATH = os.path.expanduser("~/.abcd2bids/config.ini")
-CORRECT_JSONS = "./src/correct_jsons.py"
-DATA_GATHERER = "./src/bin/run_data_gatherer.sh"
-DOWNLOAD_FOLDER = "./raw/"
-NDA_AWS_TOKEN_MAKER = "./src/nda_aws_token_maker.py"
-SERIES_TABLE_PARSER = "./src/good_bad_series_parser.py"
-SPREADSHEET_IMG3 = "./spreadsheets/image03.txt"
-SPREADSHEET_MERGED = "./spreadsheets/ABCD_good_and_bad_series_table.csv"
-SPREADSHEET_QC = "./spreadsheets/abcd_fastqc01.txt"
-TEMP_FILES_DIR = "./temp"
-UNPACK_AND_SETUP = "./src/unpack_and_setup.sh"
-UNPACKED_FOLDER = "./data/"
+CONFIG_FILE = os.path.join(os.path.expanduser("~"), ".abcd2bids", "config.ini")
+CORRECT_JSONS = os.path.join(PWD, "src", "correct_jsons.py")
+DOWNLOAD_FOLDER = os.path.join(PWD, "raw")
+NDA_AWS_TOKEN_MAKER = os.path.join(PWD, "src", "nda_aws_token_maker.py")
+SERIES_TABLE_PARSER = os.path.join(PWD, "src", "good_bad_series_parser.py")
+SPREADSHEET_DOWNLOAD = os.path.join(PWD, "spreadsheets",
+                                    "ABCD_good_and_bad_series_table.csv")
+SPREADSHEET_QC = os.path.join(PWD, "spreadsheets", "abcd_fastqc01.txt")
+TEMP_FILES_DIR = os.path.join(PWD, "temp")
+UNPACK_AND_SETUP = os.path.join(PWD, "src", "unpack_and_setup.sh")
+UNPACKED_FOLDER = os.path.join(PWD, "data")
 
 
 def main():
@@ -57,7 +58,7 @@ def main():
     it to meet BIDS standards, and validating that it meets those standards.
     :return: N/A
     """
-    cli_args = cli()
+    cli_args = get_cli_args()
 
     def now():
         return datetime.now().strftime("%H:%M:%S on %b %d, %Y")
@@ -81,14 +82,14 @@ def now():
             print("The {} step started at {}.".format(step, now()))
             globals()[step](cli_args)
             print("The {} step finished at {}.".format(step, now()))
-    print("ABCD to BIDS wrapper started at {} and finished at {}.".format(
+    print("\nABCD to BIDS wrapper started at {} and finished at {}.".format(
         started_at, now()))
 
     # Finally, delete temporary files and end script with success exit code
     cleanup(cli_args.temp, 0)
 
 
-def cli():
+def get_cli_args():
     """
     Get and validate all args from command line using argparse.
     :return: Namespace containing all validated inputted command line arguments
@@ -120,13 +121,12 @@ def cli():
     parser.add_argument(
         "-c",
         "--config",
-        default=CONFIG_FILEPATH,
-        help=("Optional: Path to config file with NDA credentials. If no "
+        default=CONFIG_FILE,
+        help=("Path to config file with NDA credentials. If no "
              "config file exists at this path yet, then one will be created. "
             "Unless this option or --username and --password is added, the "
            "user will be prompted for their NDA username and password. "
" - "By default, the config file will be located at " - + os.path.abspath(CONFIG_FILEPATH)) + "By default, the config file will be located at " + CONFIG_FILE) ) # Optional: Get download folder path from user as CLI arg @@ -134,20 +134,10 @@ def cli(): "-d", "--download", default=DOWNLOAD_FOLDER, - help=("Optional: Path to folder which NDA data will be downloaded " + help=("Path to folder which NDA data will be downloaded " "into. By default, data will be downloaded into the {} folder. " "A folder will be created at the given path if one does not " - "already exist.".format(os.path.abspath(DOWNLOAD_FOLDER))) - ) - - # Optional: Get path to imaging data spreadsheet - parser.add_argument( - "-i", - "--image03", - default=SPREADSHEET_IMG3, - help=("Optional: Path to spreadsheet with neuroimaging data. If this " - "argument is excluded, then the default path will be " - + SPREADSHEET_IMG3) + "already exist.".format(DOWNLOAD_FOLDER)) ) # Optional: Get folder to unpack NDA data into from download folder @@ -155,29 +145,39 @@ def cli(): "-o", "--output", default=UNPACKED_FOLDER, - help=("Optional: Folder path into which NDA data will be unpacked and " + help=("Folder path into which NDA data will be unpacked and " "setup once downloaded. By default, this script will put the " "data into the {} folder. A folder will be created at the given " - "path if one does not already exist.".format(os.path.abspath( - UNPACKED_FOLDER))) + "path if one does not already exist.".format(UNPACKED_FOLDER)) ) parser.add_argument( "-p", "--password", type=str, - help=("Optional: NDA password. Adding this will create a new config " + help=("NDA password. Adding this will create a new config " "file or overwrite an old one. Unless this is added or a config " "file exists with the user's NDA credentials, the user will be " "prompted for them. If this is added and --username is not, " "then the user will be prompted for their NDA username.") ) + # Optional: Get QC spreadsheet + parser.add_argument( + "-q", + "--qc", + type=validate_readable_file, + default=SPREADSHEET_QC, + help=("Path to Quality Control (QC) spreadsheet file downloaded from " + "the NDA. By default, this script will use {} as the QC " + "spreadsheet.".format(SPREADSHEET_QC)) + ) + # Optional: During unpack_and_setup, remove unprocessed data parser.add_argument( "-r", "--remove", action="store_true", - help=("Optional: After each subject's data has finished conversion, " + help=("After each subject's data has finished conversion, " "removed that subject's unprocessed data.") ) @@ -188,7 +188,7 @@ def cli(): "--start_at", choices=STEP_NAMES, default=STEP_NAMES[0], - help=("Optional: Give the name of the step in the wrapper to start " + help=("Give the name of the step in the wrapper to start " "at, then run that step and every step after it. Here are the " "names of each step, in order from first to last: " + ", ".join(STEP_NAMES)) @@ -199,11 +199,11 @@ def cli(): "-t", "--temp", default=TEMP_FILES_DIR, - help=("Optional: Path to the directory to be created and filled with " + help=("Path to the directory to be created and filled with " "temporary files during unpacking and setup. By default, the " "folder will be created at {} and deleted once the script " "finishes. 
-              "doesn't already exist.".format(os.path.abspath(TEMP_FILES_DIR)))
+              "doesn't already exist.".format(TEMP_FILES_DIR))
     )
 
     # Optional: Get NDA username and password
     parser.add_argument(
         "-u",
         "--username",
         type=str,
-        help=("Optional: NDA username. Adding this will create a new config "
+        help=("NDA username. Adding this will create a new config "
             "file or overwrite an old one. Unless this is added or a config "
             "file exists with the user's NDA credentials, the user will be "
             "prompted for them. If this is added and --password is not, "
@@ -235,27 +235,24 @@ def validate_cli_args(args, parser):
 
     # Validate and create config file's parent directory
     try:
-        Path(os.path.dirname(args.config)).mkdir(parents=True, exist_ok=True)
+        os.makedirs(os.path.dirname(args.config), exist_ok=True)
     except (OSError, TypeError):
         parser.error("Could not create folder to contain config file.")
 
     # Validate other dirs: check if they exist; if not, try to create them; and
     # move important files in the default dir(s) to the new dir(s)
+    try:
+        for cli_arg in ("download", "output", "temp"):
+            setattr(args, cli_arg, os.path.abspath(getattr(args, cli_arg)))
+    except OSError:
+        parser.error("Failed to convert {} to absolute path.".format(cli_arg))
     try_to_create_and_prep_directory_at(args.download, DOWNLOAD_FOLDER, parser)
     try_to_create_and_prep_directory_at(args.output, UNPACKED_FOLDER, parser)
     try_to_create_and_prep_directory_at(args.temp, TEMP_FILES_DIR, parser)
 
-    # Ensure that the folder paths are formatted correctly: download and output
-    # should have trailing slashes, but temp should not
-    if args.download[-1] != "/":
-        args.download += "/"
-    print(args.download)
+    # Ensure that the output folder path is formatted correctly:
     if args.output[-1] != "/":
         args.output += "/"
-    print(args.output)
-    if args.temp[-1] == "/":
-        args.temp = args.temp[:-1]
-    print(args.temp)
 
     return args
 
@@ -268,10 +265,23 @@ def validate_dir_path(dir_path, parser):
     :param parser: argparse ArgumentParser to raise error if path is invalid
     :return: N/A
     """
-    if not Path(dir_path).is_dir():
+    if not os.path.isdir(dir_path):
         parser.error(dir_path + " is not an existing directory.")
 
 
+def validate_readable_file(param):
+    """
+    Throw exception unless parameter is a valid readable filename string. This
+    is used instead of argparse.FileType("r") because the latter leaves an open
+    file handle, which has caused problems.
+    :param param: Parameter to check if it represents a valid filename
+    :return: A valid filename as a string
+    """
+    if not os.access(param, os.R_OK):
+        raise argparse.ArgumentTypeError("Could not read file at " + param)
+    return param
+
+
 def try_to_create_and_prep_directory_at(folder_path, default_path, parser):
     """
     Validate file path of folder, and if it doesn't exist, create it. If a
@@ -284,17 +294,17 @@
     :return: N/A
     """
     try:
-        Path(folder_path).mkdir(exist_ok=True, parents=True)
+        os.makedirs(folder_path, exist_ok=True)
     except (OSError, TypeError):
         parser.error("Could not create folder at " + folder_path)
 
     # If user gave a different directory than the default, then copy the
     # required files into that directory and nothing else
-    default = Path(default_path)
-    if Path(folder_path).resolve() != default.resolve():
-        for file in default.iterdir():
-            if not file.is_dir():
-                shutil.copy2(str(file), folder_path)
+    default = os.path.abspath(default_path)
+    if os.path.abspath(folder_path) != default:
+        for each_file in os.scandir(default):
+            if not each_file.is_dir():
+                shutil.copy2(each_file.path, folder_path)
 
 
 def set_to_cleanup_on_crash(temp_dir):
@@ -327,9 +337,9 @@
     :return: N/A
     """
     # Delete all temp folder subdirectories, but not the README in temp folder
-    for temp_dir_subdir in Path(temp_dir).iterdir():
+    for temp_dir_subdir in os.scandir(temp_dir):
         if temp_dir_subdir.is_dir():
-            shutil.rmtree(str(temp_dir_subdir))
+            shutil.rmtree(temp_dir_subdir.path)
 
     # Inform user that temporary files were deleted, then terminate wrapper
     print("\nTemporary files in {} deleted. ABCD to BIDS wrapper "
@@ -348,7 +358,7 @@ def make_nda_token(args):
     """
     # If config file with NDA credentials exists, then get credentials from it,
     # unless user entered other credentials to make a new config file
-    if not args.username and not args.password and Path(args.config).exists():
+    if not args.username and not args.password and os.path.exists(args.config):
         username, password = get_nda_credentials_from(args.config)
 
     # Otherwise get NDA credentials from user & save them in a new config file,
@@ -382,7 +392,7 @@
     # catch another file's exception.
     if token_call_exit_code is not 0:
         print("Failed to create NDA token using the username and decrypted "
-              "password from " + str(Path(args.config).resolve()))
+              "password from {}.".format(os.path.abspath(args.config)))
         sys.exit(1)
 
 
@@ -445,38 +455,66 @@ def make_config_file(config_filepath, username, password):
 def create_good_and_bad_series_table(cli_args):
     """
     Create good_and_bad_series_table.csv by merging imaging data with QC data
-    :param cli_args: argparse namespace containing all CLI arguments. This
-    function only uses the --image03 argument, the path to the spreadsheet with
-    imaging data.
+    :param cli_args: argparse namespace containing all CLI arguments.
     :return: N/A
     """
-    # Import spreadsheets with QC and image03 data
-    with open(SPREADSHEET_QC) as qc_file:  # has img03_id & file_source
-        all_qc_data = pd.read_csv(qc_file, encoding="utf-8-sig", sep=",|\t",
-                                  engine="python", index_col=False, skiprows=[1])
-    with open(cli_args.image03) as img3_file:  # has image03_id & image_file
-        image03_data = pd.read_csv(img3_file, encoding="utf-8-sig", sep=",|\t",
-                                   engine="python", index_col=False, skiprows=[1])
-    qc_data = all_qc_data.loc[all_qc_data["ftq_usable"] == 1]
-    image03_filtered = image03_data[image03_data["image_file"].isin(
-        qc_data["file_source"].tolist())].dropna(axis="columns", how="all")
-
-    # Combine the fast_QC and filtered image03 spreadsheets
-    merged = image03_filtered.merge(
-        qc_data, left_on="image_file", right_on="file_source", how="inner",
-        suffixes=("", "_drop")
-    )
+    with open(cli_args.qc) as qc_file:
+        all_qc_data = pd.read_csv(
+            qc_file, encoding="utf-8-sig", sep=",|\t", engine="python",
+            index_col=False, header=0, skiprows=[1],  # Skip row 2 (description)
+            usecols=lambda x: x != "ftq_notes"  # Skip unneeded column w/ commas
+        )
+    qc_data = fix_split_col(all_qc_data.loc[all_qc_data["ftq_usable"] == 1])
+
+    def get_img_desc(row):
+        """
+        :param row: pandas.Series with a column called "ftq_series_id"
+        :return: String with the image_description of that row
+        """
+        return row.ftq_series_id.split("_")[2]
+
+    # Add missing column by splitting data from other column
+    image_desc_col = qc_data.apply(get_img_desc, axis=1)
+    qc_data = qc_data.assign(**{"image_description": image_desc_col.values})
+
+    # Change column names for good_bad_series_parser to use; then save to .csv
+    qc_data.rename({
+        "ftq_usable": "QC", "subjectkey": "pGUID", "visit": "EventName",
+        "abcd_compliant": "ABCD_Compliant", "interview_age": "SeriesTime",
+        "comments_misc": "SeriesDescription", "file_source": "image_file"
+    }, axis="columns").to_csv(SPREADSHEET_DOWNLOAD, index=False)
+
+
+def fix_split_col(qc_df):
+    """
+    Because qc_df's ftq_notes column contains values with commas, it is split
+    into multiple columns on import. This function puts them back together.
+    :param qc_df: pandas.DataFrame with all QC data
+    :return: pandas.DataFrame which is qc_df, but with the last column(s) fixed
+    """
 
-    # Remove duplicate columns, including those with different names; rename
-    # some columns to work with good_bad_series_parser; save to .csv file
-    merged.loc[:, ~merged.columns.duplicated()].drop(
-        ["file_source", "image03_id"] + list(merged.filter(regex="_drop$")),
-        axis="columns").dropna(axis="columns", how="all").rename({
-        "ftq_usable": "QC", "subjectkey": "pGUID", "visit": "EventName",
-        "abcd_compliant": "ABCD_Compliant", "scanner_manufacturer_pd":
-        "Manufacturer", "comments_misc": "SeriesDescription",
-        "interview_age": "SeriesTime"
-    }, axis="columns").to_csv(SPREADSHEET_MERGED, index=False)
+    def trim_end_columns(row):
+        """
+        Local function to check for extra columns in a row, and fix them. It
+        reads the `columns` list from the enclosing scope, which holds the
+        name of each column in the QC DataFrame, in order.
+        :param row: pandas.Series which is one row in the QC DataFrame
+        :return: N/A
+        """
+        ix = int(row.name)
+        if not pd.isna(qc_df.at[ix, columns[-1]]):
+            qc_df.at[ix, columns[-2]] = qc_df.at[ix, columns[-1]]
+
+    # Keep checking and dropping the last column of qc_df until it's valid
+    columns = qc_df.columns.values.tolist()
+    last_col = columns[-1]
+    while any(qc_df[last_col].isna()):
+        qc_df.apply(trim_end_columns, axis="columns")
+        print("Dropping '{}' column because it has NaNs".format(last_col))
+        qc_df = qc_df.drop(last_col, axis="columns")
+        columns = qc_df.columns.values.tolist()
+        last_col = columns[-1]
+    return qc_df
 
 
 def download_nda_data(cli_args):
@@ -494,21 +532,17 @@
 def unpack_and_setup(args):
     """
     Run unpack_and_setup.sh script repeatedly to unpack and setup the newly
-    downloaded NDA data files.
+    downloaded NDA data files (every .tgz file descendant of the NDA data dir)
     :param args: All arguments entered by the user from the command line. The
     specific arguments used by this function are fsl_dir, mre_dir, --output,
     --download, --temp, and --remove.
     :return: N/A
     """
-    # Get name of NDA data folder newly downloaded from download_nda_data
-    download_folder = Path(args.download)
-
-    # Unpack and setup every .tgz file descendant of the NDA data folder
-    for subject in download_folder.iterdir():
+    for subject in os.scandir(args.download):
         if subject.is_dir():
-            for session_dir in subject.iterdir():
+            for session_dir in os.scandir(subject.path):
                 if session_dir.is_dir():
-                    for tgz in session_dir.iterdir():
+                    for tgz in os.scandir(session_dir.path):
                         if tgz:
 
                             # Get session ID from some (arbitrary) .tgz file in
@@ -520,7 +554,7 @@
                                 UNPACK_AND_SETUP,
                                 subject.name,
                                 "ses-" + session_name,
-                                str(session_dir.resolve()),
+                                session_dir.path,
                                 args.output,
                                 args.temp,
                                 args.fsl_dir,
@@ -531,8 +565,8 @@
                             # files for each subject after that subject's data
                             # has been converted and copied
                             if args.remove:
-                                shutil.rmtree(args.download + subject.name)
-
+                                shutil.rmtree(os.path.join(args.download,
+                                                           subject.name))
                             break
 
 
@@ -546,6 +580,16 @@ def correct_jsons(cli_args):
     """
     subprocess.check_call((CORRECT_JSONS, cli_args.output))
 
+    # Remove the .json files added to each subject's output directory by
+    # sefm_eval_and_json_editor.py, and the vol*.nii.gz files
+    sub_dirs = os.path.join(cli_args.output, "sub*")
+    for json_path in iglob(os.path.join(sub_dirs, "*.json")):
+        print("Removing {}".format(json_path))
+        os.remove(json_path)
+    for vol_file in iglob(os.path.join(sub_dirs, "ses*", "fmap", "vol*.nii.gz")):
+        print("Removing {}".format(vol_file))
+        os.remove(vol_file)
+
 
 def validate_bids(cli_args):
     """
@@ -557,8 +601,8 @@
     """
     try:
         subprocess.check_call(("docker", "run", "-ti", "--rm", "-v",
-                               os.path.abspath(cli_args.output) + ":/data:ro",
-                               "bids/validator", "/data"))
+                               cli_args.output + ":/data:ro", "bids/validator",
+                               "/data"))
     except subprocess.CalledProcessError:
         print("Error: BIDS validation failed.")
 
diff --git a/src/good_bad_series_parser.py b/src/good_bad_series_parser.py
index 24557e0..5d4bf90 100755
--- a/src/good_bad_series_parser.py
+++ b/src/good_bad_series_parser.py
@@ -17,9 +17,9 @@
 
 # Logging variables
 num_sub_visits = 0
-num_siemens = 0
-num_ge = 0
-num_philips = 0
+# num_siemens = 0
+# num_ge = 0
+# num_philips = 0
 num_rsfmri = 0
 num_sst = 0
 num_mid = 0
@@ -61,16 +61,6 @@
         # TODO: Add pGUID and EventName (Subject ID and Visit) to csv for logging information
         num_sub_visits += 1
 
-        scanner = group.iloc[0]['Manufacturer']
-        if scanner == 'Philips Medical Systems':
-            num_philips += 1
-        elif scanner == 'GE MEDICAL SYSTEMS':
-            num_ge += 1
-        elif scanner == 'SIEMENS':
-            num_siemens += 1
-        else:
-            print("Unexpected scanner type: %s" % scanner)
-
         # TODO: Create tgz directory if it doesn't already exist
         sub_id = name[0]
         visit = name[1]
@@ -214,9 +204,9 @@
 print("There are %s subject visits" % num_sub_visits)
 print("%s are valid. %s are invalid" % (num_valid, num_invalid))
-print("%s Siemens" % num_siemens)
-print("%s Philips" % num_philips)
-print("%s GE" % num_ge)
+# print("%s Siemens" % num_siemens)
+# print("%s Philips" % num_philips)
+# print("%s GE" % num_ge)
 print("number of valid subjects with a T2 : %s" % num_t2)
 print("number of valid subjects with rest : %s" % num_rsfmri)
 print("number of valid subjects with mid : %s" % num_mid)
diff --git a/src/sefm_eval_and_json_editor.py b/src/sefm_eval_and_json_editor.py
index 8ba9766..6320ce4 100755
--- a/src/sefm_eval_and_json_editor.py
+++ b/src/sefm_eval_and_json_editor.py
@@ -253,6 +253,13 @@ def generate_parser(parser=None):
         '-v', '--version', action='version', version=last_modified,
         help="Return script's last modified date."
     )
+
+    # Added by Greg Conan 2019-11-04
+    parser.add_argument(
+        '-o', '--output_dir', default='./data/',
+        help=('Directory where necessary .json files live, including '
+              'dataset_description.json')
+    )
     return parser
 
 
@@ -267,10 +274,15 @@ def main(argv=sys.argv):
     # for this script's usage of FSL_DIR...
     fsl_dir = args.fsl_dir + '/bin'
 
+    # This block was added by Greg Conan 2019-10-25
+    for json_file in os.scandir(args.output_dir):
+        json_path = json_file.path
+        if "json" in json_path:
+            shutil.copy2(json_path, args.bids_dir)
+
     # Load the bids layout
     layout = BIDSLayout(args.bids_dir)
     subsess = read_bids_layout(layout, subject_list=args.subject_list, collect_on_subject=args.collect)
-    print(subsess)
 
     for subject,sessions in subsess:
         # fmap directory = base dir
@@ -297,24 +309,30 @@
             TX_json = TX.replace('.nii.gz', '.json')
             TX_metadata = layout.get_metadata(TX)
             #if 'T1' in TX_metadata['SeriesDescription']:
+
+            """
             if 'Philips' in TX_metadata['Manufacturer']:
                 insert_edit_json(TX_json, 'DwellTime', 0.00062771)
             if 'GE' in TX_metadata['Manufacturer']:
                 insert_edit_json(TX_json, 'DwellTime', 0.000536)
             if 'Siemens' in TX_metadata['Manufacturer']:
                 insert_edit_json(TX_json, 'DwellTime', 0.00051001152626)
+            """
 
         # add EffectiveEchoSpacing if it doesn't already exist
         fmap = layout.get(subject=subject, session=sessions, modality='fmap', extensions='.nii.gz')
         for sefm in [x.filename for x in fmap]:
             sefm_json = sefm.replace('.nii.gz', '.json')
             sefm_metadata = layout.get_metadata(sefm)
+
+            """
             if 'Philips' in sefm_metadata['Manufacturer']:
                 insert_edit_json(sefm_json, 'EffectiveEchoSpacing', 0.00062771)
             if 'GE' in sefm_metadata['Manufacturer']:
                 insert_edit_json(sefm_json, 'EffectiveEchoSpacing', 0.000536)
             if 'Siemens' in sefm_metadata['Manufacturer']:
                 insert_edit_json(sefm_json, 'EffectiveEchoSpacing', 0.00051001152626)
+            """
 
         # PE direction vs axis
         func = layout.get(subject=subject, session=sessions, modality='func', extensions='.nii.gz')
@@ -333,4 +351,4 @@
 
 
 if __name__ == "__main__":
-    sys.exit(main())
\ No newline at end of file
+    sys.exit(main())
diff --git a/src/unpack_and_setup.sh b/src/unpack_and_setup.sh
index 56820dc..33de7e1 100755
--- a/src/unpack_and_setup.sh
+++ b/src/unpack_and_setup.sh
@@ -45,6 +45,7 @@ TGZDIR=$3 # Path to directory containing all .tgz for this subject's session
 
 participant=`echo ${SUB} | sed 's|sub-||'`
 session=`echo ${VISIT} | sed 's|ses-||'`
+echo "ScratchSpaceDir=${ScratchSpaceDir}, ROOT_BIDSINPUT=${ROOT_BIDSINPUT}";
 
 date
 hostname
@@ -60,7 +61,6 @@
 RandomHash=`cat /dev/urandom | tr -cd 'a-f0-9' | head -c 16`
 TempSubjectDir=${ScratchSpaceDir}/${RandomHash}
 mkdir -p ${TempSubjectDir}
 # chown :fnl_lab ${TempSubjectDir} || true
-echo "TempSubjectDir = ${TempSubjectDir}"
 
 # copy all tgz to the scratch space dir
 echo `date`" :COPYING TGZs TO SCRATCH: ${TempSubjectDir}"
@@ -102,7 +102,9 @@
 
 # select best fieldmap and update sidecar jsons
 echo `date`" :RUNNING SEFM SELECTION AND EDITING SIDECAR JSONS"
-./src/sefm_eval_and_json_editor.py ${TempSubjectDir}/BIDS_unprocessed/${SUB} ${FSL_DIR} ${MRE_DIR} --participant-label=${participant}
+if [ -d ${TempSubjectDir}/BIDS_unprocessed/${SUB}/${VISIT}/fmap ]; then
+    ./src/sefm_eval_and_json_editor.py ${TempSubjectDir}/BIDS_unprocessed/${SUB} ${FSL_DIR} ${MRE_DIR} --participant-label=${participant} --output_dir $ROOT_BIDSINPUT
+fi
 
 rm ${TempSubjectDir}/BIDS_unprocessed/${SUB}/ses-baselineYear1Arm1/fmap/*dir-both* 2> /dev/null || true
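
Appendix: how the ftq_notes column repair works

The column-repair logic that `fix_split_col()` introduces above is easier to see on a toy table. The sketch below is a minimal, self-contained reproduction rather than code from this repository: the two fake rows and the `_extra` column name merely stand in for the spillover column that appears when `pd.read_csv(..., sep=",|\t")` splits an `ftq_notes` value containing a comma.

```python
import numpy as np
import pandas as pd

# Simulated post-import state: SUB2's ftq_notes value was "motion,retake", so
# everything after the comma slid one column to the right, leaving a spillover
# column (named "_extra" here only for this demo).
qc_df = pd.DataFrame({
    "subjectkey": ["SUB1", "SUB2"],
    "ftq_notes": ["ok", "motion"],
    "ftq_complete": [1.0, "retake"],  # SUB2's real value slid into "_extra"
    "_extra": [np.nan, 1.0],
})


def fix_split_col(qc_df):
    """Shift spilled-over values back left, then drop the extra column(s)."""

    def trim_end_columns(row):
        # For rows that overflowed, copy the last column's value back into
        # the second-to-last column; rows without overflow are left untouched
        ix = int(row.name)
        if not pd.isna(qc_df.at[ix, columns[-1]]):
            qc_df.at[ix, columns[-2]] = qc_df.at[ix, columns[-1]]

    # A last column that is partly NaN marks spillover; repeat until it's gone
    columns = qc_df.columns.values.tolist()
    last_col = columns[-1]
    while any(qc_df[last_col].isna()):
        qc_df.apply(trim_end_columns, axis="columns")
        print("Dropping '{}' column because it has NaNs".format(last_col))
        qc_df = qc_df.drop(last_col, axis="columns")
        columns = qc_df.columns.values.tolist()
        last_col = columns[-1]
    return qc_df


print(fix_split_col(qc_df))  # SUB2's ftq_complete is 1.0 again; _extra is gone
```

The stray "retake" fragment overwrites part of ftq_notes rather than being re-joined to it, which is acceptable here because the patch excludes ftq_notes via `usecols` and never uses it downstream.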
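The rest of the new `create_good_and_bad_series_table()` flow can be sketched the same way: filter on `ftq_usable`, derive `image_description` from `ftq_series_id`, and rename columns to what `good_bad_series_parser.py` expects. The rename mapping below comes from the patch; the three rows and the exact `ftq_series_id` format are invented for illustration (as in the patch, only its third `_`-separated field is used).

```python
import pandas as pd

# Invented QC rows; a real abcd_fastqc01.txt has many more columns and rows
all_qc_data = pd.DataFrame({
    "subjectkey": ["S1", "S2", "S3"],
    "visit": ["baseline_year_1_arm_1"] * 3,
    "ftq_series_id": ["S1_baseline_ABCD-T1_2017",
                      "S2_baseline_ABCD-rsfMRI_2017",
                      "S3_baseline_ABCD-T2_2017"],
    "ftq_usable": [1, 1, 0],  # the third series failed operator QC
    "abcd_compliant": ["Yes", "Yes", "Yes"],
    "interview_age": [120, 121, 122],
    "comments_misc": ["", "", ""],
    "file_source": ["s3://bucket/a.tgz", "s3://bucket/b.tgz",
                    "s3://bucket/c.tgz"],
})

# Keep only series that passed operator QC (ftq_usable == 1)
qc_data = all_qc_data.loc[all_qc_data["ftq_usable"] == 1]

# Derive image_description from the series ID, as get_img_desc() does
qc_data = qc_data.assign(
    image_description=qc_data["ftq_series_id"].map(lambda s: s.split("_")[2])
)

# Rename columns to the names good_bad_series_parser.py reads
renamed = qc_data.rename({
    "ftq_usable": "QC", "subjectkey": "pGUID", "visit": "EventName",
    "abcd_compliant": "ABCD_Compliant", "interview_age": "SeriesTime",
    "comments_misc": "SeriesDescription", "file_source": "image_file"
}, axis="columns")
print(renamed[["pGUID", "QC", "image_description", "image_file"]])
```

Run against the real spreadsheet, these same steps produce ABCD_good_and_bad_series_table.csv, which the download_nda_data step then hands to good_bad_series_parser.py.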