ADMerge

Alzheimer’s Disease (AD) Biomarker Collection Package.

Sharp increases in AD cases, deaths, and costs are straining the health care system and caregivers. Several major AD data sources are available to researchers. For example, the BIOCARD study is a longitudinal, observational study initiated in 1995 and designed to identify biomarkers associated with progression from cognitively normal to mild cognitive impairment or dementia; the ADNI study is a multicenter observational study launched in 2004 to collect clinical, imaging, genetic, and biospecimen biomarkers from cohorts in different clinical states at baseline; and the NACC UDS data reflect total enrollment since 2005 across 34 AD Centers and include subjects with a range of cognitive statuses.

The ADMerge package provides a function, ad_merge(), that merges data from various AD data sources to create an analysis dataset. The package establishes AD data standards and data dictionaries that define the formats and organizational structure of AD data across multiple data sources. R functions are provided so that data analysts can integrate data from multiple sources and create their analysis datasets.

Installation

Use the following code to install the ADMerge package:

library(devtools)
install_github("Thewhey-Brian/ADMerge")
library(ADMerge)

For details on how to install an R package directly from GitHub, see https://rdrr.io/cran/remotes/man/install_github.html.

Usage

Prerequisites

Collecting AD biomarkers requires local access to all biomarker files.

To help navigate the large amount of data in the ADNI dataset, we provide ADNI_Tools, which generates a reference file dictionary, so one can look up detailed information on all files without downloading them.

Data Structure Overview

Before merging biomarkers across different files, it is crucial to review the file structure with the get_src_table() function.

src_table = get_src_table("path_to_biomarker_files")

The data structure table src_table will be one of the inputs for the main merging function.

Inputs:

  • path: The path to the directory containing the data files.
  • FILE_pattern: A regular expression pattern that specifies the file types to include in the source table. The default is ".xlsx|.xls|.csv".
  • ID_pattern: A regular expression pattern that specifies the potential ID variables in the data files. The default is "ID".
  • DATE_pattern: A regular expression pattern that specifies the potential DATE variables in the data files. The default is "DATE|VISITNO".
  • IS_overlap_list: A list of logical values that specifies, when merging, whether overlapping between time windows is allowed (TRUE) or not (FALSE). The length of the list must be equal to the number of files being read. The default is NULL.
  • WINDOW_list: A list of numeric time windows for matching the DATE variables. The length of the list must be equal to the number of files being read. The default is NULL.
  • ID_usr_list: A list of user-specified ID variable names. If provided, the function will try to match the variable names to the potential ID variables in the data files. The default is NULL.
  • DATE_usr_list: A list of user-specified DATE variable names. If provided, the function will try to match the variable names to the potential DATE variables in the data files. The default is NULL.
  • file: A path to a file where the source table will be saved as a CSV file.
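To make the pattern arguments concrete, here is a minimal base R sketch of how regular expressions like the defaults above select files and candidate columns. It does not call ADMerge and is not the package's implementation; the temporary files, the stricter file pattern, and the column names are assumptions made up for illustration. One point worth knowing when writing a custom FILE_pattern: in a regular expression an unescaped "." matches any character, so a pattern such as ".csv" is a loose match.

```r
# Illustration only: base R with hypothetical file and column names.
demo_dir <- tempfile("admerge_pattern_demo")
dir.create(demo_dir)
file.create(file.path(
  demo_dir,
  c("CSF_file.csv", "IMAGE_file.xlsx", "notes.txt", "mycsv_notes.txt")
))

# Default-style pattern: "." is a regex wildcard, so "mycsv_notes.txt" also matches.
loose_files <- list.files(demo_dir, pattern = ".xlsx|.xls|.csv")

# A stricter, hypothetical alternative: literal dot, anchored at the end of the name.
strict_files <- list.files(demo_dir, pattern = "\\.(xlsx|xls|csv)$")

# Candidate ID and DATE columns chosen by pattern, as ID_pattern and DATE_pattern describe.
cols <- c("Phase", "ID", "RID", "SITEID", "USERDATE", "USERDATE2", "EXAMDATE", "VISCODE")
id_candidates <- grep("ID", cols, value = TRUE)
date_candidates <- grep("DATE|VISITNO", cols, value = TRUE)

print(loose_files)
print(strict_files)
print(id_candidates)
print(date_candidates)
```

Because several columns can match a pattern, ID_usr_list and DATE_usr_list exist to state which one should be used for merging.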

Outputs: A table with the following structure:

| file | VARS_in_file | ID_in_file | DATE_in_file | ID_for_merge | DATE_for_merge | IS_overlap | WINDOW |
|---|---|---|---|---|---|---|---|
| CSF_file.csv | Phase; ID; RID; SITEID; ... | ID; RID; SITEID | USERDATE; USERDATE2; EXAMDATE; | ID | EXAMDATE | FALSE | 366 |
| IMAGE_file.csv | Phase; ID; RID; SITEID; ... | ID; RID; SITEID | USERDATE; USERDATE2; SCANDATE; | ID | SCANDATE | FALSE | 366 |
| DIAGNOSIS_file.csv | Phase; ID; RID; SITEID; ... | ID; RID; SITEID | USERDATE; USERDATE2; | ID | USERDATE | FALSE | 366 |

Modify src_table

There are two ways to modify the src_table generated by get_src_table().

  1. Run get_src_table() again with any of ID_usr_list, DATE_usr_list, IS_overlap_list, or WINDOW_list specified. Note: The length of each list must be equal to the number of files in src_table.
  2. Run get_src_table() again with file specified. This saves src_table as a CSV file in the local directory. One can edit this CSV file locally and pass it to the merging function later through dict_src_path.
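The second option can be sketched in base R as a CSV round trip. This is only an illustration under stated assumptions: the data frame below is a hand-built, hypothetical stand-in for a real src_table (only a few of its columns are shown), the new WINDOW value of 180 is arbitrary, and the sketch assumes the saved file is an ordinary CSV that read.csv() and write.csv() can handle. The ad_merge() call is left commented out because it needs the package and real data.

```r
# Hypothetical stand-in for a src_table; a real one comes from get_src_table().
src_table <- data.frame(
  file = c("CSF_file.csv", "IMAGE_file.csv"),
  ID_for_merge = c("ID", "ID"),
  DATE_for_merge = c("EXAMDATE", "SCANDATE"),
  IS_overlap = c(FALSE, FALSE),
  WINDOW = c(366, 366),
  stringsAsFactors = FALSE
)

# Save the table to a CSV file (here a temporary file).
csv_path <- tempfile(fileext = ".csv")
write.csv(src_table, csv_path, row.names = FALSE)

# Edit the CSV by hand, or programmatically as done here.
edited <- read.csv(csv_path, stringsAsFactors = FALSE)
is_image <- edited$file == "IMAGE_file.csv"
edited$WINDOW[is_image] <- 180       # arbitrary example value
edited$IS_overlap[is_image] <- TRUE
write.csv(edited, csv_path, row.names = FALSE)

# The edited file could then be supplied to the merge step (not run here):
# ad_data <- ad_merge("path_to_biomarker_files", DATE_type = "Date",
#                     dict_src_path = csv_path)
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
print(round_trip)
```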

Merging

Merging is performed by the ad_merge() function.

ad_data = ad_merge("path_to_biomarker_files", DATE_type = "Date", dict_src = src_table)

Inputs:

  • path: The path to the directory containing the data files.
  • DATE_type: The type of DATE used in the data, either "Date" (e.g. 2017-1-16) or "Number" (e.g. 3 or m48 ...).
  • dict_src: A data frame containing structural information about the input data files. The default is NULL. Supply it if src_table was modified and is stored in the R environment.
  • dict_src_path: The path to the src_table CSV file. The default is NULL. Supply it if src_table was modified locally through its CSV file.
  • timeline_file: The name of the file containing the timeline for the data. It can be any value in the file column of src_table.
  • timeline_path: The path to the timeline file. The default is NULL. This is an alternative to specifying timeline_file.

Outputs:

  • analysis_data: The merged dataset with all the relevant biomarker information.
  • dict_src: The src_table used for this merging.
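To give an intuition for what a time window means when records are merged onto a timeline, the following base R sketch matches each timeline visit to the nearest biomarker record of the same participant, provided the record falls within a given number of days. This is one plausible reading of window-based matching, not ADMerge's actual algorithm, and it does not model IS_overlap. The function match_within_window(), the columns VISITDATE and ABETA, and all values are hypothetical.

```r
# Illustration only: made-up data and a hypothetical helper function.
timeline <- data.frame(
  ID = c(1, 1, 2, 2),
  VISITDATE = as.Date(c("2017-01-16", "2018-06-20", "2017-03-01", "2020-01-01"))
)
csf <- data.frame(
  ID = c(1, 1, 2),
  EXAMDATE = as.Date(c("2017-02-01", "2019-06-01", "2017-02-20")),
  ABETA = c(800, 750, 620)
)

match_within_window <- function(timeline, biomarker, date_col, value_col, window_days) {
  # Start with no match for any visit.
  timeline[[value_col]] <- NA_real_
  for (i in seq_len(nrow(timeline))) {
    # Records belonging to the same participant as this visit.
    same_id <- biomarker[biomarker$ID == timeline$ID[i], , drop = FALSE]
    # Absolute gap in days between each record and the visit.
    gap <- abs(as.numeric(difftime(same_id[[date_col]], timeline$VISITDATE[i], units = "days")))
    in_window <- which(gap <= window_days)
    if (length(in_window) > 0) {
      # Keep the closest record among those inside the window.
      nearest <- in_window[which.min(gap[in_window])]
      timeline[[value_col]][i] <- same_id[[value_col]][nearest]
    }
  }
  timeline
}

merged <- match_within_window(timeline, csf, "EXAMDATE", "ABETA", window_days = 366)
print(merged)
```

In this sketch a visit with no record inside the window keeps a missing value, which is why the last visit of participant 2 has NA for ABETA.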

Summary Information

The S3 methods summary() and plot() provide summary information about the merged analysis data.

summary(ad_data)
plot(ad_data, distn = "SCF_m1", group = "SEX")

The plotting function takes several key inputs:

  • distn: The name of the variable whose distribution is plotted.
  • group: The name of the variable used to group and color the plot.
  • baseline: A boolean indicating whether to include only the baseline data in the plot.
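The idea behind these three inputs can be sketched in base R without the package: restrict the data to one baseline record per participant, then show the distribution of one variable split by a grouping variable. This is not the package's plot() method. The data are made up, the column names VISITDATE and ABETA are hypothetical, and treating the earliest visit as baseline is an assumption of this sketch.

```r
# Illustration only: made-up data with hypothetical column names.
analysis <- data.frame(
  ID = c(1, 1, 2, 2, 3),
  VISITDATE = as.Date(c("2017-01-16", "2018-01-20", "2017-03-01",
                        "2018-03-05", "2017-05-10")),
  SEX = c("F", "F", "M", "M", "F"),
  ABETA = c(800, 760, 620, 600, 710)
)

# Assumption: the earliest visit of each participant is the baseline record.
ordered <- analysis[order(analysis$ID, analysis$VISITDATE), ]
baseline <- ordered[!duplicated(ordered$ID), ]

# Distribution of one variable (like distn) split by another (like group).
boxplot(ABETA ~ SEX, data = baseline, main = "ABETA at baseline by SEX")

# Numeric companion to the plot: median per group.
group_medians <- tapply(baseline$ABETA, baseline$SEX, median)
print(group_medians)
```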
