We started the Polygenic Risk Scores Combinations Project in August 2023. This repository contains the code we used to clean our data and perform our analyses. The primary aim of this project is to show that combining polygenic risk scores (PRS) from multiple genome-wide association studies (GWAS) can adequately predict risk for Alzheimer's disease. We calculated PRS for 808 individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) using the Polygenic Risk Scores Knowledge Base (PRSKB) command-line interface (CLI). Then, using statistical techniques, we showed that PRS from different GWAS varied in how accurately they predicted disease risk, which emphasizes the need for standardized PRS calculation methods before PRS can be used in the clinic.
Tool | Version | Installation |
---|---|---|
Python | 3.11+ | Python |
R | 4.3+ | R |
Jupyter | 1.0.0 | pip install jupyter |
Package | Version | Installation |
---|---|---|
pandas | 2.0.2+ | pip install pandas |
numpy | 1.26.2+ | pip install numpy |
scipy | 1.11.4+ | pip install scipy |
matplotlib | 3.7.1+ | pip install matplotlib |
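
To confirm that an environment meets the minimum package versions listed above, a check along the following lines may help. This is a minimal sketch and not part of the repository: `MINIMUMS`, `parse`, and `check_versions` are hypothetical names, and the version comparison only looks at the first three numeric components.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the package table above.
MINIMUMS = {
    "pandas": "2.0.2",
    "numpy": "1.26.2",
    "scipy": "1.11.4",
    "matplotlib": "3.7.1",
}


def parse(version_string):
    """Turn '1.26.2' into (1, 26, 2), keeping only leading digits of each part."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_versions(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse(installed) >= parse(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "OK" if ok else "needs attention"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

Any package reported as "needs attention" can be installed or upgraded with the `pip install` command from the table.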
This software accompanies the Polygenic Risk Scores Knowledge Base, a polygenic risk score calculator available online or as a CLI, which contains GWAS summary statistics from the NHGRI-EBI GWAS Catalog. Visit the Polygenic Risk Scores Knowledge Base at: PRSKB
Or clone the GitHub PRSKB repository:
git clone https://github.com/kauwelab/PolyRiskScore.git
To use the PRS Combinations Software:
git clone https://github.com/jmillerlab/PRS_Combinations.git
A Jupyter notebook tutorial shows how to use the software.
Your input must contain three separate files:
- PRSKB output. Tab-separated values (.tsv) file produced by the PRSKB. See an example.
- Adnimerge. Comma-separated values (.csv) file, which is a compilation of patient demographic and biomarker information in ADNI. See an example.
- Dxsum. Comma-separated values (.csv) file, which has the final diagnoses for patients in ADNI.
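
As an illustration of reading these inputs, the sketch below loads one tab-separated file and two comma-separated files into pandas DataFrames. It is not the repository's own loader: `load_inputs` is a hypothetical helper, and the column names in the demo data are made up, so substitute the real paths and columns from your PRSKB and ADNI downloads.

```python
import io

import pandas as pd


def load_inputs(prskb_tsv, adnimerge_csv, dxsum_csv):
    """Read the PRSKB, Adnimerge, and Dxsum inputs (paths or file-like objects)."""
    # The PRSKB output is tab-separated.
    prs = pd.read_csv(prskb_tsv, sep="\t")
    # low_memory=False reads each file in one pass so that column dtypes
    # are inferred from the whole column rather than chunk by chunk.
    adnimerge = pd.read_csv(adnimerge_csv, low_memory=False)
    dxsum = pd.read_csv(dxsum_csv, low_memory=False)
    return prs, adnimerge, dxsum


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files; values are invented.
    prs_file = io.StringIO("sample\tscore\nS1\t0.12\nS2\t0.30\n")
    adnimerge_file = io.StringIO("id,age\n1,70\n")
    dxsum_file = io.StringIO("id,dx\n1,2\n")
    prs, adnimerge, dxsum = load_inputs(prs_file, adnimerge_file, dxsum_file)
    print(prs.shape, adnimerge.shape, dxsum.shape)
```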
Data | Disk Space (Megabytes) | Download Time (seconds) |
---|---|---|
adnimerge | 8.17 | 3.0 |
dxsum | 1.54 | 1.0 |
Instructions for downloading ADNIMERGE Package in R
The standard output is a single comma-separated values (.csv) file, but you may choose from the following outputs:
- CSV
- TSV
- XLSX (Excel)
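
One way to support the three output formats is a small dispatcher around the pandas writers. The sketch below is an assumption about how this could be done, not the notebook's actual save step: `save_output` and its `fmt` parameter are hypothetical names. Writing XLSX with pandas also requires an Excel engine such as openpyxl, which is not in the package table above.

```python
import io

import pandas as pd


def save_output(df, target, fmt="csv"):
    """Write df to target (a path or buffer) as CSV, TSV, or XLSX."""
    fmt = fmt.lower()
    if fmt == "csv":
        df.to_csv(target, index=False)
    elif fmt == "tsv":
        df.to_csv(target, sep="\t", index=False)
    elif fmt == "xlsx":
        # Needs an Excel writer engine (for example openpyxl) to be installed.
        df.to_excel(target, index=False)
    else:
        raise ValueError(f"Unsupported format: {fmt!r}")


if __name__ == "__main__":
    # Invented demo data written to an in-memory buffer instead of a file.
    demo = pd.DataFrame({"sample": ["S1", "S2"], "score": [0.12, 0.30]})
    buffer = io.StringIO()
    save_output(demo, buffer, fmt="tsv")
    print(buffer.getvalue())
```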
Each step is its own callable function within the Jupyter Notebook.
- Setup config.py file
- Import Packages
- Initialize DataFrames
- Create Filtered DataFrame for Reported Trait
- Find Earliest or Latest Diagnosis for Patient in ADNI
- Merge Three DataFrames to Clean the Data
- Convert Range PRS to Mean of Lower and Upper Bounds
- Drop Genome-Wide Association Studies
- Calculate Means
- Simple Diagnosis for Cases and Controls
- Mann-Whitney U Test
- Chi-Squared Test
- Make Plots
- Save Output
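
Three of the steps above (converting a range PRS to the mean of its bounds, the Mann-Whitney U test, and the chi-squared test) can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: `range_to_mean` and `compare_groups` are hypothetical names, a range is assumed to be written as `low - high` with spaces around the hyphen, the `prs` and `diagnosis` columns and the `case`/`control` labels are made up, and the chi-squared test is assumed to compare diagnosis against whether a score is above the overall median.

```python
import pandas as pd
from scipy import stats


def range_to_mean(value):
    """Convert 'low - high' to the mean of the bounds; pass single scores through."""
    if isinstance(value, str) and " - " in value:
        low, high = (float(part) for part in value.split(" - ", 1))
        return (low + high) / 2
    return float(value)


def compare_groups(df, score_col="prs", dx_col="diagnosis"):
    """Run Mann-Whitney U and chi-squared tests for cases versus controls."""
    cases = df.loc[df[dx_col] == "case", score_col]
    controls = df.loc[df[dx_col] == "control", score_col]
    u_stat, u_p = stats.mannwhitneyu(cases, controls, alternative="two-sided")

    # Contingency table: diagnosis by whether the score exceeds the overall median.
    high = df[score_col] > df[score_col].median()
    table = pd.crosstab(df[dx_col], high)
    chi2, chi_p, dof, _expected = stats.chi2_contingency(table)
    return {"u_stat": u_stat, "u_p": u_p, "chi2": chi2, "chi_p": chi_p, "dof": dof}


if __name__ == "__main__":
    # Invented toy data, only to show the calls.
    raw = ["5", "6", "6.5 - 7.5", "8", "1", "2", "2.5 - 3.5", "4"]
    toy = pd.DataFrame({
        "prs": [range_to_mean(v) for v in raw],
        "diagnosis": ["case"] * 4 + ["control"] * 4,
    })
    print(compare_groups(toy))
```

Both tests come from `scipy.stats`; the two-sided alternative is chosen here because the sketch makes no assumption about which group has higher scores.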
Thank you to authors Hady Sabra, Blake Byer, Leah Moylan, and Justin Miller.
This work is freely available for academic and not-for-profit use. However, commercial use is regulated by © 2024 University of Kentucky. All rights reserved. For more information about commercial use of this product, please contact Justin Miller, Ph.D. ([email protected])