Scripts for RATTACA LD Pruning Testing

Benjamin B. Johnson, Thiago M. Sanches, Mika H. Okamoto, Khai-Min Nguyen, Clara A. Ortez, Oksana Polesskaya, Abraham A. Palmer

Main Notebook: plot_correlations.ipynb

This repository contains R and Python scripts that generate prediction performance data for a range of experiments and visualize the results.

The experiments compare phenotype prediction performance across genome subsampling methods and settings, including:

  • LD pruning parameters - $r^2$ and window size
  • Number of random SNPs
  • Number of training rats
  • Random subsampling vs. LD pruning
  • LD clumping

The notebooks also plot:

  • Prediction performance distributions
  • Runtimes
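
To make the subsampling methods above concrete, here is a minimal sketch of greedy window-based LD pruning next to random SNP subsampling. It is not taken from this repository's pipelines: the function names `ld_prune` and `random_subset`, the parameters `r2_threshold` and `window_size`, and the toy genotype matrix are all hypothetical, and dedicated tools (for example PLINK's `--indep-pairwise`) handle details such as window steps and missing data that this sketch ignores.

```python
import numpy as np


def ld_prune(genotypes, r2_threshold=0.5, window_size=50):
    """Greedy LD pruning sketch (illustrative only, not the repository's code).

    genotypes: (n_samples, n_snps) array of allele dosages, SNPs in genomic order.
    Returns the sorted indices of the SNPs that are kept.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    n_snps = genotypes.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    # Monomorphic SNPs have no defined correlation, so drop them up front.
    keep[genotypes.std(axis=0) == 0] = False
    for i in range(n_snps):
        if not keep[i]:
            continue
        # Compare SNP i with the later SNPs inside its window.
        stop = min(i + window_size, n_snps)
        for j in range(i + 1, stop):
            if not keep[j]:
                continue
            r = np.corrcoef(genotypes[:, i], genotypes[:, j])[0, 1]
            if r * r > r2_threshold:
                keep[j] = False  # SNP j is in high LD with a kept SNP
    return np.flatnonzero(keep)


def random_subset(n_snps, n_keep, seed=0):
    """Pick n_keep SNP indices uniformly at random, returned in sorted order."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_snps, size=n_keep, replace=False))


# Toy data (hypothetical): SNP 1 duplicates SNP 0, SNP 2 is only weakly correlated.
toy = np.array([
    [0, 0, 2],
    [1, 1, 0],
    [2, 2, 1],
    [0, 0, 1],
    [1, 1, 2],
    [2, 2, 0],
])
pruned = ld_prune(toy, r2_threshold=0.9, window_size=3)
random_pick = random_subset(n_snps=toy.shape[1], n_keep=2)
```

Lowering `r2_threshold` or widening `window_size` removes more SNPs, which is the trade-off the pruning-parameter experiments explore; `random_subset` gives a baseline with the same number of SNPs but no LD information.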

Main folder: main prediction pipeline and correlation plots

  • full_pred_pipeline.r: general prediction pipeline
  • plot_correlations.ipynb: correlation plots from the various experiments
  • plot_runtimes_py.ipynb: runtime plots from the various experiments (to show the computational cost of each method)
  • convex_hull.ipynb: tests of rat breeding algorithms that aim to maximize the genetic diversity of offspring

/experiments/code: pipelines that generate performance data for each genome subsampling method
/experiments/plots: notebooks that plot the results of each subsampling experiment
/pyrrBLUP: class for running rrBLUP in Python, used for comparisons with scikit-learn
/experimental: scripts for testing new ideas (not public)
/old: intermediate scripts and old ideas that were refined in other scripts (not public)
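
As background for the /pyrrBLUP entry above: rrBLUP marker effects correspond to a ridge regression on the genotype matrix, with the penalty equal to the ratio of residual to marker-effect variance. The sketch below illustrates that equivalence only. It is not the /pyrrBLUP class, the names `ridge_marker_effects`, `predict`, and `lam` are hypothetical, and it assumes the penalty is supplied by the caller, whereas rrBLUP estimates the variance components by REML.

```python
import numpy as np


def ridge_marker_effects(Z, y, lam):
    """Ridge estimate of marker effects (sketch; lam is assumed known).

    Z: (n_samples, n_snps) genotype matrix, y: (n_samples,) phenotypes.
    Returns (intercept, effects).
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    z_mean = Z.mean(axis=0)
    y_mean = y.mean()
    Zc = Z - z_mean  # centering leaves the intercept unpenalized
    yc = y - y_mean
    n = Zc.shape[0]
    # Dual form: u = Zc' (Zc Zc' + lam I)^-1 yc.
    # This solves an n x n system, which is cheap when n_snps >> n_samples.
    u = Zc.T @ np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), yc)
    intercept = y_mean - z_mean @ u
    return intercept, u


def predict(Z, intercept, u):
    """Predicted phenotypes for new genotypes."""
    return intercept + np.asarray(Z, dtype=float) @ u


# Simulated data (hypothetical): 20 samples, 100 SNPs with dosages 0/1/2.
rng = np.random.default_rng(0)
Z_train = rng.integers(0, 3, size=(20, 100)).astype(float)
y_train = Z_train @ rng.normal(0.0, 0.1, size=100) + rng.normal(0.0, 1.0, size=20)
intercept, effects = ridge_marker_effects(Z_train, y_train, lam=1.0)
y_hat = predict(Z_train, intercept, effects)
```

With the same penalty, scikit-learn's `Ridge(alpha=lam)` with its default intercept fitting should give matching coefficients, so a comparison between the two mostly comes down to how the penalty is chosen.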