The panel includes CpG sites known to be associated with factors relevant to lung cancer risk:
- demographics (age, sex, ancestry, educational attainment)
- exposures (smoking history, alcohol consumption, cadmium, lead)
- aging (Hannum and Dunedin PACE/PoAm clocks)
- blood cell types
- cancer risk (lung, breast, colorectal, prostate)
- cardiovascular disease risk (COPD, HDL cholesterol, BMI)
- protein abundance (OSM, EN-RAGE, CXCL9, VEGFA, TGFa, IGFBP1, MMP12, HGF, CRP, EGFR, IL6)
All required R packages can be installed as follows:
Rscript src/install-r-packages.r
Before attempting to compile the final panel, verify that the genomic regions for handling ancestry have been selected, i.e. ancestry/output/sites.csv has been created.
The instructions to create this file can be found in ancestry/readme.md.
- Add a folder containing documentation and files relevant to the addition. In particular, include one or more csv files providing either genomic regions (chr,start,end columns) or Illumina Beadchip CpG site identifiers (cpg column).
- Add the csv file(s) to the appropriate list in src/compile-panel.csv.
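As a rough illustration of the two accepted input layouts, the sketch below checks whether a table of additions provides genomic regions (chr,start,end) or CpG identifiers (cpg). The function name `input_type` and the example rows are hypothetical and not part of this repository; the compile script may apply different or stricter checks.

```r
# Hypothetical helper (not part of the repository): report which of the two
# accepted layouts a data frame of panel additions follows.
input_type <- function(dat) {
  cols <- colnames(dat)
  if (all(c("chr", "start", "end") %in% cols)) {
    # coordinates must be numeric and each region must have start <= end
    stopifnot(is.numeric(dat$start), is.numeric(dat$end),
              all(dat$start <= dat$end))
    return("regions")
  }
  if ("cpg" %in% cols) {
    return("cpgs")
  }
  stop("expected chr,start,end columns or a cpg column")
}

# Made-up example inputs; the coordinates and identifiers are placeholders.
regions <- read.csv(text = "chr,start,end\nchr1,1000,1200\nchr2,500,650")
cpgs <- read.csv(text = "cpg\ncg00000001\ncg00000002")

input_type(regions)  # "regions"
input_type(cpgs)     # "cpgs"
```

Running a check like this on a new csv file before adding it to the list may catch missing or misnamed columns early.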
Rscript src/compile-panel.r panel.csv panel-reduced.csv
- panel.csv will provide a list of regions with some information about why they were selected (see 'source' and 'details').
- panel-reduced.csv will be the same as panel.csv but will have merged any overlapping regions.
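To make the idea of merging overlapping regions concrete, here is a minimal base R sketch. It is not the code used by src/compile-panel.r; the function name `merge_regions` and the toy coordinates are assumptions for illustration. It treats regions on the same chromosome as overlapping when one starts at or before the point where an earlier one ends, so regions that are merely adjacent are left separate.

```r
# Hypothetical helper (not the repository's implementation): merge
# overlapping regions within each chromosome.
merge_regions <- function(regions) {
  regions <- regions[order(regions$chr, regions$start, regions$end), ]
  merged <- lapply(split(regions, regions$chr, drop = TRUE), function(r) {
    # a new group begins when a region starts after all earlier regions end
    prev_max_end <- c(-Inf, head(cummax(r$end), -1))
    group <- cumsum(r$start > prev_max_end)
    data.frame(
      chr = r$chr[1],
      start = as.vector(tapply(r$start, group, min)),
      end = as.vector(tapply(r$end, group, max)),
      stringsAsFactors = FALSE)
  })
  merged <- do.call(rbind, merged)
  rownames(merged) <- NULL
  merged
}

# Toy input: the first two chr1 regions overlap, the others do not.
toy <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100, 150, 400, 10),
  end = c(200, 300, 500, 20),
  stringsAsFactors = FALSE)

merged <- merge_regions(toy)
print(merged)  # chr1 100-300, chr1 400-500, chr2 10-20
```

In this toy case four input regions reduce to three, which mirrors the relationship between panel.csv and panel-reduced.csv.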
Genomic coordinates will refer to the hg19 human genome assembly.
After creating the panel, run a few checks to reduce risk of errors. See checks/readme.md for more details.
Twist bioinformatics responded to our request noting that about 20% of our requested regions were difficult to target due to repetitive DNA.
Details can be found here: twist-clarification/readme.md.
Additional coverage statistics we generated can be found here: twist-clarification/output/stats.md.
Quality control analyses and outputs for the probes used in the final panel are described here: twist-panel/readme.md.
Outputs can be found here: twist-panel/output/stats.md.