
tianyuan-lu/SclerodermaMethylation


SclerodermaMethylation

We conducted a pilot study of DNA methylation in systemic sclerosis (SSc), recruiting nine SSc cases and nine controls. We performed whole-genome bisulfite sequencing (WGBS) and profiled the genome-wide DNA cytosine methylation landscape of each sample.

This repository hosts de-identified, genome-wide CpG methylation beta values for all samples (MethylationBetaValues_chr*.RData). Because the full dataset is large, we split it into per-chromosome subsets. Each subset contains every valid CpG dinucleotide on the corresponding chromosome, for all samples, that passed read-depth quality control. The CHG/CHH methylation data are too large to store here, but they can be requested for academic purposes by contacting us.
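
For readers new to WGBS beta values, here is a minimal sketch in Python (an illustration, not code from this repository). A CpG beta value is the fraction of reads at that site that support methylation, M / (M + U). Sites whose read depth falls below a cutoff are treated as missing. The cutoff `MIN_DEPTH = 5` is a hypothetical placeholder, because the exact depth threshold used for this dataset is not stated here.

```python
from typing import Optional

# Hypothetical minimum read depth; the actual QC cutoff used for this
# dataset is not stated in this README.
MIN_DEPTH = 5


def beta_value(methylated: int, unmethylated: int,
               min_depth: int = MIN_DEPTH) -> Optional[float]:
    """Return the CpG beta value M / (M + U), or None if depth is too low."""
    depth = methylated + unmethylated
    if depth < min_depth:
        return None  # low coverage: treated as a missing value
    return methylated / depth


# Toy (methylated, unmethylated) read counts for one CpG in three samples.
counts = [(8, 2), (1, 1), (0, 10)]
betas = [beta_value(m, u) for m, u in counts]
print(betas)  # [0.8, None, 0.0]
```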

We also provide de-identified demographic covariates (Covariates.RData). Samples appear in the same order as in the CpG methylation data.
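
Because the covariates and the methylation data share one sample order, samples can be subset by position. The Python sketch below illustrates this idea; it is not the repository's R code. The field names (`sample`, `subtype`, `male`, `age`) and all values are hypothetical and are not the actual columns of Covariates.RData. The sketch picks sample indices from the covariates, much as a sex or subtype filter would, and applies the same indices to a methylation row.

```python
# Hypothetical covariate records; field names and values are illustrative only.
covariates = [
    {"sample": "D1", "subtype": "Diffuse", "male": 0, "age": 52},
    {"sample": "D2", "subtype": "Limited", "male": 0, "age": 47},
    {"sample": "C1", "subtype": "Healthy", "male": 1, "age": 50},
    {"sample": "C2", "subtype": "Healthy", "male": 0, "age": 44},
]

# Beta values for one CpG, in the SAME sample order as the covariates.
cpg_row = [0.81, 0.76, 0.35, 0.40]


def select_samples(covs, female_only=False, categories=None):
    """Return positional indices of samples that pass the filters."""
    keep = []
    for i, c in enumerate(covs):
        if female_only and c["male"] == 1:
            continue
        if categories is not None and c["subtype"] not in categories:
            continue
        keep.append(i)
    return keep


idx = select_samples(covariates, female_only=True)
print(idx)                        # [0, 1, 3]
print([cpg_row[i] for i in idx])  # [0.81, 0.76, 0.4]
```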

We provide an R program (methylationShowcase.R) that visualizes methylation patterns across samples in any user-specified region. It requires a recent version of R and the R packages argparse, dplyr, and ComplexHeatmap.

After downloading the script, the demographic covariates, and the WGBS data for the chromosomes of interest (or after cloning this repository locally), users can display the program's usage by running the following from the command line:

Rscript methylationShowcase.R -h
usage: ./methylationShowcase.R [-h] [-c chromosome] [-s startcoord]
                               [-e endcoord] [-f SUBSET]
                               [-t CATEGORY [CATEGORY ...]]
                               [-m allowedDiseaseMissing]
                               [-n allowedControlMissing]
                               [-r REGRESSOUT [REGRESSOUT ...]] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -c chromosome, --chr chromosome
                        Chromosome number: 1-22 or X; Default: 1
  -s startcoord, --start startcoord
                        Coordinate for region start site; Default: 10000
  -e endcoord, --end endcoord
                        Coordinate for region stop site; Default: 20000
  -f SUBSET, --subset SUBSET
                        Use All samples or only Female samples; Default: All
  -t CATEGORY [CATEGORY ...], --category CATEGORY [CATEGORY ...]
                        Use all samples or disease subtype(s): choose from
                        Diffuse/Limited/Healthy; Default: All
  -m allowedDiseaseMissing, --diseaseCoverage allowedDiseaseMissing
                        Allowed maximum missingness in cases; Default: 0
  -n allowedControlMissing, --controlCoverage allowedControlMissing
                        Allowed maximum missingness in controls; Default: 0
  -r REGRESSOUT [REGRESSOUT ...], --regressOut REGRESSOUT [REGRESSOUT ...]
                        Regress out one or more fixed effects: choose from
                        Age/Male/Duration/Smoke/Ethnicity/Status where
                        Duration represents the progression duration since SSc
                        onset, Smoke represents smoking history (0/1);
                        Default: null
  -o OUTPUT, --output OUTPUT
                        Output pdf file prefix; Default output: Pattern.pdf
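
The `-m`/`-n` options control how much missing data a CpG may have and still be plotted. The Python sketch below shows one plausible reading of these options; it is an illustration, not the script's actual logic. It assumes the limits are counts of samples with a missing (low-depth) value in each group, which is consistent with the example below that uses `-m 3 -n 1` on nine cases and four controls. It also assumes that a CpG is kept only if both groups stay within their limits.

```python
from typing import List, Optional


def passes_missingness(row: List[Optional[float]], is_case: List[bool],
                       max_case_missing: int, max_control_missing: int) -> bool:
    """Keep a CpG only if each group's count of missing values is within its limit.

    Assumption: the limits are sample counts (one reading of -m / -n).
    """
    case_missing = sum(1 for v, c in zip(row, is_case) if c and v is None)
    control_missing = sum(1 for v, c in zip(row, is_case) if not c and v is None)
    return case_missing <= max_case_missing and control_missing <= max_control_missing


is_case = [True, True, True, False, False]
rows = {
    "cpg_a": [0.8, None, 0.7, 0.3, 0.2],   # 1 case missing, 0 controls missing
    "cpg_b": [0.6, 0.5, 0.4, None, None],  # 0 cases missing, 2 controls missing
}
kept = [name for name, r in rows.items() if passes_missingness(r, is_case, 1, 1)]
print(kept)  # ['cpg_a']
```

Under this reading, the defaults (`-m 0 -n 0`) would keep only CpGs that have no missing values in any selected sample.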

Users may look up differentially methylated regions (DMRs) and visualize them by supplying the corresponding genomic coordinates. For example, a DMR was identified in the gene FBN3, spanning chr19:8137960-8138105. This DMR can be displayed by running the following from the command line:

Rscript ./methylationShowcase.R -c 19 -s 8137960 -e 8138105 -f Female -m 3 -n 1 -r Age -o Example

# For multiple selections, separate the values with spaces, e.g. ... -t Diffuse Limited -r Age Male Ethnicity ...
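
Region selection with `-s`/`-e` amounts to keeping the CpGs whose coordinates fall between the start and end positions. The Python sketch below illustrates this; it is not the script's own code. It treats both boundaries as inclusive and keeps CpGs in their original order. The coordinates are toy values, not taken from the dataset.

```python
def cpgs_in_region(positions, start, end):
    """Indices of CpG positions within [start, end], inclusive, in original order."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [i for i, pos in enumerate(positions) if start <= pos <= end]


# Toy chr19 CpG coordinates (illustrative, not from the dataset).
positions = [8137950, 8137960, 8138001, 8138105, 8138110]
print(cpgs_in_region(positions, 8137960, 8138105))  # [1, 2, 3]
```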

[Figure: ExamplePlot, heatmap produced by the example command above]

In this plot, each column represents one sample (D*: SSc cases; C*: controls) and each row represents one CpG dinucleotide within the supplied boundaries (inclusive). Samples are ordered by hierarchical clustering, while CpG dinucleotides keep their original genomic order. The fixed effect (age) has been regressed out of the raw methylation beta values. The residual methylation levels are shown on a gradient from blue (relatively hypomethylated) to red (relatively hypermethylated). CpG dinucleotides with low read depth are colored grey. If no fixed effect is regressed out, the plotted values are simply the original methylation beta values.
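
"Regressing out" a fixed effect means fitting each CpG's beta values against the covariate and keeping the residuals. The Python sketch below illustrates this with ordinary least squares for a single covariate (age), fitting y ~ 1 + age. It is not the script's own implementation. The script may center or rescale the residuals differently, and removing several covariates at once (e.g. `-r Age Male Ethnicity`) would require a multiple regression rather than this one-variable formula. The ages and beta values are toy numbers.

```python
from statistics import mean


def regress_out(y, x):
    """Residuals of y after an ordinary least-squares fit y ~ 1 + x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed value minus the value predicted from the covariate.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


ages = [40, 50, 60, 70]            # toy covariate values
betas = [0.50, 0.62, 0.68, 0.80]   # toy beta values for one CpG
resid = regress_out(betas, ages)
print([round(r, 3) for r in resid])
```

With an intercept in the model, the residuals sum to zero and are uncorrelated with age, so the age trend is removed and the variation between samples that remains is what gets plotted.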

In addition, we provide an R script (bumphunterAnalysis.R) that reproduces the results presented in our publication using the R package bumphunter. The script analyzes only female samples (nine cases vs. four controls) and adjusts for age. It can easily be modified to include more covariates and/or to compare different SSc subtypes. The script can be run in RStudio, and the results are stored in a list object, GenomeSSCres, whose 23 elements correspond to chromosomes 1-22 and X, in order. q values and averaged methylation level differences can then be computed from these results.
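
The Python sketch below illustrates the two quantities mentioned above. It does not claim to reproduce the method used in the publication. For q values, it uses Benjamini-Hochberg adjustment of per-region p values, which is one common choice; the paper may instead use another procedure, such as Storey's q value. For the methylation difference, it takes the mean of the cases' beta values minus the mean of the controls' beta values, ignoring missing values. The function names and inputs are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mean_difference(case_betas, control_betas):
    """Average case-minus-control methylation difference, ignoring missing values."""
    cases = [b for b in case_betas if b is not None]
    controls = [b for b in control_betas if b is not None]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
print(mean_difference([0.8, 0.6, None], [0.3, 0.5]))
```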

For more information, please refer to our publication: https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-019-0602-8

About

WGBS data obtained from a pilot study of SSc
