From c755a34398a5369fafe5daa0be8d7d536ae306b6 Mon Sep 17 00:00:00 2001
From: Meeta Mistry
Date: Wed, 22 May 2024 21:53:37 -0400
Subject: [PATCH] adding more info on public resoutrces
---
 lessons/01_data_organization.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/lessons/01_data_organization.md b/lessons/01_data_organization.md
index a12afce..6eec612 100644
--- a/lessons/01_data_organization.md
+++ b/lessons/01_data_organization.md
@@ -21,17 +21,20 @@ Cancer is the [second leading cause of death globally](https://www.who.int/healt
 Image source: Johannessen CM and Boehm JS. Curr, Current Opinions in Systems Biology 2017

-There is a vast amount of genomic data deposited in public repositories which are available to researchers. These resources involve a range of large scale datasets and analysis tools, and require differing levels of computational expertise among users. Access to these resources allows us to:
+There is a vast amount of **cancer genomic data deposited in public repositories** which are available to researchers. These resources involve a range of large scale datasets and analysis tools, and require differing levels of computational expertise among users. Access to these resources allows us to:
 * Obtain data to re-analyze and explore different questions posed from original studies
 * Compare our results to large cancer databases for variant annotation
 * Obtain reference datasets for benchmarking of variant calling algorithms
 ### [Genomics Data Commons](https://portal.gdc.cancer.gov/)
+The GDC is a data repository funded by the National Cancer Institute (NCI) which provides researchers with access to genomic and clinical data from cancer patients. There is no original research conducted as part of the GDC; its main purpose is to provide centralized access to the data generated by other projects for broader research use. It contains data submitted by researchers and large scale cancer sequencing projects (such as the TCGA). The datasets go beyond whole genome sequencing, with data from RNA-seq, proteomics, imaging and other modalities. Researchers can access and query the data through the portal using built-in analysis tools, or raw data can be obtained after getting authorized access.
 ### [The Cancer Genome Atlas(TCGA)](https://www.cancer.gov/ccg/research/genome-sequencing/tcga)
+The TCGA is a joint effort between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). The TCGA has profiled and analyzed large numbers of human tumors (across 33 different cancer types) at the DNA, RNA, protein and epigenetic levels. Researchers at the TCGA conduct the original research which includes collecting tumor samples, performing various omics analyses, and analyzing the resulting data. TCGA research has led to many discoveries about the molecular basis of cancer and identified potential biomarkers and therapeutic targets.
 ### [cBioPortal](https://www.cbioportal.org/)
+cBioPortal is a free online resource for exploring, visualizing and analyzing cancer genomics data.
 ### The ICGC-TCGA DREAM Mutation Calling Challenge