This is a two-day course introducing R and Bioconductor for the analysis and comprehension of high-throughput genomic (sequencing, microarray, ...) data. There are no prerequisites. The course will be offered May 16 and 17; it is open and free of charge to students, staff, and faculty at Roswell Park Cancer Institute or SUNY at Buffalo.
An approximate agenda is:
Overview
- Commands, scripts, and literate documents
- Data input, manipulation, and visualization
- Packages
- Introduction to example data sets
- Getting help
Data input and manipulation
- Input data from text and other files
- Vectors, data.frame, and other R data types
- Tidying data
Analysis
- Performing basic (and advanced!) statistical analyses
- Working with R classes and methods
Visualization
- Base graphics for quick visualizations
- ggplot2 and other effective ways of visualizing data
Project overview
- Packages, methods, and vignettes
- What you can (and can't!) do: sequence analysis (RNA-seq, ChIP-seq, variants, ...), microarrays, flow cytometry, ...
Getting familiar with common operations
- GenomicRanges for describing genome-scale data
- Annotation resources for mapping between identifiers, assigning pathways, describing genes, and exploring consortium and other genome-scale data.
A typical workflow: RNA-seq differential expression of known genes
- Introduction to RNA-seq
- Overview of upstream processing (non-R)
- From count matrix to differentially expressed genes
  - Statistical issues
  - Implementation using the DESeq2 package
  - Placing results in context
Where to now?
- Improving R skills
- Working with large data
- Getting involved with the community