Workshop Schedule

NOTE: The Basic Data Skills Introduction to the command-line interface workshop is a prerequisite.

Pre-reading:

Day 1

| Time | Topic | Instructor |
|:-----|:------|:-----------|
| 9:30 - 10:00 | Workshop Introduction | Will |
| 10:00 - 11:30 | Introduction to Variant Calling | Elizabeth |
| 11:30 - 11:50 | Project Organization | Elizabeth |
| 11:50 - 12:00 | Overview of self-learning materials and homework submission | Will |

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Evaluating Read Quality with FastQC

    The first step in many NGS studies is to evaluate the quality of the reads received from the sequencing facility. A common tool for this analysis is FastQC.

    This lesson will:
    • Implement FastQC to evaluate read qualities
    • Evaluate FastQC quality metrics

  2. Sequence Read Alignment

    Once we have completed QC on the sequence reads, we will align the reads to a reference sequence. This alignment step places each read in genomic space and forms the foundation for calling variants.

    This lesson will:
    • Enumerate difficulties with alignment
    • Create an sbatch script to align reads

  3. Alignment File Processing

    Before we can call variants from our alignment files, we need to do some processing to clean up the alignment files. The two major concerns here are organizing (sorting) our alignment files for our analyses and removing duplicates.

    This lesson will:
    • Differentiate between query-sorted and coordinate-sorted alignment files
    • Describe and remove duplicate reads
    • Process a raw SAM file into a BAM file ready for input to GATK
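
The Day 1 lessons use FastQC, an aligner run through an sbatch script, and dedicated tools for sorting and duplicate removal. As a purely conceptual companion (hypothetical helper names, not part of the lesson materials), the Python sketch below shows three of the underlying ideas on toy data: decoding Phred+33 base qualities and averaging them per read position, query-sorting versus coordinate-sorting alignment records, and flagging reads that share a chromosome, strand and 5' position as duplicates. It assumes ungapped single-end alignments and ignores CIGAR strings, mate pairs and soft clipping, so it is a sketch rather than a substitute for the real tools.

```python
from collections import defaultdict
from dataclasses import dataclass


def phred33_scores(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality_per_position(quality_strings):
    """Mean Phred score at each read position across all reads.

    A stripped-down version of the idea behind a per-base quality plot.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for qual in quality_strings:
        for i, score in enumerate(phred33_scores(qual)):
            totals[i] += score
            counts[i] += 1
    return [totals[i] / counts[i] for i in sorted(totals)]


@dataclass
class Alignment:
    name: str    # read (query) name
    chrom: str   # reference sequence name
    pos: int     # 1-based leftmost aligned position
    strand: str  # "+" or "-"
    qual: str    # Phred+33 base qualities

    def five_prime(self):
        # Assumes an ungapped alignment, so a "-" strand read's 5' end is its
        # rightmost aligned base.
        return self.pos if self.strand == "+" else self.pos + len(self.qual) - 1


def query_sorted(alns):
    """Order records by read name."""
    return sorted(alns, key=lambda a: a.name)


def coordinate_sorted(alns, chrom_order):
    """Order records by reference sequence, then by position."""
    return sorted(alns, key=lambda a: (chrom_order[a.chrom], a.pos))


def mark_duplicates(alns):
    """Return the names of reads flagged as duplicates.

    Reads sharing chromosome, strand and 5' position are treated as copies of
    one original fragment; the copy with the highest summed quality is kept.
    """
    groups = defaultdict(list)
    for a in alns:
        groups[(a.chrom, a.strand, a.five_prime())].append(a)
    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda a: sum(phred33_scores(a.qual)))
        duplicates.update(a.name for a in members if a is not best)
    return duplicates


if __name__ == "__main__":
    print(mean_quality_per_position(["II", "#I"]))  # [21.0, 40.0]
    reads = [
        Alignment("readA", "chr2", 50, "+", "IIII"),
        Alignment("readC", "chr1", 100, "+", "IIII"),
        Alignment("readB", "chr1", 100, "+", "####"),
        Alignment("readD", "chr1", 97, "-", "IIII"),
    ]
    print([a.name for a in query_sorted(reads)])
    print([a.name for a in coordinate_sorted(reads, {"chr1": 0, "chr2": 1})])
    print(sorted(mark_duplicates(reads)))  # ['readB']
```

In this toy, the (chromosome, strand, 5' position) key alone decides what counts as a duplicate, which is why `readD` survives even though its 5' end is also at position 100. Real duplicate markers also take mate positions into account for paired-end data, which this sketch leaves out.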

NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word compute in it).

  1. Log in using ssh [email protected] and enter your password (replace the "XX" in the username with the number you were assigned in class).
  2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash` to get on a compute node.
  3. Proceed once your command prompt has the word compute in it.
  4. If you log out between lessons (by running `exit` twice), follow steps 1 and 2 above to log back in and get on a compute node when you resume the self-learning lessons.

II. Complete the exercises:

  • Each lesson above contains exercises; please go through each of them.
  • Copy your solutions into the Google Form the day before the next class.

Questions?

  • If you get stuck due to an error while running code in the lesson, email us

Day 2

| Time | Topic | Instructor |
|:-----|:------|:-----------|
| 9:30 - 10:00 | Self-learning lessons review | All |
| 10:00 - 10:30 | Alignment File Quality Control | Elizabeth |
| 10:30 - 10:40 | Break | |
| 10:40 - 11:15 | Aggregating QC metrics using MultiQC | Elizabeth |
| 11:15 - 12:00 | Variant Calling | Will |

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Variant Filtering

    Now that we have called our raw variants, we need to filter our data to keep only high-quality variant calls. Low-quality variant calls can arise for a variety of reasons; we will explore these and implement steps to exclude them.

    This lesson will:
    • Filter raw variant calls using FilterMutectCalls to reduce errors
    • Remove variants in low-complexity regions using SnpSift to further reduce errors

  2. Variant Annotation with SnpEff

    With our high-quality variant calls, we would like to know more about these variants. For example, we might want to know which genes they fall in or how they alter the protein-coding sequence of those genes. To do this, we will annotate our variants against a database of gene annotations.

    This lesson will:
    • Annotate a VCF file for functional impacts with `SnpEff`
    • Differentiate between an unannotated and annotated VCF file
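
The Day 2 lessons carry out these steps with GATK's FilterMutectCalls, SnpSift and `SnpEff`. To make the underlying logic concrete, here is a simplified Python sketch on toy VCF lines (hypothetical helper names and data; not the lessons' code). It keeps records whose FILTER column is `PASS`, which is where FilterMutectCalls records its verdict, drops records falling inside BED-style low-complexity intervals, and reads the `ANN` INFO field that `SnpEff` adds, whose pipe-separated sub-fields begin with Allele, Annotation, Annotation_Impact, Gene_Name and so on. It assumes uncompressed, well-formed VCF text and does no validation.

```python
# Leading subset of SnpEff's ANN sub-field names, in the order they appear.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p",
]


def parse_vcf_line(line):
    """Split one VCF data line into a dict; INFO becomes a key -> value dict."""
    chrom, pos, _id, ref, alt, _qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_dict[key] = value  # flag-style keys map to ""
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": filt, "info": info_dict}


def passes_filters(record):
    """Calls that failed no filter carry PASS in the FILTER column."""
    return record["filter"] == "PASS"


def in_regions(record, regions):
    """True if the variant lies in any (chrom, start, end) BED interval.

    BED intervals are 0-based and half-open, while VCF POS is 1-based.
    """
    zero_based = record["pos"] - 1
    return any(chrom == record["chrom"] and start <= zero_based < end
               for chrom, start, end in regions)


def high_quality_records(lines, low_complexity):
    """Keep PASS records that fall outside the low-complexity intervals."""
    records = (parse_vcf_line(line) for line in lines if not line.startswith("#"))
    return [r for r in records
            if passes_filters(r) and not in_regions(r, low_complexity)]


def parse_ann(record):
    """Return one dict per SnpEff annotation; [] means the record is unannotated."""
    raw = record["info"].get("ANN")
    if not raw:
        return []
    return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in raw.split(",")]


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\tPASS\tDP=30;ANN=G|missense_variant|MODERATE|GENE1|ID1",
        "chr1\t150\t.\tC\tT\t.\tweak_evidence\tDP=4",
        "chr1\t205\t.\tG\tA\t.\tPASS\tDP=25",
    ]
    lcr = [("chr1", 200, 210)]  # hypothetical low-complexity interval
    kept = high_quality_records(vcf, lcr)
    print([r["pos"] for r in kept])            # [100]
    print(parse_ann(kept[0])[0]["Gene_Name"])  # GENE1
```

Each record is checked against every interval in turn, which is fine for a handful of regions; for genome-wide low-complexity tracks, an interval tree or a sorted sweep over both files would be the usual choice.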

NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word compute in it). For login instructions, please see above.

II. Complete the exercises:

  • Each lesson above contains exercises; please go through each of them.
  • Copy your solutions into the Google Form the day before the next class.

Questions?

  • If you get stuck due to an error while running code in the lesson, email us

Day 3

| Time | Topic | Instructor |
|:-----|:------|:-----------|
| 9:30 - 10:00 | Self-learning lessons review | Elizabeth |
| 10:00 - 10:30 | Variant Prioritization with SnpSift | Elizabeth |
| 10:30 - 11:00 | Exercise (Key) | Will |
| 11:00 - 11:30 | Visualization in IGV | Will |
| 11:30 - 12:00 | Q & A (review of Automation) | All |

Questions?

  • If you get stuck due to an error while running code in the lesson, email us

Day 4

| Time | Topic | Instructor |
|:-----|:------|:-----------|
| 9:30 - 10:30 | Introduction to cBioPortal | Dr. Tali Mazor |
| 10:30 - 11:30 | cBioPortal Practical | Dr. Tali Mazor |
| 11:30 - 11:45 | Oncoprint Integration | Will |
| 11:45 - 12:00 | Wrap up | Elizabeth |

File Format Reference

Automation Reference

Answer key


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.