- Understand what is a genome browser and what it does
- Be able to index BAM files
- Be able to load reference genome, vcf files, and BAM files into IGV
- Be able to understand the meaning of different tracks and components
- Understand what is a shell script and what it does
- Be able to write and run a shell script
- Be able to combine the variant calling workflow into a single script
- Be able to connect to a remote server via ssh
A genome browser is a graphical interface for displaying information for genomic data. Genome browsers enable users to visualise and browse entire genomes with annotated data, including gene prediction, gene structure, protein, expression, regulation, variation, and comparative analysis.
The Integrative Genomics Viewer (IGV) is a stand-alone browser, which has the advantage of being installed locally and providing fast access. Web-based genome browsers, like Ensembl or the UCSC browser, are slower, but provide more functionality.
It is often instructive to look at your data in a genome browser. Visualisation will allow you to get a "feel" for the data, as well as detecting abnormalities and problems. Also, exploring the data in such a way may give you ideas for further analyses.
1. Index the BAM files
IGV requires that BAM files have an associated index file. We can use index
command from SAMtools.
samtools index [aligned.sorted.bam]
Exercise: please index all 3 samples.
2. Load files into IGV
Next, we can start to visualise our data.
- Open IGV.
- Load our reference genome file
ecoli_rel606.fasta
into IGV using the "Load Genomes from File ..." option under the "Genomes" pull-down menu. - Load our BAM file
SRR2584866.aligned.sorted.bam
using the "Load from File ..." option under the "File" pull-down menu. - Do the same with our VCF file
SRR2584866_final_variants.vcf
.
Your IGV browser should look like:
There should be two tracks: one corresponding to our BAM file and the other for our VCF file.
Try zoom in using the scale at the top-right corner to inspect the variants. It should look like this:
3. Track colours
There are some tracks in the browser show in different colours. They can have different meanings. You can right click the track and change the colour settings.
You can read more about different colour settings here.
4. Import multiple samples
IGV allows you to import multiple alignment together and compare them.
Please load another sample into IGV.
Did you find anything interesting?
For more information about IGV, you can read this page.
A shell script is a text file that allows you to combine your sequence of commands together and run it all at once. If you have the need to run the same commands repeatedly, it is a good idea to consider writing a shell script so you can use it the next time.
We use .sh
extensions to represent shell script.
There are 3 steps included to write and run a shell script:
- Create a new file
- Write the code
- Run the script
1. Create a new file
cd ~/workshops/variant-calling
nano first_script.sh
2. Write the code
We are going to create an interactive conversation for this script.
Please input the following code to the file:
echo "Hello!"
sleep 1
echo "I'm Ubuntu. What's your name?"
sleep 1
read NAME
sleep 1
echo "Hello $NAME, nice to meet you!"
The sleep
command introduces a pause in the execution of a script, the system basically sleep for a period of time before execute the next command.
sleep 5 # pause for 5 seconds
sleep 2m # pause for 2 minutes
sleep 1h30m # pause for 1 hour and 30 minutes
sleep 1d # pause for 1 day
The read
command reads input from a user and assign it to a variable.
3. Run the script
To run the script, first we need to save the script. Then we can run:
sh [/path/to/file]
Exercise: try run first_script.sh
.
Now we have learned all the necessary steps to write and run a shell script. Please use sample SRR2589044
as example, write every step of the workflow and put them into a shell script and run it.
In order to connect to the RSB server, you have to connect to the School Intranet using GlobalProtect VPN or you are connecting via a school EtherPort.
The ssh
command is used for secure remote access to a remote system or server over a network. It provides encrypted communication between the client and the server. The command to connect to a remote server is:
ssh [username@hostname]
Please connect to our school server Wright (the hostname will be provided in the class).
Please set up the conda environment on Wright, Anaconda is installed on server so you don't need to install it.
Write a shell script for the variant calling workflow, from the very beginning quality control until you get the final variants. Use for loop to loop through samples.
Remember when we ran the trimming and filtering step of the workflow, we used for loop to run all our samples at once. I have changed a little bit of the script to make it look neat.
for i in *_1.fastq
do
base=$(basename $i _1.fastq)
fastq1=${base}_1.fastq
fastq2=${base}_2.fastq
trim1=${base}_1.trim.fastq
trim2=${base}_2.trim.fastq
untrim1=${base}_1un.trim.fastq
untrim2=${base}_2un.trim.fastq
trimmomatic PE $fastq1 $fastq2 \
$trim1 $untrim1 \
$trim2 $untrim2 \
SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15
done
- Melbourne Bioinformatics - Introduction to Genome Browsers
- Data Carpentry - Data Wrangling and Processing for Genomics
- TechTarget - shell script
- Guru99 - Shell Scripting Tutorial: How to Create Shell Script in Linux/Unix