This tutorial will get you up and running with Conda and Bioconda.
For an introductory set of slides on Conda and Bioconda see here
- Install Conda via the Miniconda distribution
- Learn about the activation of the base conda environment
- Configure Conda with Software "Channels"
- Learn how to:
- install tools using conda
- remove tools
- install a particular version of a tool
- update tools
- How to handle version conflicts!
- Conda environments
- How to use conda environments
- Create an environment
- Add tools to an environment
- Activate different environments
Instructions:
- Read through the following
- Run the commands that start with
>$
in your SSH session.
For example, the command pwd
will be shown as:
>$ pwd
or
(base) >$ pwd
Don't type in the >$
or the (base) >$
! It's just the representation of the prompt in the tutorial document...
Ok, let's go.
The first thing we have to do is download the installer from the Miniconda documentation website.
SSH Login to your Linux server
Then from your HOME directory:
>$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Now we can install it; we will use the -b
and -s
switches for silent mode and no post install scripts respectively.
>$ sh Miniconda3-latest-Linux-x86_64.sh -b -s
We can watch it install and show the following:
PREFIX=/home/admin/miniconda3
installing: python-3.7.3-h0371630_0 ...
Python 3.7.3
installing: ca-certificates-2019.1.23-0 ...
installing: libgcc-ng-8.2.0-hdf63c60_1 ...
installing: libstdcxx-ng-8.2.0-hdf63c60_1 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.1-he6710b0_1 ...
installing: openssl-1.1.1b-h7b6447c_1 ...
installing: xz-5.2.4-h14c3975_4 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-h7b6447c_3 ...
installing: libedit-3.1.20181209-hc058e9b_0 ...
installing: readline-7.0-h7b6447c_5 ...
installing: tk-8.6.8-hbc83047_0 ...
installing: sqlite-3.27.2-h7b6447c_0 ...
installing: asn1crypto-0.24.0-py37_0 ...
installing: certifi-2019.3.9-py37_0 ...
installing: chardet-3.0.4-py37_1 ...
installing: idna-2.8-py37_0 ...
installing: pycosat-0.6.3-py37h14c3975_0 ...
installing: pycparser-2.19-py37_0 ...
installing: pysocks-1.6.8-py37_0 ...
installing: ruamel_yaml-0.15.46-py37h14c3975_0 ...
installing: six-1.12.0-py37_0 ...
installing: cffi-1.12.2-py37h2e261b9_1 ...
installing: setuptools-41.0.0-py37_0 ...
installing: cryptography-2.6.1-py37h1ba5d50_0 ...
installing: wheel-0.33.1-py37_0 ...
installing: pip-19.0.3-py37_0 ...
installing: pyopenssl-19.0.0-py37_0 ...
installing: urllib3-1.24.1-py37_0 ...
installing: requests-2.21.0-py37_0 ...
installing: conda-4.6.14-py37_0 ...
installation finished.
Now that we have installed Conda, we need to be able to use it.
In a very similar way to activating a Python virtual environment, we need to activate conda before we can use it.
We do this by pointing the source
command at the conda activate script.
NOTE: You need to do this EVERY time you log in to your server if you want to use tools you installed with Conda.
>$ source ~/miniconda3/bin/activate
You'll notice that your command line now has an added (base)
in front of it. This lets you know that Conda has been activated and you can use it. Later on, when we start using other conda environments, it will tell us which one we are in!
Now we want to configure Conda with some extra software channels. We especially want to tell Conda where it can find all those excellent Bioinformatics tools we want to use are.
So we need to add three "Channels" to conda's configuration. They are:
- defaults - base packages for Conda
- bioconda - >8000 Bioinformatics tools and growing daily
- conda-forge - has most of the dependencies for all our favourite tools
We add the channels with the conda config
command. The order in which we add the channels is critical. It will determine the order in which conda will search them for the appropriate packages. It's strange, but the last one we add will have the highest priority. We want conda-forge
to have the highest priority, so we add it last...
(base) >$ conda config --add channels defaults
(base) >$ conda config --add channels bioconda
(base) >$ conda config --add channels conda-forge
To check that it worked you can look at the conda configuration using:
(base) >$ conda config --show
It will print out the full configuration of your conda install. If you see:
channels:
- conda-forge
- bioconda
- defaults
amongst the rest of the output, you are all set!
You can discover if a tool you are interested in is available as a conda package by executing conda search
. For example, if you want to look for the seqtk tool, you can run:
$ conda search seqtk
Ok, enough setup already! Lets install a tool cause I have project deadlines!
Lets install BWA
.. It's pretty useful..
To do that we use the conda install <package-name>
command.
Conda will work out all the things it needs to install as well as samtools to make sure it works.
You'll see a whole lot of stuff, but then conda will ask you if you REALLY want to install bwa
. Please take note of all the other things it has to install. Look closely, sometimes it may tell you it has to REMOVE things to be able to install what you want due to an incompatibility.
Woohoo!
Ok so now we can install BWA
with the following:
(base) >$ conda install bwa
Now let's check this one.
(base) >$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <[email protected]>
Usage: bwa <command> [options]
Command:
index index sequences in the FASTA format
mem BWA-MEM algorithm
fastmap identify super-maximal exact matches
pemerge merge overlapping paired ends (EXPERIMENTAL)
aln gapped/ungapped alignment
samse generate alignment (single ended)
sampe generate alignment (paired ended)
bwasw BWA-SW for long queries
shm manage indices in shared memory
fa2pac convert FASTA to PAC format
pac2bwt generate BWT from PAC
pac2bwtgen alternative algorithm for generating BWT
bwtupdate update .bwt to the new format
bwt2sa generate SA from BWT and Occ
Note: To use BWA, you need to first index the genome with `bwa index'.
There are three alignment algorithms in BWA: `mem', `bwasw', and
`aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
first. Please `man ./bwa.1' for the manual.
To remove a tool is simple. Just use the conda remove
instruction.
(base) >$ conda remove bwa
You'll have to acknowledge that you actually want to remove samtools.
Now try to run bwa.
(base) >$ bwa
Command 'bwa' not found
It's not there anymore...Cool.
Last time we installed samtools we got version 1.9, but we wanted version 1.8. We can install version 1.8 pretty easily. We tell Conda what version we want when we install it.
(base) >$ conda install samtools==1.8
notice the ==1.8
? This tells Conda to install a particular version and in this case 1.8.
This is very handy sometimes.
When you install this version, notice that Conda has to re-jig some of it's other installed programs? This happens a lot, and it could affect other things you might have installed. That's where conda environments come in, and we'll talk about them soon.
For now, run samtools --version
again.
(base) >$ samtools --version
samtools 1.8
Using htslib 1.8
Copyright (C) 2018 Genome Research Ltd.
Cool.
Sometimes we want to upgrade a tool that is already installed. We can do this by using the conda update <package-name>
command.
Lets update samtools to it's the latest version.
(base) >$ conda update samtools
Once it's done, check it's version again.
(base) >$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
Simples.
It's pretty straightforward; we use the conda list
command. It will tell you what is installed, which versions, build numbers and which channel it came from.
(base) >$ conda list
# packages in environment at /home/ubuntu/miniconda3:
#
# Name Version Build Channel
asn1crypto 0.24.0 py27_1003 conda-forge
bwa 0.7.17 h84994c4_5 bioconda
bzip2 1.0.6 h14c3975_1002 conda-forge
ca-certificates 2019.3.9 hecc5488_0 conda-forge
certifi 2019.3.9 py27_0 conda-forge
cffi 1.12.3 py27h8022711_0 conda-forge
chardet 3.0.4 py27_1003 conda-forge
conda 4.6.14 py27_0 conda-forge
cryptography 2.6.1 py27h72c5cf5_0 conda-forge
curl 7.64.1 hbc83047_0
enum34 1.1.6 py27_1001 conda-forge
futures 3.2.0 py27_1000 conda-forge
idna 2.8 py27_1000 conda-forge
ipaddress 1.0.22 py_1 conda-forge
krb5 1.16.1 h173b8e3_7
libcurl 7.64.1 h20c2e04_0
libdeflate 1.0 h14c3975_1 bioconda
libedit 3.1.20170329 hf8c457e_1001 conda-forge
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 8.2.0 hdf63c60_1
ncurses 6.1 hf484d3e_1002 conda-forge
openssl 1.1.1b h14c3975_1 conda-forge
perl 5.26.2 h516909a_1006 conda-forge
pip 19.1 py27_0 conda-forge
pycosat 0.6.3 py27h14c3975_1001 conda-forge
pycparser 2.19 py27_1 conda-forge
pyopenssl 19.0.0 py27_0 conda-forge
pysocks 1.7.0 py27_0 conda-forge
python 2.7.15 h721da81_1008 conda-forge
readline 7.0 hf8c457e_1001 conda-forge
requests 2.22.0 py27_0 conda-forge
ruamel_yaml 0.15.71 py27h14c3975_1000 conda-forge
samtools 1.9 h8571acd_11 bioconda
setuptools 41.0.1 py27_0 conda-forge
six 1.12.0 py27_1000 conda-forge
sqlite 3.26.0 h7b6447c_0
tk 8.6.9 h84994c4_1001 conda-forge
urllib3 1.24.3 py27_0 conda-forge
wheel 0.33.4 py27_0 conda-forge
xz 5.2.4 h14c3975_4
yaml 0.1.7 had09818_2
zlib 1.2.11 h7b6447c_3
Just say you need two versions of samtools installed. Samtools is used a lot as a dependency in a lot of other tools, and they sometimes need particular versions. What if you want to use tool-a
which has a dependency for samtools 1.8
and you also want to use tool-b
which needs samtools 1.9
. What do you do? You can't have samtools 1.8 & 1.9 at the same time can you?
Yes, you can! You need to install tool-a
and tool-b
in different and separate environments with their own set of dependencies that do not interact with one another!
To create a conda environment, we use the conda create
command. We will now look at how to use environments!
Let's install a tool called mlst
into its environment. It has a lot of dependencies and is quite a complex tool.
The first thing we need to do is create an environment for it.
We can create an environment for mlst as follows:
(base) >$ conda create -n mlst_env
This will create an environment space called mlst_env
that we can now activate and install tools into.
Before we can use our new environment, we have to activate it. We use the activate command.
If we have already activated a conda environment (including the base
environment), we just have to use the conda activate <environment-name>
command.
However, if we haven't activated Conda yet and we know the name of the environment we want to activate then we have to source it just like we did for the original (base) environment. source ~/miniconda3/bin/activate <environment-name>
.
We have already got Conda activated, so we switch to mlst_env
as follows:
(base) >$ conda activate mlst_env
Immediately, you'll notice that the prompt has changed and now we have it prepended with (mlst_env)
instead of (base)
.
NOTE: You can only have one environment activated at a time (for each SSH session.)
If we run conda list
now it should be empty..
(mlst_env) >$ conda list
# packages in environment at /home/ubuntu/miniconda3/envs/mlst_env:
#
# Name Version Build Channel
Now that we have activated our environment, we can install tools into it.
(mlst_env) >$ conda install mlst
It will take quite a long time to install it as it has a LOT of dependencies.
But it will happen, and when it's finished, we can check it out.
(mlst_env) >$ mlst --help
SYNOPSIS
Automatic MLST calling from assembled contigs
USAGE
% mlst --list # list known schemes
% mlst [options] <contigs.{fasta,gbk,embl}[.gz] # auto-detect scheme
% mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
GENERAL
--help This help
--version Print version and exit(default ON)
--check Just check dependencies and exit (default OFF)
--quiet Quiet - no stderr output (default OFF)
--threads [N] Number of BLAST threads (suggest GNU Parallel instead) (default '1')
--debug Verbose debug output to stderr (default OFF)
SCHEME
--scheme [X] Don't autodetect, force this scheme on all inputs (default '')
--list List available MLST scheme names (default OFF)
--longlist List allelles for all MLST schemes (default OFF)
--exclude [X] Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
OUTPUT
--csv Output CSV instead of TSV (default OFF)
--json [X] Also write results to this file in JSON format (default '')
--label [X] Replace FILE with this name instead (default '')
--nopath Strip filename paths from FILE column (default OFF)
--novel [X] Save novel alleles to this FASTA file (default '')
--legacy Use old legacy output with allele header row (requires --scheme) (default OFF)
SCORING
--minid [n.n] DNA %identity of full allelle to consider 'similar' [~] (default '95')
--mincov [n.n] DNA %cov to report partial allele at all [?] (default '10')
--minscore [n.n] Minumum score out of 100 to match a scheme (when auto --scheme) (default '50')
PATHS
--blastdb [X] BLAST database (default '/home/ubuntu/miniconda3/envs/abricate_env/bin/../db/blast/mlst.fa')
--datadir [X] PubMLST data (default '/home/ubuntu/miniconda3/envs/abricate_env/bin/../db/pubmlst')
HOMEPAGE
https://github.com/tseemann/mlst - Torsten Seemann
We can manipulate what is installed in this environment, in the same manner, we did for the (base)
environment.
The above was a few too many steps, and as computer scientists, we like typing less. So...
Let us create a new environment for version 1.8 of samtools as well as BWA and install them too.
We can do it all with one command.
First, we need to exit the (mlst_env)
environment and go back to the (base)
environment.
(mlst_env) >$ conda deactivate
Now we can create our new environment for samtools and install it.
(base) >$ conda create -n bwa_samtools_1_8 samtools==1.8 bwa
This command will create an environment called bwa_samtools_1_8
and then install samtools version 1.8 and a compatible version of bwa into it.
Cool huh?
Well, that's it. Hopefully, you now understand how to use conda and conda environments to install and use all of your favourite tools!