Computing Environment
This tutorial is conducted in a Linux/Unix terminal session. In other words, you must already be connected to a remote Linux machine at MGHPCC to continue.
The computer cluster at MGHPCC is built up of hundreds or thousands of Linux machines connected together by a network, shared storage, and shared software. To leverage this environment for "big" computing tasks, we first need to learn some special tools (Linux commands, shell scripting, and LSF commands) to "communicate" with the cluster.
Command-line examples that you are meant to type into a terminal window will be shown indented in a constant width font, e.g.
echo $USER
Sometimes the accompanying text will include a reference to a Unix command. Any such text will also be in a constant width, boxed font.
e.g. Type the `ls` command again.
A Unix/Linux shell is a command-line interpreter which provides a user interface for the Unix/Linux operating system. Users control the operation of a computer by submitting single commands or by submitting one or more commands via a shell script. Whatever you type at the command line is understood and interpreted by a program and then that program gives you an output after executing your command. This program that understands what you type is called the shell. Several common shell choices are available on MGHPCC:
Shell | Description |
---|---|
bash | a Bourne-shell (sh) compatible shell with many newer advanced features as well |
tcsh | an advanced variant on csh with all the features of modern shells |
zsh | an advanced shell which incorporates all the functionality of bash, tcsh, and ksh combined |
csh | the original C-style shell |
The default shell provided to MGHPCC users is the bash shell. To discover your current shell:
echo $SHELL
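As described above, the shell reads commands either one at a time at the prompt or from a file (a shell script). A minimal sketch of the second route, assuming bash (the default on MGHPCC); a throwaway temporary file stands in for a script such as a hypothetical `hello.sh` that you would write yourself:

```shell
# Throwaway file standing in for your own script (e.g. a hypothetical hello.sh).
script=$(mktemp)

# Put two commands in the file instead of typing them one at a time.
cat > "$script" <<'EOF'
echo "Current directory: $(pwd)"
echo "Files here: $(ls | wc -l)"
EOF

# Hand the file to bash, which reads and runs the commands in order.
output=$(bash "$script")
echo "$output"

# Clean up the temporary file.
rm -f "$script"
```

With a script saved under a name of your choice, the equivalent call is simply `bash hello.sh`.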
Environment variables are a set of dynamically named values which can control the way running processes behave on a computer. Many UNIX commands and tools require certain environment variables to be set. Many of these are set automatically when you log in or when you load applications via the `module` command. To view your current set of environment variables, run the following command:
env
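The `env` listing can be long. A small sketch using standard tools (`sort`, `grep`, `printenv`) to narrow it down; `PATH` is used here only because it is set in practically every session:

```shell
# env prints one NAME=value pair per line; sorting makes the list easier to scan.
env | sort | head -n 5

# Keep only the line for one variable (^ anchors the match at the start of
# the line, so e.g. MANPATH=... is not matched).
env | grep '^PATH='

# printenv prints just the value of a single exported variable.
printenv PATH
```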
Environment variables provide a way to influence the behaviour of software on the system. For example, the "LANG" environment variable determines the language in which software programs communicate with the user.
Environment variables consist of names that have values assigned to them. For example, on a typical system in the US we would have the value "en_US.UTF-8" assigned to the "LANG" variable.
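Variables you create yourself are ordinary shell variables until you `export` them; only exported variables are passed on to programs started from the shell, which is how they influence those programs' behaviour. A minimal sketch, assuming a POSIX-compatible shell such as bash; `PROJECT_DIR` is a hypothetical name:

```shell
# PROJECT_DIR is a hypothetical name used only for this illustration.
PROJECT_DIR="/tmp/my_project"               # plain shell variable, not exported
sh -c 'echo "child sees: [$PROJECT_DIR]"'   # prints: child sees: []

export PROJECT_DIR                          # now part of the environment
sh -c 'echo "child sees: [$PROJECT_DIR]"'   # prints: child sees: [/tmp/my_project]

# VAR=value in front of a command sets it for that one command only.
LANG=C sh -c 'echo "LANG inside: $LANG"'    # prints: LANG inside: C
echo "LANG here: ${LANG-<unset>}"           # unchanged in the current shell
```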
To print the value of a variable:
echo $<NAME>    # e.g. echo $HOME
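When a variable is followed directly by letters, digits, or underscores, wrap its name in braces so the shell knows where the name ends. A short sketch; `SAMPLE` and the file names are hypothetical:

```shell
SAMPLE="liver"                  # hypothetical variable for this illustration

echo "$SAMPLE"                  # prints: liver
echo "${SAMPLE}_rep1.fastq"     # prints: liver_rep1.fastq
# Without braces, "$SAMPLE_rep1.fastq" would look up a (probably unset)
# variable named SAMPLE_rep1, because _ and digits are valid in names.
```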
Some Examples:
$ echo $USER
ml23a
$ echo $HOME
/home/ml23a
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
The last one shows the content of the `$PATH` environment variable: a colon-separated list of directories that are expected to contain programs you can run. This includes most Unix commands, e.g. `ls`, `cp`, `mv`. These are files stored in those directories that are run like programs (e.g. `ls` is just a special type of file in the `/bin` directory); a few commands, such as `cd` and `pwd`, are instead built into the shell itself.
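To see the `$PATH` directories more clearly, and to check which file a command name actually resolves to, standard tools are enough. A short sketch:

```shell
# One directory per line: tr replaces every ':' with a newline.
echo "$PATH" | tr ':' '\n'

# Which file would run for "ls"? Prints its full path (e.g. /bin/ls or
# /usr/bin/ls); if ls is an alias in your interactive shell, the alias is
# reported instead.
command -v ls
```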
Knowing how to change your $PATH to include custom directories can be necessary sometimes (e.g. if you install some new bioinformatics software in a non-standard location).
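A minimal sketch of adding a custom directory to `$PATH`, assuming a POSIX-compatible shell such as bash. The directory is a throwaway one from `mktemp`, standing in for wherever you installed your software (e.g. a hypothetical `$HOME/software/bin`), and `mytool` is an invented program:

```shell
# Stand-in for your own install location, e.g. "$HOME/software/bin" (hypothetical).
MYTOOLS=$(mktemp -d)

# Create a tiny executable so there is something to find on the new PATH.
printf '#!/bin/sh\necho "hello from mytool"\n' > "$MYTOOLS/mytool"
chmod +x "$MYTOOLS/mytool"

# Prepend the directory: the shell searches PATH left to right, so programs
# here take precedence over same-named programs in later directories.
export PATH="$MYTOOLS:$PATH"

mytool              # prints: hello from mytool
command -v mytool   # prints the full path of mytool inside $MYTOOLS
```

This change lasts only for the current session; to keep it across logins, a line such as `export PATH="<your_dir>:$PATH"` is commonly added to `~/.bashrc`.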
A module manages environment variables needed to load a particular piece of software.
To see a list of modules that are currently loaded:
module list
To see a list of modules that are available to be loaded:
module avail
To see what environment variables would be set/changed if you load a specific module:
module show <module_name>
To load a module:
module load <module_name>
To unload a module:
module unload <module_name>
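Under the hood, loading a module largely means editing variables such as `PATH`. The sketch below uses hypothetical helper functions and is not the real `module` command (typically Environment Modules or Lmod), which also handles other variables (often `LD_LIBRARY_PATH` or `MANPATH`), dependencies, and bookkeeping of what is loaded. It only mimics the `PATH` part, and `/opt/example/samtools/bin` is an invented directory; `module show <module_name>` remains the way to see what a real module changes.

```shell
# Hypothetical helpers, NOT the real `module` command: they only mimic the
# PATH edits that loading/unloading a module typically performs.
fake_module_load() {
  PATH="$1:$PATH"    # prepend the software's bin directory
  export PATH
}

fake_module_unload() {
  # Split PATH on ':', drop every entry equal to $1, re-join with ':'.
  PATH=$(printf '%s\n' "$PATH" | tr ':' '\n' | grep -vxF "$1" | paste -sd: -)
  export PATH
}

tooldir="/opt/example/samtools/bin"   # hypothetical install directory
fake_module_load "$tooldir"
echo "$PATH" | cut -d: -f1            # prints: /opt/example/samtools/bin
fake_module_unload "$tooldir"         # PATH no longer contains $tooldir
```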
MGHPCC servers have many bioinformatics programs and software packages installed already. The biocluster uses 'modules' to systematically organize, version-control, and load software and libraries.
Try the command `module` to see all of your available options with the tool.
Try the command `module avail` to see all of the modules available on the server.
Or, click here for a complete list of available modules and module names.