Cancer reporting systems require prepopulating several gigabytes of genomic reference data and provisioning all software pieces, docker containers and configuration.
PCGR eases that, and pcgr-deploy simplifies it further.
This ansible playbook contains tasks to deploy PCGR into Amazon and OpenStack clouds, with HPC-specific tasks added as a module (mainly NFS mounting).
Tweak ansible/group_vars/all and the roles section of ansible.site.yml according to your needs (are you an HPC or AWS user?).
The following lines will install the deployment modules, deploy PCGR and run its built-in example as a validation:
python3 -m venv venv && source venv/bin/activate && pip install ansible
ansible-playbook aws.yaml -e 'ansible_python_interpreter=/usr/bin/python3'
ssh ubuntu@<AWS INSTANCE>
cd /mnt/pcgr
./pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna examples/tumor_sample.COAD.cna.tsv /mnt/pcgr-* output tumor_sample.COAD
This playbook allows for all of them; it has been tested on the Tenjin private cloud at the Australian NCI supercomputing centre.
The only changes needed are editing ansible/group_vars/all, as mentioned in the Quickstart, and rearranging site.yml so that it includes the hpc role after common and databundle. Running the playbook in the following way should then deploy PCGR in your (OpenStack?) VM:
ansible-playbook site.yml -e 'ansible_python_interpreter=/usr/bin/python3' -i <YOUR CLUSTER IP/HOSTNAME>,
Alternatively, if you have python3 already installed in your virtual environment, instantiating and deploying to OpenStack is as easy as:
ansible-playbook openstack.yml
Assuming you are employed by the University of Melbourne and running on Tenjin, that's all you need to do ;)
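The role ordering required for HPC deployments (hpc after common and databundle) can be checked before running the playbook. The sketch below is a hypothetical helper, not part of pcgr-deploy; it assumes the roles section of site.yml has already been read into a plain list of role names.

```python
def hpc_role_is_ordered(roles):
    """Return True if 'hpc' is listed after both 'common' and 'databundle'.

    `roles` is assumed to be a list of role names in the order they
    appear in site.yml (hypothetical input, parsing is not shown here).
    """
    required_before = ("common", "databundle")
    # All three roles must be present for the HPC setup described above.
    if "hpc" not in roles or any(name not in roles for name in required_before):
        return False
    hpc_index = roles.index("hpc")
    return all(roles.index(name) < hpc_index for name in required_before)


if __name__ == "__main__":
    print(hpc_role_is_ordered(["common", "databundle", "hpc"]))
```

A list such as ["hpc", "common", "databundle"] would be rejected, since the hpc role would then run before the roles it depends on.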
The following script, included in ansible, queries AWS's spot price history and determines whether the instance we are asking for will be available. For instance, running the script with a 0.08 AUD asking price gives us:
python ~/bin/get_spot_duration.py \
--region ap-southeast-2 \
--product-description 'Linux/UNIX' \
--bids c4.large:0.08
That is 168 hours of uptime at that particular asking price for ap-southeast-2c, which is ~87% savings at the time of writing:
$ ./get_spot_duration.sh
Duration  Instance Type  Availability Zone
168.0     c4.large       ap-southeast-2c
108.2     c4.large       ap-southeast-2a
15.7      c4.large       ap-southeast-2b
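One plausible way to derive a duration figure like the ones above is to walk back through the spot price history and measure how long the price has stayed at or below the bid. The sketch below is an assumption about the approach, not the actual logic of the script; the function name and the price history in the example are hypothetical.

```python
from datetime import datetime, timedelta


def hours_below_bid(history, bid, now):
    """Hours the spot price has stayed at or below `bid`, counted back from `now`.

    `history` is a list of (timestamp, price) tuples, each marking the
    moment the spot price changed to `price`. All timestamps are assumed
    to be no later than `now`.
    """
    # Visit price changes from newest to oldest.
    ordered = sorted(history, key=lambda entry: entry[0], reverse=True)
    window_start = now
    for timestamp, price in ordered:
        if price > bid:
            # The instance would have been outbid here, so the window ends.
            break
        window_start = timestamp
    return (now - window_start).total_seconds() / 3600.0


if __name__ == "__main__":
    t0 = datetime(2017, 1, 1)  # hypothetical sample data
    sample = [
        (t0, 0.05),
        (t0 + timedelta(hours=10), 0.12),
        (t0 + timedelta(hours=20), 0.06),
        (t0 + timedelta(hours=30), 0.07),
    ]
    print(hours_below_bid(sample, bid=0.08, now=t0 + timedelta(hours=40)))
```

In practice the history would come from the AWS API, one availability zone at a time, which is how a per-zone table like the one above could be produced.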
This is an open-ended experiment for now; there are some errors that still need attention.
The message "ERROR: package is not a legal parameter in an Ansible task or handler" is a symptom of an Ansible version that is too old (probably 1.9.x). You need Ansible >=2.x to deploy this.
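The version requirement above can be checked up front rather than discovered through that error. The following is a minimal sketch, assuming the first version-like number printed by `ansible --version` is the Ansible version; the function names are hypothetical and not part of pcgr-deploy.

```python
import re
import subprocess


def ansible_is_recent_enough(version_output, minimum=(2, 0)):
    """Return True if the version found in `version_output` is >= `minimum`.

    `version_output` is the text printed by `ansible --version`; only the
    first "major.minor" number in it is considered.
    """
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in ansible output")
    found = (int(match.group(1)), int(match.group(2)))
    return found >= minimum


def installed_ansible_version_output():
    """Run `ansible --version` and return its output (requires ansible on PATH)."""
    result = subprocess.run(
        ["ansible", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling ansible_is_recent_enough(installed_ansible_version_output()) inside the virtual environment from the Quickstart would confirm that pip installed a suitable version.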