Running on the cloud
Please note the following when choosing computational resources to run BLAST.
- BLAST performs best when the BLAST database’s sequence data can fit into RAM.
- BLAST must have enough disk space to store the BLAST databases and its results.
- BLAST can run with multiple threads, so multi-core machines can speed up its processing.
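The three considerations above can be turned into a quick sanity check before choosing a machine. The following is a minimal sketch, not an NCBI tool: the function names, the 80% RAM headroom, and the one-core reserve are illustrative assumptions. The resulting thread count is meant to be passed to the BLAST+ `-num_threads` option.

```python
import os
import shutil

GIB = 1024 ** 3


def fits_in_ram(db_bytes, ram_bytes, headroom=0.8):
    """Return True if the database fits within a fraction of physical RAM.

    The 0.8 headroom is an assumed safety margin, not an NCBI recommendation.
    """
    return db_bytes <= ram_bytes * headroom


def has_disk_space(db_bytes, results_bytes, free_bytes):
    """Return True if free disk space covers the database plus expected results."""
    return db_bytes + results_bytes <= free_bytes


def pick_num_threads(cpu_count, reserve=1):
    """Choose a thread count, leaving `reserve` cores free; never below 1."""
    return max(1, (cpu_count or 1) - reserve)


# Hypothetical sizes; replace with the real size of your database and results.
db_size = 50 * GIB
results_size = 1 * GIB
machine_ram = 64 * GIB  # assumed RAM of the candidate machine

print("fits in RAM:", fits_in_ram(db_size, machine_ram))
print("enough disk:", has_disk_space(db_size, results_size, shutil.disk_usage(".").free))
print("num_threads:", pick_num_threads(os.cpu_count()))
```

If the database does not fit in RAM, BLAST still runs, but a machine with more memory is likely the better choice.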
The following are ways to obtain and run BLAST+ on the cloud:
One option is a cloud virtual machine image, commonly referred to as a VM or AMI (Amazon Machine Image). This option loads the NCBI software onto a single host with the following features:
- BLAST+, MagicBLAST, EDirect
- Popular NCBI BLAST databases pre-configured
This image is currently available on Google Cloud Platform (GCP).
NCBI also provides Docker images on Docker Hub. We strongly recommend using these rather than building your own, because they are built to be lightweight and are optimized to pre-fetch and load BLAST databases into RAM. The available images are:
- BLAST+
- MagicBLAST
- EDirect
- BLAST-workbench (BLAST+, MagicBLAST, and EDirect in a single image)
- SRA toolkit
These images integrate easily with other container and orchestration technologies, as well as workflow languages, but provisioning the data is left to the user (see Getting BLAST databases).
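Because data provisioning is left to the user, a typical invocation mounts host directories into the container. The sketch below assembles such a `docker run` command as an argument list. The helper name `docker_blast_command`, the host paths, and the container mount points (`/blast/blastdb`, `/blast/queries`, `/blast/results`) are assumptions for illustration; check the image documentation for the paths it actually expects.

```python
import shlex


def docker_blast_command(db_dir, query_dir, results_dir, program,
                         query_file, db_name, out_file, image="ncbi/blast"):
    """Build a `docker run` argument list for a BLAST+ search.

    Host directories are bind-mounted into the container: the database and
    query directories read-only, the results directory read-write.
    The container-side paths are assumed, not taken from this page.
    """
    mounts = [
        (db_dir, "/blast/blastdb", "ro"),
        (query_dir, "/blast/queries", "ro"),
        (results_dir, "/blast/results", "rw"),
    ]
    cmd = ["docker", "run", "--rm"]
    for host_path, container_path, mode in mounts:
        cmd += ["-v", f"{host_path}:{container_path}:{mode}"]
    cmd += [
        image, program,
        "-query", f"/blast/queries/{query_file}",
        "-db", db_name,
        "-out", f"/blast/results/{out_file}",
    ]
    return cmd


# Hypothetical paths and file names, for illustration only.
cmd = docker_blast_command(
    "/data/blastdb", "/data/queries", "/data/results",
    "blastp", "query.fsa", "mydb", "results.out",
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a host where Docker is installed. Building it as a list rather than a single string avoids shell quoting problems with unusual paths.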
The costs of running BLAST in the cloud, as discussed here, primarily involve compute, storage, and network traffic. Cloud service providers typically offer a free tier (e.g., GCP, AWS) to help you get started.
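As a rough illustration of how these three components add up, the sketch below computes a simple estimate. The function name and all rates are hypothetical placeholders, not real provider prices; look up current pricing for your provider and region before relying on any figure.

```python
def estimate_cost(compute_hours, hourly_rate,
                  storage_gb, storage_rate_per_gb_month,
                  egress_gb, egress_rate_per_gb):
    """Return a per-component cost breakdown for one month.

    All rates are supplied by the caller; nothing here reflects real pricing.
    """
    breakdown = {
        "compute": compute_hours * hourly_rate,
        "storage": storage_gb * storage_rate_per_gb_month,
        "network": egress_gb * egress_rate_per_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Placeholder numbers chosen only to show the arithmetic.
costs = estimate_cost(
    compute_hours=10, hourly_rate=0.5,
    storage_gb=100, storage_rate_per_gb_month=0.02,
    egress_gb=20, egress_rate_per_gb=0.1,
)
for component, amount in costs.items():
    print(f"{component}: {amount:.2f}")
```

Keeping the components separate makes it easier to see which one dominates for a given workload.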