Benchmark tests will be controlled from a local machine, which will interact with remote cloud-based servers over the internet (using SSH for deployment and HTTP for running the tests).
The controlling machine must be Unix-based, i.e. either macOS or Linux, mainly because Ansible, which we use to deploy TissueMAPS in the cloud, doesn't run on Windows (see the Ansible documentation for details).
The controlling machine further needs to have Python installed as well as its package manager pip.
In addition, you need git, OpenSSH, OpenSSL, GCC and time.
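To verify these prerequisites in one go, you can run a quick check. This is a minimal sketch; the tool names are assumptions based on common installs, so adjust them to your system (e.g. `python3`/`pip3` instead of `python`/`pip`, and note that `time` is often a shell keyword rather than a separate binary):

```shell
# Report which of the required tools are available on the controlling machine.
# Tool names are assumptions based on common installs; adjust as needed.
for tool in git ssh openssl gcc python pip; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "ok: $tool"
    else
        echo "MISSING: $tool"
    fi
done
```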
Tests are performed by Bash scripts provided via the tmbenchmarks repository:
$ git clone https://github.com/tissuemaps/tmbenchmarks ~/tmbenchmarks
These scripts use command line interfaces exposed by the tmdeploy and tmclient Python packages. We recommend installing packages into a separate Python virtual environment:
$ virtualenv ~/.envs/tmbenchmark
$ source ~/.envs/tmbenchmark/bin/activate
$ pip install -r ~/tmbenchmarks/requirements.txt
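After the installation finishes, it is worth confirming that the command line tools used by the benchmark scripts are reachable from the active virtual environment. The entry-point names below are assumptions based on the tmdeploy and tmclient packages; verify them with `pip show -f tmdeploy tmclient`:

```shell
# Check that the CLIs used by the benchmark scripts are on the PATH of the
# active virtualenv (entry-point names are assumptions; see above).
for cli in tm_deploy tm_client tm_inventory; do
    command -v "$cli" >/dev/null 2>&1 && echo "ok: $cli" || echo "MISSING: $cli"
done
```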
Benchmarks are based on the image-based transcriptomics data set (Battich et al. 2013). Images are publicly available on figshare:
$ wget
FIXME
Install system packages as root user:
$ yum update -y
$ yum install -y git gcc epel-release time openssl-devel
$ yum install -y python-devel python-setuptools python-pip python-virtualenv
Install Python packages as non-privileged user into a virtual environment:
$ virtualenv ~/.envs/tmbenchmark
$ source ~/.envs/tmbenchmark/bin/activate
$ pip install -r ~/tmbenchmarks/requirements.txt
The setup subdirectory of the repository provides setup configuration files to build architectures using the tm_deploy command line tool.
There are two types of architectures:
* standalone: single-server setup
* cluster: multi-server setup with separate compute, filesystem and database servers (and a monitoring system)
The number in the architecture name (e.g. cluster-32) indicates the total number of CPU cores allocated to TissueMAPS for parallel execution of computational jobs. Note that in the case of a standalone setup, the database servers run on the same host. We therefore use machine flavors with more CPU cores to provide dedicated resources to the database servers, so that they don't compete with computational jobs for resources. In the case of a cluster setup, the database servers reside on separate hosts.
Specify your cloud provider and the cluster architectures which you would like to set up and run the tests against. For example, to build a cluster with 32 CPU cores on ScienceCloud:
$ ~/tmbenchmarks/build.sh -p sciencecloud -c cluster-32
The setup files can be found in ~/tmbenchmarks/setup/sciencecloud/.
Once the required infrastructure has been provisioned and the software has been deployed, you can run the test:
$ ~/tmbenchmarks/upload-and-submit.sh -p sciencecloud -c cluster-32 -H $HOST -d $DATA_DIR
where HOST is the public IP address of the cloud virtual machine that hosts the TissueMAPS web server and DATA_DIR is the path to a local directory that contains the microscope files that should be uploaded.
You can use the tm_inventory command line tool to list metadata about servers that have been set up in the cloud (including their IP addresses):
$ export TM_SETUP=$HOME/tmbenchmarks/setup/sciencecloud/cluster-32.yaml
$ tm_inventory --list
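If the listing is Ansible-style dynamic-inventory JSON (an assumption; inspect the actual output of `tm_inventory --list`), the web server's public IP can be extracted with a small sed filter. The inventory below is a fabricated sample with a hypothetical host name; in practice you would redirect the real listing into the file:

```shell
# Fabricated sample of dynamic-inventory JSON; replace with the real output,
# e.g. `tm_inventory --list > sample.json`.
cat > sample.json <<'EOF'
{"_meta": {"hostvars": {"web-0": {"ansible_host": "172.23.45.6"}}}}
EOF
# Pull the first "ansible_host" value out of the JSON.
HOST=$(sed -n 's/.*"ansible_host": *"\([^"]*\)".*/\1/p' sample.json)
echo "$HOST"
```

The extracted address can then be exported as HOST for the upload and download scripts.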
Once the test has completed, you can download the extracted single-cell feature data:
$ ~/tmbenchmarks/download-results.sh -p sciencecloud -c cluster-32 -H $HOST -d $DATA_DIR
This will write the results as CSV files into $DATA_DIR/sciencecloud/cluster-32/results.
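As a quick sanity check of the download, you can count the feature rows in each results file. This is a sketch: the file names below are fabricated for illustration, and the real CSV layout may differ.

```shell
# Create two fabricated results files to illustrate the check; in practice,
# point the loop at $DATA_DIR/sciencecloud/cluster-32/results/*.csv instead.
mkdir -p results
printf 'cell_id,area\n1,100\n2,120\n' > results/site-1.csv
printf 'cell_id,area\n3,95\n' > results/site-2.csv
# Count data rows (excluding the header line) per file.
for f in results/*.csv; do
    rows=$(($(wc -l < "$f") - 1))
    echo "$f: $rows rows"
done
```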
To calculate duration and speedup of workflow processing, you can download the status for computational jobs:
$ download-workflow-status.py -p sciencecloud -c cluster-32 -H $HOST -d $DATA_DIR
This will store the job information in CSV format in $DATA_DIR/sciencecloud/cluster-32_jobs.csv.
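From the jobs CSV, duration and speedup can be derived as wall-clock time (earliest start to latest end) versus the summed per-job runtime. The column layout below (name, start epoch, end epoch) and the sample data are assumptions; check the header of the actual cluster-32_jobs.csv and adapt the field indices:

```shell
# Fabricated jobs CSV with the assumed columns name,start,end (epoch seconds).
cat > jobs.csv <<'EOF'
name,start,end
job-1,0,100
job-2,10,160
job-3,20,90
EOF
awk -F, 'NR > 1 {
    cpu += $3 - $2                       # summed per-job runtime ("serial" time)
    if (min == "" || $2 < min) min = $2  # earliest start
    if ($3 > max) max = $3               # latest end
} END {
    wall = max - min
    printf "wall=%d cpu=%d speedup=%.2f\n", wall, cpu, cpu / wall
}' jobs.csv
```

For the sample data this prints wall=160 cpu=320 speedup=2.00.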
$ download-workflow-status.py -p sciencecloud -c cluster-32 -H $HOST -d $DATA_DIR
This will store the raw metrics as individual CSV files in $DATA_DIR/sciencecloud/cluster-32, with a separate subfolder for each step, and the computed aggregates in CSV format in $DATA_DIR/sciencecloud/cluster-32_metrics.csv.
The provided scripts will automatically redirect standard output and standard error to dedicated log files in ~/tmbenchmarks/logs.