Agave apps
## Register and run apps with Agave API
- Registering an App
- App registration
- Additional notes on the JSON file
- Docker integration
- Condor integration
To perform the following steps you need to sign up for a CyVerse account here.
See the FAQ.
Agave is a RESTful API, meaning that we can interact with it using `POST` and `GET` HTTP requests. The cyverse-cli tools are essentially wrappers around these requests that make them shorter to type.
Agave tracks two kinds of resources: Systems and Apps. There are 2 types of Systems: Storage and Execution. Apps run on Execution Systems using data from Storage Systems to produce desired results. So to run an app, we need an Execution system first. Systems are described using JSON files, which are then posted to the API. An Execution System JSON consists of 4 parts, which will be described below.
The first part consists of system basics: id, type etc. See an example below:
```json
"id" : "myTutorialMachine",
"name" : "A machine for the EI Agave tutorial",
"type" : "EXECUTION",
"executionType" : "CLI",
"scheduler" : "FORK",
```
All Execution systems need to define storage as scratch space. For this example, we'll assume you have a scratch directory mounted somewhere under `/mnt/` (an SSHFS mount, for example):
```json
"storage": {
  "host" : "yourhost.example.org",
  "port" : 22,
  "protocol" : "SFTP",
  "homedir" : "/mnt/scratch/username",
  "rootdir" : "/mnt/scratch",
  "auth" : {
    "type": "PASSWORD",
    "username" : "username",
    "password" : "password"
  }
},
```
All execution systems need a default Queue to which jobs are submitted. In our example, we are using a simple CLI system, so there are no scheduler queues that we need to deal with. This means we can get away with a simple specification like this:
```json
"queues": [
  {
    "name": "normal",
    "default": true,
    "maxRequestedTime": "24:00:00",
    "maxJobs": 10,
    "maxUserJobs": 5,
    "maxNodes": 1,
    "maxMemoryPerNode": "4GB",
    "maxProcessorsPerNode": 12,
    "customDirectives": null
  }
]
```
Lastly, Agave needs to know how to log in to the Execution system. This is specified with a `login` object:
```json
"login": {
  "host" : "yourhost.example.org",
  "port" : "22",
  "protocol": "SSH",
  "auth" : {
    "type" : "PASSWORD",
    "username" : "username",
    "password" : "changethis"
  }
}
```
To authenticate with SSH keys instead of a password, the `auth` object takes the following form:
```json
"auth": {
  "type" : "SSHKEYS",
  "username" : "username",
  "publicKey" : "ssh-rsa AAAA...your public key...",
  "privateKey" : "-----BEGIN RSA PRIVATE KEY-----*private key here*-----END RSA PRIVATE KEY-----"
}
```
Now that we have defined our system, you can find the completed JSON file below:
```json
{
  "id" : "myTutorialMachine",
  "name" : "A machine for the EI Agave tutorial",
  "type" : "EXECUTION",
  "executionType" : "CLI",
  "scheduler" : "FORK",
  "storage": {
    "host" : "yourhost.example.org",
    "port" : 22,
    "protocol" : "SFTP",
    "homedir" : "/mnt/scratch/username",
    "rootdir" : "/mnt/scratch",
    "auth" : {
      "type": "PASSWORD",
      "username": "username",
      "password": "changethis"
    }
  },
  "queues": [
    {
      "name": "normal",
      "default": true,
      "maxRequestedTime": "24:00:00",
      "maxJobs": 10,
      "maxUserJobs": 5,
      "maxNodes": 1,
      "maxMemoryPerNode": "4GB",
      "maxProcessorsPerNode": 12,
      "customDirectives": null
    }
  ],
  "login": {
    "host" : "yourhost.example.org",
    "port" : "22",
    "protocol": "SSH",
    "auth" : {
      "type" : "PASSWORD",
      "username" : "username",
      "password" : "changethis"
    }
  }
}
```
Save this file as `TutSystem.json` and use it to register the system on Agave:
```systems-addupdate -v -F TutSystem.json```
A large amount of JSON describing our new system will be returned to confirm the registration. Now that we have an execution system, let's move on to registering our workflow as an App in the next part.
#### App registration
An App in the Agave API means a workflow that is wrapped into a single unit which can be executed by a user. It is described in the same way a system is described: JSON. In this part we'll register a test app that runs a simple BLAST job.
##### App JSON - Front matter
The first thing we'll need to describe are some basic parameters of our app:
<details>
<summary>Expand source</summary>
<p>
"name" : "blastapp-tutorial",
"label" : "EI tutorial BLAST app",
"version" : "0.0.1",
"executionType" : "CLI",
</p></details>
The App ID will be generated from the name and version number and this combination *must be unique*. You can delete the previous one (if there was an error), or increase the version number (if you need to make an updated version).
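As a minimal sketch of that naming rule, the helper below joins a name and a version with a hyphen, which is the `<app_name>-<version>` form used elsewhere in this tutorial. The function name `app_id` is hypothetical and only mirrors the convention; the real ID is generated by Agave at registration time.

```shell
# Hypothetical helper: mirrors the "<name>-<version>" convention for App IDs.
app_id() {
    printf '%s-%s\n' "$1" "$2"
}

# Registering the same name with a bumped version gives a new, distinct ID.
app_id "blastapp-tutorial" "0.0.1"   # prints blastapp-tutorial-0.0.1
app_id "blastapp-tutorial" "0.0.2"   # prints blastapp-tutorial-0.0.2
```

This is why bumping the version is enough to register an updated app next to the old one.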
Next, we'll specify where and how the app will run:
<details>
<summary>Expand source</summary>
<p>
"executionSystem" : "myTutorialMachine",
"deploymentPath" : "username/apps/EI_tutorial",
"templatePath" : "wrapper.sh",
"testPath" : "test.sh",
"parallelism" : "SERIAL",
</p></details>
When specifying only an `executionSystem` as above, *you must make sure your app assets are already present on the system!* This means that you need admin access to your execution system, which is often not the case. To remedy this, we can store our app's assets on the CyVerse Datastore and specify a `"deploymentSystem"` parameter like so:
```"deploymentSystem" : "data.iplantcollaborative.org",```
If you are planning to publish your app with CyVerseUk, we ask that you add the `"ontology"` field with a list of EDAM URIs and the `"tag" : [ "CyverseUK"]`. You can see some complete JSON examples in this organization's repositories.
Finally, we'll specify our app's inputs:
<details>
<summary>Expand source</summary>
<p>
"inputs" : [
  {
    "id": "query",
    "details" : { "label": "Query", "description": "FASTA file with query sequence(s)" },
    "value": { "required" : "true" }
  },
  {
    "id": "database",
    "details" : { "label": "Database", "description": "FASTA file with sequences to search (database)" },
    "value": { "required" : "true" }
  }
],
"parameters" : [ ]
</p></details>
We're leaving parameters empty, but we could add any BLAST command line parameters here. Now that we have specified this, we'll have to actually upload our app's assets to CyVerse.
##### Storing App assets with CyVerse
We'll upload data to the datastore using the Discovery Environment (DE). However, the CyVerse datastore uses iRODS under the hood, so you could use <a href="https://docs.irods.org/master/icommands/user/">icommands</a> as well. For more details, see the <a href="https://pods.iplantcollaborative.org/wiki/display/DS/Using+iCommands">CyVerse wiki</a>.
First, log in to the DE at https://de.iplantcollaborative.org/. You'll be presented with a desktop-like environment. Click on the "Data" button. This will open up a file manager window, with a file tree on the left-hand side. Here, click on the folder with your username (at the top). We'll create a new folder to hold our apps first. Go to "File" and select "New Folder...". Name the new folder "EI_tutorial" and click "OK" to confirm. Navigate to the newly created folder by clicking on it. This is where our app's assets will live, which we'll create in the next sections.
If you are developing an app for the CyVerseUK system, it is a good idea to store the assets on our systems.
##### Creating the App assets
For our minimal BLAST app, we'll need three files: a wrapper script, a test script and an executable. Because of the way BLAST works, we'll actually need two executables for this app. First we'll create the wrapper script:
<details>
<summary>Expand source</summary>
<p>
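#!/bin/bash
QUERY="${query}"
DATABASE="${database}"
# Make sure the bundled BLAST executables can be run
chmod u+x lib/makeblastdb
chmod u+x lib/blastn
# Build the database, then search it with the query
lib/makeblastdb -dbtype nucl -in "$DATABASE" -out db
lib/blastn -query "$QUERY" -db db
# Pass on the exit status of blastn
exit $?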
</p></details>
As you can see from the first line, this is a plain bash script that runs our pipeline. The next two lines set up our two main parameters: the query and the database. Before the script is executed, Agave replaces the `${query}` directive with the input we supplied. Note that the word `query` is the id we specified in our JSON file earlier. The next line does the same for the database file.
The next two lines run our actual BLAST 'pipeline': first we create our database with makeblastdb and we then execute the BLAST with blastn. The `lib/` part of the command line is because of the way we will set up our app assets; Agave convention requires that all our App's executables are stored in a separate lib directory. We output the database in the first line with a simple title of db, and we call that database again in the next line.
The last line returns the current exit status, which will be inherited from the status of BLAST; this means that the script will pass on BLAST's exit value as its own.
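The substitution step described above can be imitated locally to see what the script looks like once the placeholders are filled in. This is only a sketch: `render_wrapper` is a hypothetical stand-in built on `sed`, not Agave's own mechanism, and the file names are example values.

```shell
# Stand-in for Agave's substitution step (an assumption, not Agave's code):
# replace the ${query} and ${database} placeholders read from stdin.
render_wrapper() {
    sed -e "s|[\$]{query}|$1|g" -e "s|[\$]{database}|$2|g"
}

# Two template lines from the wrapper script, rendered with example file names.
printf '%s\n' 'QUERY="${query}"' 'DATABASE="${database}"' |
    render_wrapper "testquery.fa" "testdb.fa"
# prints:
# QUERY="testquery.fa"
# DATABASE="testdb.fa"
```

Because the replacement happens before bash ever sees the script, `${query}` is never looked up as a shell variable. Note that this simple `sed` version would break on values containing `|` or `&`.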
Next, we'll need a test script that tests our app with some default data. This is useful, but we'll skip it for now as it is a bit out of this tutorial's scope. Instead, we'll just write a script that exits successfully and call it done:

    #!/bin/bash
    exit 0
Finally, we'll need to provide the BLAST executables. These can be obtained from the NCBI FTP server.
Now that we have everything, let's set up our assets in the datastore. Go back to your DE window, and go to the `EI_tutorial` folder under your username (if you weren't already there). Create a folder called `lib`, and navigate to it. We'll put our BLAST executables here. Go to the "Upload" menu in the top left-hand corner of the file navigation window. The easiest way is to upload the executables from this repo directly, so choose "Import from URL...".
The wrapper script should perform all the checks that the Agave API doesn't support (mutually inclusive or exclusive parameters, for example), and ideally return a proper error before running the Docker container. It can also be used to delete any newly created files that are not needed from the working directory, so that they are not archived.
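A minimal sketch of such checks is shown below. The function names and the `--fast` and `--sensitive` options are hypothetical and are not part of the tutorial's BLAST app; they only illustrate how a wrapper could reject a bad combination early.

```shell
# Fail when two mutually exclusive options are both set.
check_exclusive() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        echo "Error: these options cannot be used together" >&2
        return 1
    fi
}

# Fail when only one of two mutually inclusive options is set.
check_inclusive() {
    if [ -n "$1" ] && [ -z "$2" ]; then
        echo "Error: these options must be given together" >&2
        return 1
    elif [ -z "$1" ] && [ -n "$2" ]; then
        echo "Error: these options must be given together" >&2
        return 1
    fi
}

# Hypothetical values, standing in for parameters substituted by Agave.
FAST="--fast"
SENSITIVE=""
check_exclusive "$FAST" "$SENSITIVE" && echo "checks passed, safe to start the container"
```

In a real wrapper the call would more likely read `check_exclusive "$FAST" "$SENSITIVE" || exit 1`, placed before the `docker run` line so that the job fails quickly with a readable message.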
In our case there is some additional logic in the wrapper scripts to allow some automatic tasks in the virtual machines to perform as expected and to integrate the system with the webapp (<a href="http://cyverseuk.herokuapp.com/">CyVerseUk</a>). You don't usually have to worry about this.
##### Registering App in Agave
Now that our assets are in place, we can register our app in Agave using the JSON file we wrote earlier. (If needed, refresh your access tokens with `auth-tokens-refresh`.) Navigate to where the file is stored (we'll assume you've named it `TutApp.json`) and run the `apps-addupdate` command:
```apps-addupdate -v -F TutApp.json```
A bunch of JSON describing your app will be returned, confirming the registration of our app.
*****************
##### <div id="json">Additional notes on the JSON file</div>
After the introductory part, the JSON file lists inputs and parameters. Good documentation on the available fields and their usage can be found <a href="http://agaveapi.co/documentation/tutorials/app-management-tutorial/app-inputs-and-parameters-tutorial/">here</a>.
For the application (if you wish to publish it) to display a proper information window in the Discovery Environment, the following fields need to be present in the JSON file: `help_URI`, `datePublished`, `author`, `longDescription`.
In the `ontology` field, a list of IRIs for the topic and operation branches of the <a href="http://www.ebi.ac.uk/ols/ontologies/edam">EDAM ontology</a> has to be specified to properly categorize the App.
Should you encounter problems registering your application, first check that the JSON file is valid. A good way to do this is to copy and paste it into <a href="https://togo.agaveapi.co/app/#/apps/new">AgaveToGo</a>.
If `details.showArgument` (boolean) is set to `true`, `details.argument` is passed before the value (e.g. if we want to pass `--kmer 31` on the command line). Note that the argument is placed before the value without a space, so we usually want to include a trailing space in the argument string.
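The effect of the missing space can be sketched with a small helper. `render_param` is a hypothetical function that imitates the concatenation described above; it is not part of Agave or of cyverse-cli.

```shell
# Hypothetical imitation of how an argument and its value are joined.
# $1: showArgument ("true" or "false"), $2: argument, $3: value
render_param() {
    if [ "$1" = "true" ]; then
        # No separator is inserted between argument and value.
        printf '%s%s\n' "$2" "$3"
    else
        printf '%s\n' "$3"
    fi
}

render_param true "--kmer" 31     # prints --kmer31
render_param true "--kmer " 31    # prints --kmer 31
render_param false "--kmer " 31   # prints 31
```

Only the second call gives a usable command-line fragment, which is why the trailing space belongs inside the `details.argument` string.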
`value.validator` can supply a check on the format of the submitted value as a <a href=http://perldoc.perl.org/perlre.html>perl formatted regular expression</a>. (**pay particular attention to the escapes**)
Example case: the JSON `value.type` doesn't distinguish between integers and floating point numbers; there is just `number`. To check that the input is an integer we may use `"validator": "^\\d*$"` (or `"^[0-9]+$"` to avoid the escapes). The same field also allows us to accept only even or odd numbers, set a maximum value, etc.
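Validator patterns can be tried out locally before they go into the JSON file. The sketch below uses `grep -E` (POSIX extended regular expressions) as a stand-in for the Perl-style engine, which is an assumption that holds only for simple patterns like these; `grep -E` does not understand `\d`, so `[0-9]` is used instead. The helper name `matches` and the even-number pattern are illustrative.

```shell
# Hypothetical helper: succeeds when value $2 matches pattern $1.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

INTEGER='^[0-9]+$'
EVEN='^[0-9]*[02468]$'

matches "$INTEGER" 31  && echo "31 is accepted as an integer"
matches "$INTEGER" 3.1 || echo "3.1 is rejected"
matches "$EVEN" 42     && echo "42 is accepted as even"
matches "$EVEN" 31     || echo "31 is rejected as odd"
```

One difference between the two integer patterns given above: `*` allows zero digits, so `^\d*$` also accepts an empty value, while `^[0-9]+$` requires at least one digit.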
*Also note that it may be useful to define numerical variables as strings providing the right validator if we don't want to define a default value, because both the Discovery Environment and the CyVerseUk web interface will pass 0 otherwise.*
We usually don't want the user to work in a folder that is not the set working directory, so if the program run by the App has a `--output_directory` option (or similar) we may want to add a validator to be sure that the string doesn't start with '/', or just hide it and give a default name (e.g. `output`, this will also make the wrapper script easier to write and maintain).
**IMPORTANT**:
```json
"value": {
...
"visible": false,
"default": "default_value",
...
}
```
is NOT supported. The default value must be provided in the wrapper script if we don't want the user to be able to change it.
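One way to keep such a fixed default in the wrapper script, while also rejecting absolute paths for an output directory, is sketched below. The function name `resolve_output_dir` and the default name `output` are illustrative choices, not names required by Agave.

```shell
# Hypothetical helper for wrapper.sh: choose the output directory name.
resolve_output_dir() {
    case "$1" in
        "")
            # Nothing supplied: fall back to the fixed default.
            printf '%s\n' "output"
            ;;
        /*)
            # Absolute paths would leave the job's working directory.
            echo "Error: output directory must be a relative path" >&2
            return 1
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

resolve_output_dir ""          # prints output
resolve_output_dir "results"   # prints results
```

If the parameter is hidden from the user entirely, the wrapper can simply call `resolve_output_dir ""` and always work in `output`.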
#### Docker integration
It's not possible to run an App in CyVerse interactively. Therefore, to run multiple commands in a Docker container we need the following syntax in the `wrapper.sh` script:

`docker run <image_name[:tag]> /bin/bash -c "command1;command2...;"`

`/bin/bash` is not strictly necessary but, depending on the base image, bash may not be the default shell; adding it to the command line takes care of this problem.
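The sketch below shows one way a wrapper could assemble that single `-c` string from several commands. It is a dry run by default: the `DOCKER` variable, the `run_in_container` helper and the `ubuntu:16.04` image name are all assumptions made for illustration, and the helper prints the Docker arguments instead of starting a container unless `DOCKER=docker` is set.

```shell
# Dry run by default: "echo" prints the arguments instead of calling Docker.
# Set DOCKER=docker to run the container for real.
DOCKER="${DOCKER:-echo}"

# Hypothetical helper: $1 is the image, the remaining arguments are commands.
run_in_container() {
    image="$1"
    shift
    joined=""
    for cmd in "$@"; do
        if [ -z "$joined" ]; then
            joined="$cmd"
        else
            joined="$joined; $cmd"
        fi
    done
    # All commands travel as one string, run by bash inside the container.
    "$DOCKER" run "$image" /bin/bash -c "$joined"
}

run_in_container "ubuntu:16.04" "echo start" "echo done"
```

In dry-run mode this prints `run ubuntu:16.04 /bin/bash -c echo start; echo done`; the quoting around the joined string is not visible in the printed form, but it is passed to Docker as a single argument.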
IMPORTANT UPDATE: in Docker version 1.12 the `SHELL` instruction was added. This allows the default shell used for the shell form of commands to be overridden (at build time too, so it may make the build a bit slower). Use it in the Dockerfile as follows:

`SHELL ["/bin/bash", "-c"]`
#### Condor integration
The HPC on the CyVerseUk infrastructure uses the HTCondor scheduler, so `wrapper.sh` alone is not enough to run the app: an `HTCondorSubmit.htc` script is needed as well.
The `HTCondorSubmit.htc` file takes the following form, with each empty value filled in for your app:
```
universe = docker
docker_image = [:tag]
executable =
should_transfer_files = YES
arguments =
transfer_input_files =
transfer_output_files =
when_to_transfer_output = ON_EXIT
request_memory = 100G
```
###
Before others can use it, the App has to be made public. This step has to be performed by a tenant admin, so please contact them if you have a ready-to-publish application. The command is:

`apps-pems-update -v -u <username> -p ALL <app_name>-<version>`

Once public, the App can be found both in the DE, under Apps > High-Performance Computing, and in the CyVerseUk web interface. The App interface is automatically generated based on the submitted JSON file.
###
Finally, we can run our App! We'll need one more (short) JSON file to run a new job:
```json
{
  "name" : "blasttest",
  "appId" : "blastapp-tutorial-0.0.1",
  "archive" : "true",
  "inputs": {
    "query" : "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testquery.fa",
    "database": "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testdb.fa"
  }
}
```
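A sketch of how the job file could be generated and handed to the CLI follows. It assumes that cyverse-cli provides a `jobs-submit` command taking the same `-v -F <file>` flags as `apps-addupdate`; please check the CLI help before relying on that. The script only prints the command, so nothing is submitted by running it.

```shell
# App name and version as registered earlier in this tutorial.
APP_NAME="blastapp-tutorial"
APP_VERSION="0.0.1"
BASE_URL="https://github.com/erikvdbergh/cyverseuk-util/raw/master"

# Write the job description to a temporary file.
JOB_FILE="$(mktemp)"
cat > "$JOB_FILE" <<EOF
{
  "name" : "blasttest",
  "appId" : "${APP_NAME}-${APP_VERSION}",
  "archive" : "true",
  "inputs": {
    "query" : "${BASE_URL}/testquery.fa",
    "database": "${BASE_URL}/testdb.fa"
  }
}
EOF

# Assumed submission command, printed rather than executed.
echo "jobs-submit -v -F $JOB_FILE"
```

Generating the `appId` from the name and version keeps the job file in step with the App JSON when the version number is bumped.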
Alternatively you will be able to run your jobs through one of the available web interfaces.
Known problems with the DE: not all teams build apps the same way, which has led to some functionality not being available for Agave apps. In particular, you may define an input field as accepting multiple files, but the GUI will not allow multiple file selection. The same appears to happen with AgaveToGo as well. In this case you will have to submit a JSON via the command line or use http://cyverseuk.herokuapp.com/ (only for apps hosted on the EI system).
Known problem with AgaveToGo: it looks like some disabled apps keep showing up in the list (they can't be used though).