Alice Minotto edited this page Jan 26, 2017 · 14 revisions

## Register and run apps with Agave API

  • Publishing an App
  • Running an App from command line

  • Registering an App

    Preliminary steps

    To perform the following steps you need to sign up for a CyVerse account here.

    1 - Setting up Agave API Access

    See the FAQ.

    2 - System registration (skip if using EI hardware)

    Agave is a RESTful API, meaning that we can interact with it using HTTP POST and GET requests. The cyverse-cli tools are essentially wrappers around these requests that make them shorter to type.

    Agave tracks two kinds of resources: Systems and Apps. There are 2 types of Systems: Storage and Execution. Apps run on Execution Systems using data from Storage Systems to produce desired results. So to run an app, we need an Execution system first. Systems are described using JSON files, which are then posted to the API. An Execution System JSON consists of 4 parts, which will be described below.

    Execution System JSON - System Basics

    The first part consists of system basics: id, type etc. See an example below:

    ```
    "id" : "myTutorialMachine",
    "name" : "A machine for the EI Agave tutorial",
    "type" : "EXECUTION",
    "executionType" : "CLI",
    "scheduler" : "FORK",
    ```

    The variables mostly speak for themselves. The `executionType` variable can be either CLI, CONDOR or HPC depending on the type of scheduler running on the system. In this case, we assume there is no scheduler running and so we choose `CLI`, with `FORK` as a scheduler. See the Agave docs for more details on the Scheduler variables.

    Execution System JSON - Storage

    All Execution systems need to define storage as scratch space. For this example, we'll assume you have a scratch directory mounted somewhere on /mnt/ (an SSHFS mount, for example).

    ```
    "storage": {
        "host" : "yourhost.example.org",
        "port" : 22,
        "protocol" : "SFTP",
        "homedir" : "/mnt/scratch/username",
        "rootdir" : "/mnt/scratch",
        "auth" : {
            "type": "PASSWORD",
            "username" : "username",
            "password" : "password"
        }
    },
    ```

    If you are uncomfortable with putting your password in plaintext, see below for specifying an `auth` object with SSHKEYS.

    Execution System JSON - Queues

    All execution systems need a default Queue to which jobs are submitted. In our example, we are using a simple CLI system, so there are no scheduler queues that we need to deal with. This means we can get away with a simple specification like this:

    ```
    "queues": [
        {
            "name": "normal",
            "default": true,
            "maxRequestedTime": "24:00:00",
            "maxJobs": 10,
            "maxUserJobs": 5,
            "maxNodes": 1,
            "maxMemoryPerNode": "4GB",
            "maxProcessorsPerNode": 12,
            "customDirectives": null
        }
    ]
    ```

    You'll want to change the variables to suit your system.

    Execution System JSON - Login

    Lastly, Agave needs the information required to log in to the Execution system. This is provided in a `login` object, as follows:

    ```
    "login": {
        "host" : "yourhost.example.org",
        "port" : "22",
        "protocol": "SSH",
        "auth" : {
            "type" : "PASSWORD",
            "username" : "username",
            "password" : "changethis"
        }
    }
    ```

    As mentioned before, putting your password in plaintext is usually a bad idea. We can specify a login object using public and private keys as well. To do this we'll change the `auth` part of the object as follows:

    ```
    "auth" : {
        "type" : "SSHKEYS",
        "username" : "username",
        "publicKey" : "ssh-rsa AAAA...your public key... [email protected]",
        "privateKey" : "-----BEGIN RSA PRIVATE KEY-----*private key here*-----END RSA PRIVATE KEY-----"
    }
    ```

    An important thing to note when using keypairs is that your private key must be JSON-encoded before you paste it into the JSON file, using the jsonpki command: ```json-pki --private /path/to/private/id_rsa``` If necessary, a password for the file can be specified using `--password`.
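
To illustrate what JSON encoding means for a key file, here is a minimal sketch. It is not the CLI's own implementation: it only joins the lines of a PEM file into a single line, ending each original line with a literal `\n`, which is the form a JSON string value needs. The helper name `json_escape_key` is hypothetical, and the sketch assumes the key text contains no double quotes or backslashes.

```shell
# Hypothetical helper (not part of the cyverse-cli tools).
# Prints the file given as $1 on one line, ending each original
# line with a literal backslash-n so it can sit inside a JSON string.
# Assumes the file has no double quotes or backslashes.
json_escape_key() {
    awk '{ printf "%s\\n", $0 }' "$1"
}

# Usage (path is a placeholder):
#   json_escape_key /path/to/private/id_rsa
```

The resulting single line is what would go between the quotes of the `privateKey` field.
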

    Registering the execution system

    Now that we have defined our system, you can find the completed JSON file below:

    ```
    {
        "id" : "myTutorialMachine",
        "name" : "A machine for the TGAC Agave tutorial",
        "type" : "EXECUTION",
        "executionType" : "CLI",
        "scheduler" : "FORK",

        "storage": {
            "host" : "yourhost.example.org",
            "port" : 22,
            "protocol" : "SFTP",
            "homedir" : "/mnt/scratch/username",
            "rootdir" : "/mnt/scratch",
            "auth" : {
                "type": "PASSWORD",
                "username": "username",
                "password": "changethis"
            }
        },

        "queues": [
            {
                "name": "normal",
                "default": true,
                "maxRequestedTime": "24:00:00",
                "maxJobs": 10,
                "maxUserJobs": 5,
                "maxNodes": 1,
                "maxMemoryPerNode": "4GB",
                "maxProcessorsPerNode": 12,
                "customDirectives": null
            }
        ],

        "login": {
            "host" : "yourhost.example.org",
            "port" : "22",
            "protocol": "SSH",
            "auth" : {
                "type" : "PASSWORD",
                "username" : "username",
                "password" : "changethis"
            }
        }
    }
    ```

    </p></details>
    Let's use it to register the system on Agave:  
    ```systems-addupdate -v -F TutSystem.json```  
    A large amount of JSON describing our new system will be returned to confirm the registration. Now that we have an execution system, let's move on to registering our workflow as an App in the next part.  
    
    #### App registration
    
    An App in the Agave API means a workflow that is wrapped into a single unit which can be executed by a user. It is described in the same way a system is described: JSON. In this part we'll register a test app that runs a simple BLAST job.
    
    ##### App JSON - Front matter
    
    The first thing we'll need to describe are some basic parameters of our app:  
    <details>
     <summary>Expand source</summary>
     <p>
    

    "name" : "blastapp-tutorial",
    "label" : "EI tutorial BLAST app",
    "version" : "0.0.1",
    "executionType" : "CLI",

    </p></details>
    The App ID will be generated from the name and version number and this combination *must be unique*. You can delete the previous one (if there was an error), or increase the version number (if you need to make an updated version).  
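
As a small sketch of how the ID is composed, the hypothetical helper `app_id` below joins the `name` and `version` fields with a hyphen. This matches the `appId` value used in the job file of this tutorial (`blastapp-tutorial-0.0.1`); the function itself is only an illustration and not part of the CLI.

```shell
# Hypothetical helper: build an App ID from the "name" ($1)
# and "version" ($2) fields of the App JSON.
app_id() {
    printf '%s-%s\n' "$1" "$2"
}

app_id "blastapp-tutorial" "0.0.1"   # prints blastapp-tutorial-0.0.1
```
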
    Next, we'll specify where and how the app will run:  
    <details>
     <summary>Expand source</summary>
     <p>
    

    "executionSystem" : "myhost.example.org",
    "deploymentPath" : "username/apps/tgac_tutorial",
    "templatePath" : "wrapper.sh",
    "testPath" : "test.sh",
    "parallelism" : "SERIAL",

    </p></details>  
    When specifying only an `executionSystem` as above, *you must make sure your app assets are already present on the system!* This means that you need admin access to your execution system, which is often not the case. To remedy this, we can store our app's assets on the CyVerse Datastore and specify a "deploymentSystem" parameter like so:  
    ```"deploymentSystem" : "data.iplantcollaborative.org",```  
    Finally, we'll specify our app's inputs:  
    <details>
      <summary>Expand source</summary>
      <p>
    

    "inputs" : [
        {
            "id": "query",
            "details" : { "label": "Query", "description": "FASTA file with query sequence(s)" },
            "value": { "required" : "true" }
        },
        {
            "id": "database",
            "details" : { "label": "Database", "description": "FASTA file with sequences to search (database)" },
            "value": { "required" : "true" }
        }
    ],
    "parameters" : [ ]

    </p></details>  
    We're leaving parameters empty, but we could add any BLAST command line parameters here. Now that we have specified this, we'll have to actually upload our app's assets to CyVerse.
    
    ##### Storing App assets with CyVerse
    
    [on hold]
    
    ##### Creating the App assets
    
    For our minimal BLAST app, we'll need three files: a wrapper script, a test script and an executable. Because of the way BLAST works, we'll actually need two executables for this app. First we'll create the wrapper script:  
    <details>
     <summary>Expand source</summary>
     <p>
    

    #!/bin/bash

    QUERY="${query}"
    DATABASE="${database}"

    # These two lines are necessary because permissions get lost in the Agave transfer
    chmod u+x lib/makeblastdb
    chmod u+x lib/blastn

    lib/makeblastdb -dbtype nucl -in "$DATABASE" -out db
    lib/blastn -query "$QUERY" -db db

    # Pass on the exit status of the last command (blastn)
    exit $?

    </p></details>  
    As you can see from the first line, this is a plain bash script that runs our pipeline. The next two lines set up our two main parameters: the query and the database. Before the script is executed, Agave replaces the `${query}` directive with the input we have given. Note that the word query is the id we specified in our JSON file earlier. The next line does the same for the database file.  
    The next two lines run our actual BLAST 'pipeline': first we create our database with makeblastdb and we then execute the BLAST with blastn. The `lib/` part of the command line is because of the way we will set up our app assets; Agave convention requires that all our App's executables are stored in a separate lib directory. We output the database in the first line with a simple title of db, and we call that database again in the next line.  
    The last line returns the current exit status, which will be inherited from the status of BLAST; this means that the script will pass on BLAST's exit value as its own.  
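
The placeholder replacement described above can be imitated locally to preview what a wrapper will look like at run time. This is only a rough stand-in for Agave's own template step: the function name `render_template` is hypothetical, it handles just the two input ids used in this tutorial, and it assumes the values contain no `|`, `&` or backslash characters.

```shell
# Hypothetical stand-in for Agave's template step: replaces the
# ${query} and ${database} placeholders in the text given as $1
# with the values given as $2 and $3.
render_template() {
    printf '%s\n' "$1" | sed -e "s|[\$]{query}|$2|g" -e "s|[\$]{database}|$3|g"
}

render_template 'QUERY="${query}" DATABASE="${database}"' testquery.fa testdb.fa
# prints QUERY="testquery.fa" DATABASE="testdb.fa"
```
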
    Next, we'll need a test script that tests our app with some default data. This is useful, but we'll skip it for now as it is a bit outside this tutorial's scope. Instead, we'll just write a script that returns true and call it done:  
    

    #!/bin/bash
    exit 0

    Finally, we'll need to provide the BLAST executables. These can be obtained from the NCBI FTP server.
    
    [ on hold ]
    
    The wrapper script should perform all the checks that the Agave API doesn't support (mutually inclusive or exclusive parameters, for example), and ideally return the proper error before running the Docker container. It may also be useful to have the wrapper script delete any new files that are not needed from the working directory, so that they are not archived.  
    In our case there is some additional logic in the wrapper scripts to allow some automatic tasks in the virtual machines to perform as expected and to integrate the system with the webapp (<a href="http://cyverseuk.herokuapp.com/">CyVerseUk</a>). You don't usually have to worry about this.  
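
A check for mutually exclusive parameters, as mentioned above, could be sketched as follows. The function name `check_exclusive` and the parameter ids `optionA` and `optionB` are hypothetical, and the sketch assumes that a parameter the user left unset reaches the wrapper as an empty string.

```shell
# Hypothetical check: succeed only if at most one of two options is set.
# $1 and $2 are the values substituted for two parameters
# (an unset parameter is assumed to arrive as an empty string).
check_exclusive() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        echo "Error: these two options cannot be used together" >&2
        return 1
    fi
    return 0
}

# In a wrapper script this would run before the main command, e.g.:
#   check_exclusive "${optionA}" "${optionB}" || exit 1
```

Running the check first means the job fails early with a readable message instead of failing inside the container.
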
    
    ##### Registering App in Agave
    
    Now that our assets are in place, we can register our app in Agave using the JSON file we wrote earlier. (If needed, refresh your access tokens with `auth-tokens-refresh`). Navigate to where the file is stored (we'll assume you've named it TutApp.json) and run the `apps-addupdate` command:  
    ```apps-addupdate -v -F TutApp.json```  
    A bunch of JSON describing your app will be returned, confirming the registration of our app.
    
    *****************
    
    #####<div id="json">Additional notes on the JSON file</div>
    
    Following the introductory part, the JSON file lists inputs and parameters. Good documentation on the available fields and their usage can be found <a href=http://agaveapi.co/documentation/tutorials/app-management-tutorial/app-inputs-and-parameters-tutorial/>here</a>.  
    For the application (if you wish to publish it) to display a proper information window in the Discovery Environment, the following fields need to be present in the JSON file: `help_URI`, `datePublished`, `author`, `longDescription`.  
    In the `ontology` field, a list of IRIs for the topic and operation branches of the <a href=http://www.ebi.ac.uk/ols/ontologies/edam>EDAM ontology</a> has to be specified to properly categorize the App.  
    
    Should you encounter problems registering your application, I'd suggest first checking that the JSON file is valid. A good way to do this is to copy-paste it into <a href="https://togo.agaveapi.co/app/#/apps/new">AgaveToGo</a>.  
    
    If `details.showArgument` (boolean) is set to `true`, `details.argument` is passed before the value (e.g. to pass `--kmer 31` on the command line). Note that the argument is prepended to the value without a space, so we usually want to include a trailing space in the argument string.  
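
The effect of the missing space can be seen in a short sketch. The helper `render_argument` is hypothetical and only imitates the concatenation described above; it is not Agave code.

```shell
# Hypothetical illustration of how argument ($1) and value ($2)
# are joined: plain concatenation, with no separator added.
render_argument() {
    printf '%s%s\n' "$1" "$2"
}

render_argument "--kmer" "31"    # prints --kmer31
render_argument "--kmer " "31"   # prints --kmer 31
```
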
    `value.validator` can supply a check on the format of the submitted value as a <a href=http://perldoc.perl.org/perlre.html>Perl-formatted regular expression</a> (**pay particular attention to the escapes**).  
    Example case: JSON `value.type` doesn't distinguish between integers and floating point numbers; it only provides `number`. To check that the input is an integer we may use `"validator": "^\\d*$"` (or `"^[0-9]+$"` to avoid the escapes). The same field can also be used to accept only even or odd numbers, set a maximum value, etc.  
    *Also note that it may be useful to define numerical variables as strings providing the right validator if we don't want to define a default value, because both the Discovery Environment and the CyVerseUk web interface will pass 0 otherwise.*  
    We usually don't want the user to work in a folder that is not the set working directory, so if the program run by the App has a `--output_directory` option (or similar) we may want to add a validator to be sure that the string doesn't start with '/', or just hide it and give a default name (e.g. `output`, this will also make the wrapper script easier to write and maintain).
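
Validator expressions like the ones discussed above can be tried out locally before registering the App. The sketch below uses `grep -E`, whose syntax agrees with Perl for these simple patterns; the helper name `matches` and the pattern `^[^/]` for "does not start with a slash" are assumptions made for illustration, and Agave itself applies the real validator.

```shell
# Hypothetical local check; Agave itself applies the validator.
# Succeeds if the value $2 matches the extended regular expression $1.
matches() {
    printf '%s\n' "$2" | grep -Eq -e "$1"
}

matches '^[0-9]+$' "31"     && echo "31 accepted as integer"
matches '^[0-9]+$' "3.5"    || echo "3.5 rejected as integer"
matches '^[^/]'    "output" && echo "output accepted as relative path"
matches '^[^/]'    "/tmp/x" || echo "/tmp/x rejected as absolute path"
```
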
    
    **IMPORTANT**:  
    ```json
    "value": {
        ...
        "visible": false,
        "default": "default_value",
        ...
    }
    ```

    is NOT supported. The default value must be provided in the wrapper script if we don't want the user to be able to change it.
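
One way to provide such a fixed value is to set it directly in the wrapper, as in the sketch below. The option name `--output_directory` is the example used earlier on this page; the helper `build_command`, the program name `myprogram` and the input `reads.fa` are hypothetical.

```shell
# Fixed in the wrapper instead of being exposed as a hidden parameter.
OUTPUT_DIR="output"

# Hypothetical helper: builds the command line for a program that
# takes an --output_directory option; $1 is the program, $2 its input.
build_command() {
    printf '%s --output_directory %s %s\n' "$1" "$OUTPUT_DIR" "$2"
}

build_command "myprogram" "reads.fa"
# prints myprogram --output_directory output reads.fa
```
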


    #### Docker integration

    It's not possible to run an App in CyVerse interactively. Therefore to run multiple commands in a Docker container we need the following syntax in the wrapper.sh script:

    docker run <image_name[:tag]> /bin/bash -c "command1;command2...;"
    

    /bin/bash is not strictly necessary but, depending on the base image, bash may not be the default shell: adding it to the command line takes care of this problem.
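
When the list of commands is only known at run time, the wrapper can assemble this command line itself. The sketch below prints the command rather than running it, so it can be inspected first; the helper name `docker_command` and the image `ubuntu:16.04` are illustrative choices, and the sketch assumes the commands contain no double quotes.

```shell
# Hypothetical helper: prints (does not run) a docker command that
# executes all given commands in one container through /bin/bash.
# $1 is the image name; the remaining arguments are the commands.
docker_command() {
    image=$1
    shift
    joined=""
    for cmd in "$@"; do
        joined="${joined}${cmd};"
    done
    printf 'docker run %s /bin/bash -c "%s"\n' "$image" "$joined"
}

docker_command "ubuntu:16.04" "cd /data" "ls -l"
# prints docker run ubuntu:16.04 /bin/bash -c "cd /data;ls -l;"
```
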

    IMPORTANT UPDATE: in Docker version 1.12 the SHELL instruction was added. This allows the default shell used for the shell form of commands to be overridden (at build time too, so it may make the build a bit slower). Use it as follows:
    SHELL ["/bin/bash", "-c"]


    #### Condor integration

    The HPC on the CyVerseUk infrastructure uses the HTCondor scheduler, so `wrapper.sh` alone is not enough to run the app: an HTCondorSubmit.htc script is needed as well.
    The HTCondorSubmit.htc file will be in the following form:

    ```
    universe = docker
    docker_image = [:tag]
    executable =
    should_transfer_files = YES
    arguments =
    transfer_input_files =
    transfer_output_files =
    when_to_transfer_output = ON_EXIT
    request_memory = 100G
    ```

    This HTCondor submit file has to be generated by `wrapper.sh`, since we can't know the arguments and input files in advance. `transfer_output_files` is not needed if the output is in the working directory. A good idea is to create, when possible, all the output files in a subdirectory (e.g. `output`) of the working directory, so that the transfer is easier. If transferring executables in `transfer_input_files`, make sure to restore the right permissions in the wrapper script (e.g. `chmod u+x `). It's also possible that the Docker image has to be updated to give 777 permissions to scripts because of how Condor handles Docker.
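
Generating the submit file from the wrapper can be done with a here-document, as in this sketch. It mirrors only the fields listed above and leaves out `transfer_output_files` on the assumption that the output stays in the working directory. The function name `write_submit` and all argument values in the usage line are hypothetical.

```shell
# Hypothetical helper: writes an HTCondor submit file.
# $1 image, $2 executable, $3 arguments,
# $4 comma-separated input files, $5 path of the file to write.
write_submit() {
    cat > "$5" <<EOF
universe = docker
docker_image = $1
executable = $2
should_transfer_files = YES
arguments = $3
transfer_input_files = $4
when_to_transfer_output = ON_EXIT
request_memory = 100G
EOF
}

# Usage inside wrapper.sh (values are placeholders):
#   write_submit "myimage:latest" "lib/blastn" "-query q.fa -db db" "q.fa,db.fa" HTCondorSubmit.htc
```
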

    ### Publishing an App

    An App is made public with the following command (this step has to be performed by a tenant admin, so please contact them if you have a ready-to-publish application):

    apps-pems-update -v -u <username> -p ALL <app_name>-<version>

    Once public, the App can be found both in the DE, under Apps>High-Performance Computing, and in the CyVerseUk web interface. The App interface is automatically generated based on the submitted JSON file.


    ### Running an App from command line

    Finally, we can run our App! We'll need one more (short) JSON file to run a new job:

    ```
    {
        "name" : "blasttest",
        "appId" : "blastapp-tutorial-0.0.1",
        "archive" : "true",
        "inputs": {
            "query" : "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testquery.fa",
            "database": "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testdb.fa"
        }
    }
    ```

    We'll save this file as RunApp.json and submit it as a job with the `jobs-submit` command: ```jobs-submit -v -W -F RunApp.json``` The -W flag tells the command to keep watching the job in the current window; watching can be stopped with `Ctrl-C`. After your job has completed, your outputs, logs and error messages will be in a folder that is generated automatically on your app's storage system (which is the CyVerse data store in our case, but you can modify this at run time in a JSON field). To view them on the CyVerse data store, check the "archive" folder under your username. All your job output will be in a separate subfolder under the "jobs" folder. Alternatively, you can run your jobs through one of the available web interfaces.
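
When the same App is run on several inputs, the job file can be generated instead of edited by hand. The sketch below writes a file with the same fields as the job JSON above; the function name `write_job` is hypothetical, and the sketch assumes the values contain no double quotes.

```shell
# Hypothetical helper: writes a job description file.
# $1 job name, $2 app ID, $3 query URL, $4 database URL,
# $5 path of the file to write.
write_job() {
    cat > "$5" <<EOF
{
  "name" : "$1",
  "appId" : "$2",
  "archive" : "true",
  "inputs": {
    "query" : "$3",
    "database": "$4"
  }
}
EOF
}

# Usage (URLs are placeholders), followed by the submit command from above:
#   write_job blasttest blastapp-tutorial-0.0.1 "<query URL>" "<database URL>" RunApp.json
#   jobs-submit -v -W -F RunApp.json
```
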