XML 2.0 specification

Denis Rivière edited this page Aug 22, 2017 · 13 revisions

Processes

The XML process specification makes it possible to use a standard Python function and to associate it with an XML string that enables the creation of a Process instance. This XML string will define the type and behaviour of function parameters and return value(s).

In order to create a Process instance for a function, some information is needed about each parameter of the function and about its return value. This information is defined in the XML string, with the exception of the default values of the parameters, which are extracted from the function definition.

The process XML string contains a single <process> element, which may carry some global properties for the process. <process> may have the following attributes:

  • capsul_xml (optional): version of the Capsul XML specification this process definition is compatible with. If omitted, the process definition is supposed to be compatible with the latest Capsul XML specification available.
  • role (optional): A role that is attached to the process. See "Process roles" below.

In the <process> element, one can find one <input> element per parameter of the function. If the process produces one or several outputs, it must use a <return> element. If <return> is not defined, the value returned by the Python function is ignored and cannot be used in pipelines. For a single output, the Python function must directly return the value, and the value name (an output value must always have a name), type and documentation must be given in the element's attributes (see below). Here is an example of a process defined as a function returning a value:

from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="a" type="int" doc="An integer"/>
    <input name="b" type="int" doc="Another integer"/>
    <return name="addition" type="int" doc="a + b"/>
</process>
''')
def add(a, b):
    return a + b
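
To illustrate how the XML string and the function definition complement each other, here is a minimal sketch that uses only the Python standard library, not Capsul itself. The helper name describe_inputs is hypothetical, and the default value given to b is added for illustration only. The sketch reads names, types and documentation from the XML and default values from the function signature:

```python
import inspect
import xml.etree.ElementTree as ET

PROCESS_XML = '''
<process capsul_xml="2.0">
    <input name="a" type="int" doc="An integer"/>
    <input name="b" type="int" doc="Another integer"/>
    <return name="addition" type="int" doc="a + b"/>
</process>
'''


def add(a, b=1):  # the default for b is an assumption made for this sketch
    return a + b


def describe_inputs(function, xml_string):
    """Hypothetical helper: the XML gives name, type and doc,
    the function signature gives the default values."""
    root = ET.fromstring(xml_string)
    signature = inspect.signature(function)
    description = {}
    for element in root.findall('input'):
        name = element.get('name')
        info = {'type': element.get('type'), 'doc': element.get('doc')}
        default = signature.parameters[name].default
        if default is not inspect.Parameter.empty:
            info['default'] = default
        description[name] = info
    return description


inputs = describe_inputs(add, PROCESS_XML)
```

Here inputs['a'] carries no default, whereas inputs['b'] carries the default taken from the signature.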

If the process needs to return several values, they must be declared with <output> elements located between <return> and </return>. The function must return the output values either in a list or in a dictionary. If it is a list, the order of the <output> elements is used to match the values in the list with the process parameter names. If it is a dictionary, each key must correspond to the name attribute of an <output> element. For instance:

from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="a" type="int" doc="An integer"/>
    <input name="b" type="int" doc="Another integer"/>
    <return>
        <output name="quotient" type="int" doc="Quotient of a / b"/>
        <output name="remainder" type="int" doc="Remainder of a / b"/>
    </return>
</process>
''')
def divide(a, b):
    # Floor division keeps the quotient an int and consistent with a % b.
    return {
        'quotient': a // b,
        'remainder': a % b,
    }
    # From the process point of view, it would be equivalent to
    # use the following code:
    # return [a // b, a % b]
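
The matching rule between returned values and <output> elements can be sketched as follows. This is not Capsul's implementation; name_outputs is a hypothetical helper, written with the standard library only, that turns a returned list or dictionary into a mapping keyed by output name:

```python
import xml.etree.ElementTree as ET

RETURN_XML = '''
<process capsul_xml="2.0">
    <return>
        <output name="quotient" type="int" doc="Quotient of a / b"/>
        <output name="remainder" type="int" doc="Remainder of a / b"/>
    </return>
</process>
'''


def name_outputs(xml_string, returned):
    """Hypothetical helper: map a returned list or dict to output names."""
    root = ET.fromstring(xml_string)
    names = [o.get('name') for o in root.find('return').findall('output')]
    if isinstance(returned, dict):
        # Every key must be the name of a declared <output>.
        unknown = sorted(set(returned) - set(names))
        if unknown:
            raise ValueError('undeclared outputs: %s' % unknown)
        return dict(returned)
    # A list is matched by position, in the order of the <output> elements.
    if len(returned) != len(names):
        raise ValueError('expected %d values, got %d'
                         % (len(names), len(returned)))
    return dict(zip(names, returned))


from_list = name_outputs(RETURN_XML, [3, 1])
from_dict = name_outputs(RETURN_XML, {'quotient': 3, 'remainder': 1})
```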

<input>, <output> and <return> (for a single return value with no child elements) accept the following attributes:

  • name: the name of the function parameter (or of the returned value)

  • type: the type of the parameter. See "Parameter types" below.

  • allowed_extensions: for file types, the list of possible file extensions.

  • doc (optional): the documentation of the parameter

The three elements differ as follows:

  • <input> is straightforward: it is always an input parameter.

  • <output> is normally an output parameter, except in some cases when it is a file: an output file may have its filename specified as input (the filename is not generated by the process). In this case, an additional attribute input_filename names the function parameter used to pass the filename. This parameter has the type File and is marked as output, but is actually an input to the processing function.

  • <return> is an output which is returned by the processing function. A single <return> is very similar to <output>, but only one <return> element is allowed in a process, and the function should then return a single value.
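
The effect of input_filename on the function signature can be illustrated with a small standard-library sketch. The helper function_argument_names is hypothetical and only shows the idea: the function receives one argument per <input>, plus one argument for each <output> that declares input_filename.

```python
import xml.etree.ElementTree as ET

THRESHOLD_XML = '''
<process capsul_xml="2.0">
    <input name="input_image" type="file"/>
    <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']"/>
    <input name="threshold" type="float"/>
    <output name="output_image" input_filename="output_location" type="file"/>
</process>
'''


def function_argument_names(xml_string):
    """Hypothetical helper: names the Python function is expected to accept."""
    root = ET.fromstring(xml_string)
    names = [element.get('name') for element in root.findall('input')]
    for output in root.iter('output'):
        filename_parameter = output.get('input_filename')
        if filename_parameter is not None:
            # The file name is chosen by the caller, so it is a function input.
            names.append(filename_parameter)
    return names


arguments = function_argument_names(THRESHOLD_XML)
```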

Parameter types

For <input>, <output> and <return> elements, the type attribute can have the following values:

  • int
  • float
  • string
  • unicode
  • file
  • directory
  • enum : when this type is used, there must be a values attribute that contains a Python literal representing a list of possible values for the parameter.
  • list_int
  • list_float
  • list_string
  • list_unicode
  • list_file
  • list_directory

When a parameter accepts multiple types, they must be separated by a | character. For instance, a parameter accepting either a file or a list of files would use type="file|list_file".
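
As a rough illustration of these two conventions (the | separator and the Python literal used for enum values), the following sketch parses both attributes with the standard library. The helper names parse_type and parse_enum_values are hypothetical and are not part of Capsul:

```python
import ast

# Type names listed in this specification.
KNOWN_TYPES = {
    'int', 'float', 'string', 'unicode', 'file', 'directory', 'enum',
    'list_int', 'list_float', 'list_string', 'list_unicode',
    'list_file', 'list_directory',
}


def parse_type(type_attribute):
    """Split a type attribute such as "file|list_file" into alternatives."""
    alternatives = [t.strip() for t in type_attribute.split('|')]
    unknown = [t for t in alternatives if t not in KNOWN_TYPES]
    if unknown:
        raise ValueError('unknown parameter types: %s' % unknown)
    return alternatives


def parse_enum_values(values_attribute):
    """Evaluate the values attribute of an enum as a Python literal list."""
    values = ast.literal_eval(values_attribute)
    if not isinstance(values, (list, tuple)):
        raise ValueError('enum values must be a list')
    return list(values)


alternatives = parse_type('file|list_file')
methods = parse_enum_values("['gt', 'ge', 'lt', 'le']")
```

ast.literal_eval is used here because it only accepts literals and never executes arbitrary code.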

Process roles

The role of a process gives information about the expected execution context. It can be used to decide whether a process should be executed in a given context or not. The role can also be used to propose a specific GUI for the process. For instance, the role "viewer" indicates that the execution of the process will display something to the user. There is no need to execute such a process on a remote computer that is disconnected from the user environment.

The possible process roles are:

  • viewer: the process is used to display something to the user. It cannot be executed outside the user graphical environment. A viewer is not supposed to be blocking. It should terminate immediately and let the view live independently of the rest of the process. If blocking is required, use the dialog role.
  • dialog: a dialog is used to show something to the user and wait for a user action before ending its execution. Like a viewer, it cannot be executed outside the user graphical environment. The expected user action can be as simple as clicking on a single "ok" button; in that case, the process should have no output. But it can be a complete form whose result must be returned via the process output parameter(s).
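
A scheduler could use the role along these lines. This is only a sketch of the decision described above; the function names are hypothetical and nothing here is Capsul API:

```python
# Roles that require the user's graphical environment, per this specification.
GUI_ROLES = {'viewer', 'dialog'}


def can_run_remotely(role):
    """A process with a GUI role must stay in the user environment."""
    return role not in GUI_ROLES


def is_blocking(role):
    """Only a dialog waits for a user action before terminating."""
    return role == 'dialog'


# None or an empty string stands for "no role" in this sketch.
remote_ok = [role for role in (None, '', 'viewer', 'dialog')
             if can_run_remotely(role)]
```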

Association between a Python function and an XML string

There are two ways to associate the function with the XML. The recommended method is to use a decorator to explicitly define the XML string associated with the function. Here is an example:

from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
    <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']" doc="Method for thresholding."/>
    <input name="threshold" type="float" doc="Threshold value."/>
    <output name="output_image" input_filename="output_location" type="file"
     doc="If set, define the output file name. Otherwise, the name is generated using a 'threshold_' prefix on the input file name."/>
</process>
''')
def threshold(input_image, method='gt', threshold=0, output_location=None):
    pass

It is also possible to put the XML in the docstring of the function. However, this method is not recommended and should be avoided if possible. Example:

def threshold(input_image, method='gt', threshold=0, output_location=None):
    '''
    <process capsul_xml="2.0">
        <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
        <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']" doc="Method for thresholding."/>
        <input name="threshold" type="float" doc="Threshold value."/>
        <output name="output_image" input_filename="output_location" type="file"
          doc="If set, define the output file name. Otherwise, the name is generated using a 'threshold_' prefix on the input file name."/>
    </process>
    '''
    pass
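
How a loader might recover the XML from a docstring can be sketched with the standard library alone. The helper process_xml_from_docstring is hypothetical; Capsul's actual detection logic may differ:

```python
import xml.etree.ElementTree as ET


def threshold(input_image, method='gt', threshold=0, output_location=None):
    '''
    <process capsul_xml="2.0">
        <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
        <input name="threshold" type="float" doc="Threshold value."/>
        <output name="output_image" input_filename="output_location" type="file"
          doc="Output file name."/>
    </process>
    '''
    pass


def process_xml_from_docstring(function):
    """Hypothetical helper: parse the docstring as a <process> element."""
    docstring = function.__doc__ or ''
    root = ET.fromstring(docstring.strip())
    if root.tag != 'process':
        raise ValueError('docstring does not contain a <process> element')
    return root


process = process_xml_from_docstring(threshold)
input_names = [element.get('name') for element in process.findall('input')]
```

One drawback visible in this sketch is that the docstring can no longer hold ordinary human-readable documentation, which is one plausible reason to prefer the decorator.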

Process examples

from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
    <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']"
     doc="Method for thresholding."/>
    <input name="threshold" type="float" doc="Threshold value."/>
    <output name="output_image" input_filename="output_image" type="file" doc="Output file name."/>
</process>
''')
def threshold(input_image, output_image, method='gt', threshold=0):
    pass

@xml_process('''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
    <input name="mask" type="file" doc="Path of mask binary image."/>
    <output name="output_image" input_filename="output_location" type="file" doc="Output file name."/>
</process>
''')
def mask(input_image, mask, output_location=None):
    pass

Pipelines

An XML pipeline is an XML document containing a single <pipeline> element that may contain some global properties for the pipeline. Since a pipeline is also a process, the <pipeline> element may have the same attributes as the <process> element (see above).

An XML pipeline contains a series of processes that are defined by <process> elements. The inputs and outputs of processes are connected by links that are defined in <link> elements. A pipeline may allow a user to select one group of processes among a series of process groups. The processes that are not selected are disabled (they will not be executed) whereas the selected processes are enabled. The <processes_selection> element is used to define a set of selectable process groups.

The <doc> element

This element has no attributes and contains the documentation of the process in a Sphinx compatible format.

The <process> element

A <process> element adds a new process instance to the pipeline. This instance is given a name that can be used in other XML elements to reference it. The process instance references a module, which is the function that is called when the instance is run. The <process> element can have the following attributes:

  • name: a string that can be used to reference the process instance. This must be a valid Python variable name. It should use the variable naming convention of Python's PEP 8.
  • module: a valid Capsul process identifier. This is typically a fully qualified Python object name (i.e. one containing the absolute dotted path of the Python module), but any string value accepted by capsul.loadre.get_process_instance() can be used.
  • role (optional): sets the role of the process instance (see "Process roles" above). If a role has been defined on the process module, it is ignored and replaced by the one declared in the pipeline. It is possible to use an empty string to force the process instance in the pipeline to have no role.
  • iteration (optional): when this attribute is used, the process instance will be an iteration process. The iteration attribute contains a comma-separated list of parameter names (for instance "input1,input2,output1"). This list indicates the process parameter names on which the iteration will be performed. For each of these parameters, the actual type of the process instance parameter will be replaced by a list whose elements must have the process parameter type.
  • enabled (optional): used to explicitly mark a node as disabled (value: "false")
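
The effect of the iteration attribute on parameter types can be sketched as follows. The helper iteration_parameters and the ('list', type) representation are assumptions made for this illustration, not Capsul's internal representation:

```python
def iteration_parameters(iteration_attribute, parameter_types):
    """Hypothetical helper: return the parameter types of the iterated
    process instance, given the types of the underlying process."""
    names = [n.strip() for n in iteration_attribute.split(',') if n.strip()]
    unknown = [n for n in names if n not in parameter_types]
    if unknown:
        raise ValueError('unknown iteration parameters: %s' % unknown)
    iterated = dict(parameter_types)
    for name in names:
        # The parameter becomes a list whose elements have the original type.
        iterated[name] = ('list', parameter_types[name])
    return iterated


types = {'input_image': 'file', 'threshold': 'float', 'output_image': 'file'}
iterated_types = iteration_parameters('input_image,output_image', types)
```

Parameters that are not named in the attribute (threshold here) keep their original type and are shared by all iterations.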

The <process> element can contain the following elements:

<set>

The <set> element is used to set a fixed value to a parameter. It contains only two attributes:

  • name: the name of the parameter
  • value: the value of the parameter expressed as a Python literal. The use of a Python literal format enables the representation of structured values such as lists. Some examples of values:
    • integer: <set name="x" value="42"/>
    • float: <set name="x" value="4.2"/>
    • string: <set name="x" value="'a value'"/>
    • None (i.e. JSON null): <set name="x" value="None"/>
    • list: <set name="x" value="['one', 'two', 'three']"/>

When a value is set on a parameter, it becomes an optional parameter.
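
Since values are Python literals, they can be decoded safely with ast.literal_eval. The following standard-library sketch (the helper fixed_values is hypothetical) collects the <set> children of a <process> element into a dictionary:

```python
import ast
import xml.etree.ElementTree as ET

PROCESS_ELEMENT = '''
<process name="threshold_gt_1"
 module="capsul.process.test.test_load_from_description.threshold">
    <set name="threshold" value="1"/>
    <set name="method" value="'gt'"/>
    <set name="labels" value="['one', 'two', 'three']"/>
    <set name="output_location" value="None"/>
</process>
'''


def fixed_values(process_element):
    """Hypothetical helper: decode every <set> value as a Python literal."""
    return {
        element.get('name'): ast.literal_eval(element.get('value'))
        for element in process_element.findall('set')
    }


values = fixed_values(ET.fromstring(PROCESS_ELEMENT))
```

Note that a string needs its own quotes inside the attribute: value="'gt'" is the string gt, whereas value="gt" would not be a valid literal.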

<nipype>

Capsul can use Nipype interfaces as process modules. These interfaces use traits types that have some parameters that need to be set in some contexts. The Nipype-specific <nipype> element contains a name attribute to identify a process parameter. For more information about these parameters, see the Nipype interface specification. The following attributes can be used to customize Nipype traits:

  • usedefault: can be set to "true" or "false". Omitting the attribute is equivalent to "false".
  • copyfile: can be set to "true" or "false". Omitting the attribute is equivalent to "false". If the special value "discard" is used, the Nipype interface copyfile parameter will be set to True but the copied file will be deleted when the process terminates. This prevents some software (such as SPM) from modifying the input image, while only the original image is kept at the end of the execution (the modified copy is deleted).
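
The three-valued copyfile attribute can be summarised by a small decoding function. This is a sketch of the rules stated above, with a hypothetical helper name; it does not call Nipype:

```python
def interpret_copyfile(value=None):
    """Hypothetical helper: return (copyfile, delete_copy_when_done)."""
    if value is None or value == 'false':
        # Omitting the attribute is equivalent to "false".
        return (False, False)
    if value == 'true':
        return (True, False)
    if value == 'discard':
        # Work on a copy, then delete it when the process terminates.
        return (True, True)
    raise ValueError('invalid copyfile value: %r' % value)


decoded = {v: interpret_copyfile(v) for v in (None, 'false', 'true', 'discard')}
```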

The <switch> element

Represents switch nodes. May be replaced by process selection if it proves to fulfill all the needs, but for now "old-style" switches still exist, and are the only ones which can be saved.

Attributes:

  • name: node name in the pipeline (as in process elements)
  • switch_value (optional): value of the "switch" parameter: name of the active input
  • enabled (optional): as in process elements

Children:

<input>

Input name for the switch. Input plugs will be a combination of input/output names <input>_switch_<output>

Attributes:

  • name
  • optional (optional) "true" or "false"

<output>

Output plug for the switch.

Attributes:

  • name
  • optional (optional)
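
The naming rule for switch input plugs can be sketched as follows. The switch declaration used here is a made-up example, and switch_input_plugs is a hypothetical helper built on the standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical switch with two inputs and one output.
SWITCH_XML = '''
<switch name="select_threshold" switch_value="gt">
    <input name="gt"/>
    <input name="lt"/>
    <output name="mask"/>
</switch>
'''


def switch_input_plugs(switch_element):
    """Combine input and output names as <input>_switch_<output>."""
    inputs = [e.get('name') for e in switch_element.findall('input')]
    outputs = [e.get('name') for e in switch_element.findall('output')]
    return ['%s_switch_%s' % (i, o) for i in inputs for o in outputs]


plugs = switch_input_plugs(ET.fromstring(SWITCH_XML))
```

With two inputs and one output, this yields the plugs gt_switch_mask and lt_switch_mask.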

The <optional_output_switch> element

Represents a specific switch node which makes it possible to have optional output files in the pipeline parameters, while keeping them available for temporary values inside the pipeline if they are left undefined.

Attributes:

  • name: node name in the pipeline (as in process elements)
  • enabled (optional): as in process elements

Children:

<input>

Input name for the switch. Input plugs will be a combination of input/output names <input>_switch_<output>. In an optional output switch, only one input is allowed.

Attributes:

  • name
  • optional (optional) "true" or "false"

<output>

Output plug for the switch. Only one output is allowed.

Attributes:

  • name

The <link> element

This element adds a link between an output parameter of a process and an input parameter of another process. It can also be used to "export" a process parameter. Exporting a process parameter means making it visible in the parameters of the pipeline. Unlike the default Pipeline behaviour in Capsul's API, a pipeline defined in Capsul XML 2.0 does not automatically export the unconnected parameters of its processes. The <link> element contains no child elements. It must have exactly two mandatory attributes, source and dest, and may have a third, optional one:

  • source: the parameter where the link starts.
  • dest: the parameter where the link ends.
  • weak_link (optional): "true" or "false"

The value of the source and dest attributes can be either a single identifier (e.g. "parameter_name") or two identifiers separated by a dot (e.g. "process_name.parameter_name"). A single identifier corresponds to a pipeline parameter, whereas two identifiers identify a process parameter: they must correspond to the name of a process and the name of one parameter of this process.
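
The two forms of a link endpoint can be told apart with a simple split. This is a minimal sketch with hypothetical helper names, not Capsul code:

```python
def parse_endpoint(value):
    """Return (process_name, parameter_name); process_name is None
    when the endpoint is a pipeline parameter."""
    parts = value.split('.')
    if len(parts) == 1:
        return (None, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError('invalid link endpoint: %r' % value)


def is_export(source, dest):
    """A link exports a process parameter when one end is a pipeline parameter."""
    return parse_endpoint(source)[0] is None or parse_endpoint(dest)[0] is None


export_link = is_export('input_image', 'threshold_gt_1.input_image')
internal_link = is_export('threshold_gt_1.output_image', 'mask_1.mask')
```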

The <processes_selection> element

The <processes_selection> element defines a series of process groups. Each process group is composed of a series of processes added to the pipeline with the <process> element. Only one of these process groups can be executed in the pipeline. Therefore, a new parameter is added to the pipeline that allows the user to select the group to execute. All processes in the selected group are activated (i.e. will be executed) whereas all processes in other groups are disabled (i.e. will not be executed).

The <processes_selection> element has a single name attribute that is the name of the parameter that is added to the pipeline. It must contain two or more <processes_group> elements. Each <processes_group> contains one or more <process> elements having only a single name attribute. This attribute is the name of a process defined in the pipeline (see "The <process> element" above).
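
The activation rule can be sketched as follows with the standard library. The helper enabled_state is hypothetical; it computes, for a chosen group, which of the listed processes would be enabled:

```python
import xml.etree.ElementTree as ET

SELECTION_XML = '''
<processes_selection name="select_method">
    <processes_group name="greater than">
        <process name="threshold_gt_1"/>
        <process name="threshold_gt_10"/>
    </processes_group>
    <processes_group name="lower than">
        <process name="threshold_lt_1"/>
        <process name="threshold_lt_10"/>
    </processes_group>
</processes_selection>
'''


def enabled_state(selection_element, selected_group):
    """Hypothetical helper: map each listed process to True (enabled)
    or False (disabled) for the selected group."""
    groups = {
        group.get('name'): [p.get('name') for p in group.findall('process')]
        for group in selection_element.findall('processes_group')
    }
    if selected_group not in groups:
        raise ValueError('unknown group: %r' % selected_group)
    state = {}
    for group_name, processes in groups.items():
        for process in processes:
            selected = group_name == selected_group
            state[process] = state.get(process, False) or selected
    return state


state = enabled_state(ET.fromstring(SELECTION_XML), 'lower than')
```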

The <pipeline_steps> element

Children:

<step>

Attributes:

  • name: name for the step
  • enabled (optional): "true" or "false"

Children:

<node>

Attributes:

  • name: name of an existing pipeline node which will be part of this step.
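
As an illustration, a <pipeline_steps> block could look like the XML embedded in the sketch below. The step names are made up, and treating an omitted enabled attribute as "true" is an assumption of this sketch, not something stated by the specification:

```python
import xml.etree.ElementTree as ET

# Hypothetical steps reusing node names from the pipeline example.
STEPS_XML = '''
<pipeline_steps>
    <step name="thresholding">
        <node name="threshold_gt_1"/>
        <node name="threshold_lt_1"/>
    </step>
    <step name="masking" enabled="false">
        <node name="mask_1"/>
    </step>
</pipeline_steps>
'''


def read_steps(steps_element):
    """Hypothetical helper: list (step name, enabled, node names) in order."""
    steps = []
    for step in steps_element.findall('step'):
        # Assumption: a step without an enabled attribute is enabled.
        enabled = step.get('enabled', 'true') == 'true'
        nodes = [node.get('name') for node in step.findall('node')]
        steps.append((step.get('name'), enabled, nodes))
    return steps


steps = read_steps(ET.fromstring(STEPS_XML))
```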

The <gui> element

The <gui> element defines the position of nodes for a graphical representation. The position of a node is given by a <position> element that contains three attributes:

  • name: The name of the process (as given in the process element).
  • x: The x coordinate of the process.
  • y: The y coordinate of the process.

A single global zoom level can be given to the GUI with a <zoom> element that contains a single level attribute whose value is a floating-point number.
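
A graphical tool could read this element as in the following standard-library sketch. The helper read_gui is hypothetical; it returns None for the zoom level when no <zoom> element is present, since the specification does not state a default:

```python
import xml.etree.ElementTree as ET

GUI_XML = '''
<gui>
    <position name="inputs" x="50.0" y="50.0"/>
    <position name="mask_1" x="815.0" y="153.0"/>
    <zoom level="1.0"/>
</gui>
'''


def read_gui(gui_element):
    """Hypothetical helper: return ({node name: (x, y)}, zoom level or None)."""
    positions = {
        position.get('name'): (float(position.get('x')),
                               float(position.get('y')))
        for position in gui_element.findall('position')
    }
    zoom = gui_element.find('zoom')
    level = float(zoom.get('level')) if zoom is not None else None
    return positions, level


positions, zoom_level = read_gui(ET.fromstring(GUI_XML))
```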

Pipeline example

<pipeline capsul_xml="2.0">
    <process name="threshold_gt_1" 
     module="capsul.process.test.test_load_from_description.threshold">
        <set name="threshold" value="1"/>
        <set name="method" value="'gt'"/>
    </process>
    <process name="threshold_gt_10" 
     module="capsul.process.test.test_load_from_description.threshold">
        <set name="threshold" value="10"/>
        <set name="method" value="'gt'"/>
    </process>
    <process name="threshold_gt_100" 
     module="capsul.process.test.test_load_from_description.threshold">
        <set name="threshold" value="100"/>
        <set name="method" value="'gt'"/>
    </process>
    <process name="threshold_lt_1" 
     module="capsul.process.test.test_load_from_description.threshold">
        <set name="threshold" value="1"/>
        <set name="method" value="'lt'"/>
    </process>
    <process name="threshold_lt_10" 
     module="capsul.process.test.test_load_from_description.threshold">
        <set name="threshold" value="10"/>
        <set name="method" value="'lt'"/>
    </process>
    <process name="threshold_lt_100" 
     module="capsul.process.test.test_load_from_description.threshold">
        <set name="threshold" value="100"/>
        <set name="method" value="'lt'"/>
    </process>
    <process name="mask_1" 
     module="capsul.process.test.test_load_from_description.mask">
    </process>
    <process name="mask_10" 
     module="capsul.process.test.test_load_from_description.mask">
    </process>
    <process name="mask_100" 
     module="capsul.process.test.test_load_from_description.mask">
    </process>

    <link source="input_image" dest="threshold_gt_1.input_image"/>
    <link source="input_image" dest="threshold_gt_10.input_image"/>
    <link source="input_image" dest="threshold_gt_100.input_image"/>
    
    <link source="input_image" dest="threshold_lt_1.input_image"/>
    <link source="input_image" dest="threshold_lt_10.input_image"/>
    <link source="input_image" dest="threshold_lt_100.input_image"/>

    <link source="input_image" dest="mask_1.input_image"/>
    <link source="input_image" dest="mask_10.input_image"/>
    <link source="input_image" dest="mask_100.input_image"/>

    <link source="threshold_gt_1.output_image" dest="mask_1.mask"/>
    <link source="threshold_gt_10.output_image" dest="mask_10.mask"/>
    <link source="threshold_gt_100.output_image" dest="mask_100.mask"/>
    <link source="threshold_lt_1.output_image" dest="mask_1.mask"/>
    <link source="threshold_lt_10.output_image" dest="mask_10.mask"/>
    <link source="threshold_lt_100.output_image" dest="mask_100.mask"/>

    <link source="mask_1.output_image" dest="output_1"/>
    <link source="mask_10.output_image" dest="output_10"/>
    <link source="mask_100.output_image" dest="output_100"/>

    <processes_selection name="select_method">
        <processes_group name="greater than">
            <process name="threshold_gt_1"/>
            <process name="threshold_gt_10"/>
            <process name="threshold_gt_100"/>
        </processes_group>
        <processes_group name="lower than">
            <process name="threshold_lt_1"/>
            <process name="threshold_lt_10"/>
            <process name="threshold_lt_100"/>
        </processes_group>
    </processes_selection>
    
    <gui>
        <position name="threshold_gt_100" x="386.0" y="403.0"/>
        <position name="inputs" x="50.0" y="50.0"/>
        <position name="mask_1" x="815.0" y="153.0"/>
        <position name="threshold_gt_10" x="374.0" y="242.0"/>
        <position name="threshold_lt_100" x="556.0" y="314.0"/>
        <position name="threshold_gt_1" x="371.0" y="88.0"/>
        <position name="mask_10" x="820.0" y="293.0"/>
        <position name="mask_100" x="826.0" y="451.0"/>
        <position name="threshold_lt_1" x="570.0" y="6.0"/>
        <position name="threshold_lt_10" x="568.0" y="145.0"/>
        <zoom level="1.0"/>
    </gui>
</pipeline>

API

Definitions of processes and pipelines in Capsul XML 2.0 are recognised by get_process_instance. For an XML process, the identifier of the process is <module>.<function> where <module> is the fully qualified name of the Python module where the function is located and <function> is the name of the function as defined in the module. In order to work with get_process_instance, the module must be in the Python path. For instance, capsul.process.test.test_load_from_description.threshold is the identifier of the function threshold located in the module capsul.process.test.test_load_from_description.

For an XML pipeline, get_process_instance looks for the XML file defining the pipeline. The file name must end with .xml and be located in a directory associated with a valid Python package (i.e. a module in a directory). The pipeline identifier is a string <module>.<name> where <module> is the fully qualified Python module name and <name> is the file name without the .xml extension. For instance, capsul.process.test.test_pipeline is the identifier for the pipeline defined in <python_path>/capsul/process/test/test_pipeline.xml.

One can find all the Process and Pipeline identifiers defined in a module (and recursively in all its sub-modules) with the function find_processes(module_name) (in capsul.process.finder). For instance, to try to instantiate all processes and pipelines defined in the module clinfmri:
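
The correspondence between a pipeline identifier and its file can be sketched as below. The helper pipeline_xml_path is hypothetical and takes the Python path entry as an explicit argument; the real lookup presumably goes through the import system:

```python
import os


def pipeline_xml_path(identifier, python_path_entry):
    """Hypothetical helper: file expected for a pipeline identifier
    of the form <module>.<name>."""
    module, name = identifier.rsplit('.', 1)
    return os.path.join(python_path_entry, *module.split('.'), name + '.xml')


path = pipeline_xml_path('capsul.process.test.test_pipeline', 'python_path')
```

Here path points to capsul/process/test/test_pipeline.xml under the given directory.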

from capsul.api import get_process_instance, find_processes

for p in find_processes('clinfmri'):
    try:
        get_process_instance(p)
    except Exception:
        print('FAILED', p)
    else:
        print('GOOD', p)

XML validation

There is no validation of the XML document in get_process_instance. As a consequence, one will only get an error if the XML does not allow a process or pipeline class to be built (for instance if a mandatory attribute is missing). On the other hand, a misspelled element or attribute name may not raise an error (the unknown item is simply ignored). If a validation feature is needed for pipeline development, it will be added in separate functions built to give precise errors and warnings to the user (including line numbers in the XML file).
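
Until such functions exist, a simple check can be written by hand. The following sketch is not part of Capsul: lint_process_xml is a hypothetical helper, the attribute table is derived from the process section of this page, and no line numbers are reported. It flags element and attribute names that the specification does not mention:

```python
import xml.etree.ElementTree as ET

# Attributes named in the process section of this specification.
PARAMETER_ATTRIBUTES = {'name', 'type', 'doc', 'allowed_extensions', 'values'}
KNOWN_ATTRIBUTES = {
    'process': {'capsul_xml', 'role'},
    'input': PARAMETER_ATTRIBUTES,
    'output': PARAMETER_ATTRIBUTES | {'input_filename'},
    'return': PARAMETER_ATTRIBUTES,
}


def lint_process_xml(xml_string):
    """Hypothetical helper: warn about names that would be silently ignored."""
    warnings = []
    root = ET.fromstring(xml_string)
    for element in root.iter():
        allowed = KNOWN_ATTRIBUTES.get(element.tag)
        if allowed is None:
            warnings.append('unknown element <%s>' % element.tag)
            continue
        for attribute in sorted(element.attrib):
            if attribute not in allowed:
                warnings.append('unknown attribute %r on <%s>'
                                % (attribute, element.tag))
    return warnings


SUSPICIOUS_XML = '''
<process capsul_xml="2.0">
    <input name="a" type="int" desc="An integer"/>
    <retrun name="result" type="int"/>
</process>
'''
found = lint_process_xml(SUSPICIOUS_XML)
```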