Merge pull request #129 Added report system
ODiogoSilva authored Sep 21, 2018
2 parents 9885389 + 281225b commit e4affbe
Showing 117 changed files with 3,243 additions and 282 deletions.
42 changes: 42 additions & 0 deletions changelog.md
@@ -1,5 +1,45 @@
# Changelog

## 1.3.0

### Features
- Added `report` run mode to FlowCraft that displays the report of any given
pipeline in the FlowCraft web application. The `report` mode can be executed
after a pipeline has ended, or during pipeline execution using the `--watch`
option.
- Added standalone report HTML at the end of the pipeline execution.
- Components with support for the new report system:
- `abricate`
- `assembly_mapping`
- `check_coverage`
- `chewbbaca`
- `dengue_typing`
- `fastqc`
- `fastqc_trimmomatic`
- `integrity_coverage`
- `mlst`
- `patho_typing`
- `pilon`
- `process_mapping`
- `process_newick`
- `process_skesa`
- `process_spades`
- `process_viral_assembly`
- `seq_typing`
- `trimmomatic`
- `true_coverage`

### Minor/Other changes

- Refactored the report JSON for the `mash_dist`, `mash_screen` and
`mapping_patlas` components.

### Bug fixes
- Fixed an issue where the `seq_typing` and `patho_typing` processes were not
feeding report data to the report compiler.
- Fixed fail messages for the `process_assembly` and `process_viral_assembly`
components.

## 1.2.2

### Components changes
@@ -9,6 +49,8 @@ sam and bam files and added data to .report.json. Updated databases to pATLAS
version 1.5.2.
- `mash_screen` and `mash_dist`: added data to .report.json. Updated databases
to pATLAS version 1.5.2.
- Added new options to the `abricate` component. Users can now provide custom
database directories, minimum coverage and minimum identity parameters.

### New components

4 changes: 4 additions & 0 deletions docs/_static/custom.css
@@ -4,4 +4,8 @@ div.wy-side-nav-search, div.wy-nav-top {

.wy-menu > .caption > .caption-text {
color: #5c6bc0;
}

.wy-nav-content {
  max-width: 100%;
}
2 changes: 2 additions & 0 deletions docs/dev/create_process.rst
@@ -116,6 +116,8 @@ must be used **only once**. Like in the input channel, this channel should
be defined with a two element tuple with the sample ID and the data. The
sample ID must match the one specified in the ``input_channel``.

.. _compiler:

{% include "compiler_channels.txt %}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

90 changes: 90 additions & 0 deletions docs/dev/pipeline_reporting.rst
@@ -0,0 +1,90 @@
Pipeline reporting
==================

This section describes how the reports of a FlowCraft pipeline are generated
and collected at the end of a run. These reports can then be sent to the
`FlowCraft web application <https://github.com/assemblerflow/flowcraft-webapp>`_
where the results are visualized.

.. important::
    Note that if a nextflow process report adds new types of data, one or
    more React components need to be added to the web application before
    that data can be rendered.

Data collection
---------------

The data for the pipeline reports is collected from three dotfiles in each nextflow
process (they should be present in each work subdirectory):

- **.report.json**: Contains report data (See :ref:`report-json` for more information).
- **.versions**: Contains information about the versions of the software used
(See :ref:`versions` for more information).
- **.command.trace**: Contains resource usage information.

The **.command.trace** file is generated by nextflow when the **trace** scope
is active. The **.report.json** and **.versions** files are specific to
FlowCraft pipelines.

Generation of dotfiles
^^^^^^^^^^^^^^^^^^^^^^

Empty **.report.json** and **.versions** dotfiles are automatically generated
by the ``{% include "post.txt" ignore missing %}`` placeholder, specified in the
:ref:`create-process` section. Using this placeholder in your processes is all
that is needed.

Collection of dotfiles
^^^^^^^^^^^^^^^^^^^^^^

The **.report.json**, **.versions** and **.command.trace** files are automatically
collected and sent to dedicated report channels in the pipeline by the
``{%- include "compiler_channels.txt" ignore missing -%}`` placeholder, specified
in the :ref:`process creation <compiler>` section. Placing this placeholder in your
processes will generate the following line in the output channel specification::

set {{ sample_id|default("sample_id") }}, val("{{ task_name }}_{{ pid }}"), val("{{ pid }}"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_{{task_name}}_{{ pid }}

This line collects several pieces of metadata associated with the process, along
with the three dotfiles.
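
For illustration, assuming a ``fastqc`` component with a hypothetical process ID
of ``1_2`` and the default sample ID variable, the rendered line would look
roughly like this::

    set sample_id, val("fastqc_1_2"), val("1_2"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_fastqc_1_2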

Compilation of dotfiles
^^^^^^^^^^^^^^^^^^^^^^^

As mentioned in the previous section, the dotfiles and other relevant metadata
are sent through special report channels to a FlowCraft component that is
responsible for compiling all the information and generating a single report
file at the end of each pipeline run.

This component is specified in ``flowcraft.generator.templates.report_compiler.nf``
and it consists of two nextflow processes:

- First, the **report** process receives the data from each executed process that
  sends report data and runs the ``flowcraft/bin/prepare_reports.py`` script
  on that data. This script simply merges the metadata and dotfile information
  into a single JSON file. This file contains the following keys:

  - ``reportJson``: The data in the **.report.json** file.
  - ``versions``: The data in the **.versions** file.
  - ``trace``: The data in the **.command.trace** file.
  - ``processId``: The process ID.
  - ``pipelineId``: The pipeline ID, which defaults to one unless specified in
    the parameters.
  - ``projectid``: The project ID, which defaults to one unless specified in
    the parameters.
  - ``userId``: The user ID, which defaults to one unless specified in
    the parameters.
  - ``username``: The user name, which defaults to *user* unless specified in
    the parameters.
  - ``processName``: The name of the FlowCraft component.
  - ``workdir``: The work directory where the process was executed.

- Second, all JSON files created in the process above are merged into a single
  reports JSON file. This file has the following structure (a minimal sketch of
  both steps is shown after it)::

    reportJSON = {
        "data": {
            "results": [<array of report JSONs>]
        }
    }
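
The following is a minimal, illustrative Python sketch of these two steps. It is
not the actual ``prepare_reports.py`` or ``report_compiler.nf`` code, and the
function and output file names are only assumptions, but it shows how the
dotfiles and metadata of each process can be merged into the final structure
shown above::

    import json
    from pathlib import Path


    def prepare_report(workdir, process_id, process_name, pipeline_id=1,
                       project_id=1, user_id=1, username="user"):
        """Merge the dotfiles and metadata of one work directory into one entry."""
        wd = Path(workdir)
        return {
            # Empty dotfiles are treated as empty JSON/text.
            "reportJson": json.loads((wd / ".report.json").read_text() or "{}"),
            "versions": (wd / ".versions").read_text(),
            "trace": (wd / ".command.trace").read_text(),
            "processId": process_id,
            "pipelineId": pipeline_id,
            "projectid": project_id,
            "userId": user_id,
            "username": username,
            "processName": process_name,
            "workdir": str(wd),
        }


    def compile_reports(entries, output="pipeline_report.json"):
        """Merge all per-process entries into a single reports JSON file."""
        with open(output, "w") as fh:
            json.dump({"data": {"results": entries}}, fh, indent=4)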
63 changes: 46 additions & 17 deletions docs/dev/process_dotfiles.rst
Original file line number Diff line number Diff line change
@@ -44,15 +44,22 @@ execution of the process. When this occurs, the ``.status`` channel must have
the ``fail`` string as well. As in the warning dotfile, there is no
particular format for the fail message.

.. _report-json:

Report JSON
-----------

.. important::
The general specification of the report JSON changed in version 1.2.2.
See the `issue tracker <https://github.com/assemblerflow/flowcraft/issues/96>`_
for details.

The ``.report.json`` file stores any information from a given process that is
deemed worthy of being reported and displayed at the end of the pipeline.
Any information can be stored in this file, as long as it is in JSON format,
but there are a couple of recommendations that must be followed for the data
to be processed by the reporting web app (currently hosted at
`flowcraft-webapp <https://github.com/assemblerflow/flowcraft-webapp>`_). However, if
data processing will be performed with custom scripts, feel free to specify
your own format.

@@ -63,33 +70,53 @@ Information meant to be displayed in tables should be in the following
format::

    json_dic = {
        "tableRow": [{
            "sample": "A",
            "data": [{
                "header": "Raw BP",
                "value": 123,
                "table": "qc"
            }, {
                "header": "Coverage",
                "value": 32,
                "table": "qc"
            }]
        }, {
            "sample": "B",
            "data": [{
                "header": "Coverage",
                "value": 35,
                "table": "qc"
            }]
        }]
    }

This provides table information for multiple samples in the same process. In
this case, data for two samples is provided. For each sample, values for
one or more headers can be provided. For instance, this report provides
information about the **Raw BP** and **Coverage** of sample **A**, and this
information should go to the **qc** table. If any other information is relevant
to build the table, feel free to add more elements to the JSON.
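
As a minimal sketch (the sample ID and values are made up, and writing the file
with Python's ``json`` module is just one way to produce it), a process script
could assemble and write such a table entry like this::

    import json

    # Metrics computed earlier in the process (illustrative values only).
    sample_id = "A"
    raw_bp = 123
    coverage = 32

    report = {
        "tableRow": [{
            "sample": sample_id,
            "data": [
                {"header": "Raw BP", "value": raw_bp, "table": "qc"},
                {"header": "Coverage", "value": coverage, "table": "qc"},
            ]
        }]
    }

    # Write to the dotfile that is collected by the report compiler.
    with open(".report.json", "w") as fh:
        json.dump(report, fh)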

Information for plots
^^^^^^^^^^^^^^^^^^^^^

Information meant to be displayed in plots should be in the following format::

    json_dic = {
        "plotData": [{
            "sample": "strainA",
            "data": {
                "sparkline": 23123,
                "otherplot": [1,2,3]
            }
        }]
    }

As in the table JSON, *plotData* should be an array with an entry for each
sample. The data for each sample should be another JSON where the keys are
the *plot signatures*, so that we know to which plot the data belongs. The
corresponding values are whatever data object you need.
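
A similarly hedged sketch for plot data (the plot signatures and values below
are purely illustrative; use whatever signatures the corresponding web
application components expect)::

    import json

    report = {
        "plotData": [{
            "sample": "strainA",
            "data": {
                # One entry per plot signature; the value is whatever data
                # object the corresponding plot needs.
                "sparkline": 23123,
                "size_dist": [110, 250, 250, 300],
            }
        }]
    }

    with open(".report.json", "w") as fh:
        json.dump(report, fh)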

Other information
^^^^^^^^^^^^^^^^^
@@ -99,6 +126,8 @@ is no particular format for other information. They will simply store the
data of interest to report, and it will be the job of a downstream report app
to process that data into an actual visual report.

.. _versions:

Versions
--------

