Merge pull request #129 Added report system
ODiogoSilva authored Sep 21, 2018
2 parents 9885389 + 281225b commit e4affbe
Showing 117 changed files with 3,243 additions and 282 deletions.
42 changes: 42 additions & 0 deletions changelog.md
@@ -1,5 +1,45 @@
# Changelog

## 1.3.0

### Features
- Added `report` run mode to FlowCraft that displays the report of any given
pipeline in the FlowCraft web application. The `report` mode can be executed
after a pipeline has ended, or during pipeline execution using the `--watch`
option.
- Added standalone report HTML at the end of the pipeline execution.
- Components with support for the new report system:
- `abricate`
- `assembly_mapping`
- `check_coverage`
- `chewbbaca`
- `dengue_typing`
- `fastqc`
- `fastqc_trimmomatic`
- `integrity_coverage`
- `mlst`
- `patho_typing`
- `pilon`
- `process_mapping`
- `process_newick`
- `process_skesa`
- `process_spades`
- `process_viral_assembly`
- `seq_typing`
- `trimmomatic`
- `true_coverage`

### Minor/Other changes

- Refactored the report JSON for the `mash_dist`, `mash_screen` and
`mapping_patlas` components.

### Bug fixes
- Fixed an issue where the `seq_typing` and `patho_typing` processes were not
feeding report data to the report compiler.
- Fixed fail messages for the `process_assembly` and `process_viral_assembly`
components.

## 1.2.2

### Components changes
@@ -9,6 +49,8 @@ sam and bam files and added data to .report.json. Updated databases to pATLAS
version 1.5.2.
- `mash_screen` and `mash_dist`: added data to .report.json. Updated databases
to pATLAS version 1.5.2.
- Added new options to the `abricate` component. Users can now provide custom
database directories, minimum coverage and minimum identity parameters.

### New components

4 changes: 4 additions & 0 deletions docs/_static/custom.css
@@ -4,4 +4,8 @@ div.wy-side-nav-search, div.wy-nav-top {

.wy-menu > .caption > .caption-text {
color: #5c6bc0;
}

.wy-nav-content {
  max-width: 100%;
}
2 changes: 2 additions & 0 deletions docs/dev/create_process.rst
@@ -116,6 +116,8 @@ must be used **only once**. Like in the input channel, this channel should
be defined with a two element tuple with the sample ID and the data. The
sample ID must match the one specified in the ``input_channel``.

.. _compiler:

{% include "compiler_channels.txt %}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

90 changes: 90 additions & 0 deletions docs/dev/pipeline_reporting.rst
@@ -0,0 +1,90 @@
Pipeline reporting
==================

This section describes how the reports of a FlowCraft pipeline are generated
and collected at the end of a run. These reports can then be sent to the
`FlowCraft web application <https://github.com/assemblerflow/flowcraft-webapp>`_
where the results are visualized.

.. important::
    Note that if a nextflow process report adds new types of data, one or
    more React components need to be added to the web application before
    that data can be rendered.

Data collection
---------------

The data for the pipeline reports is collected from three dotfiles in each nextflow
process (they should be present in each work subdirectory):

- **.report.json**: Contains report data (See :ref:`report-json` for more information).
- **.versions**: Contains information about the versions of the software used
(See :ref:`versions` for more information).
- **.command.trace**: Contains resource usage information.

The **.command.trace** file is generated by nextflow when the **trace** scope
is active. The **.report.json** and **.versions** files are specific to
FlowCraft pipelines.

Generation of dotfiles
^^^^^^^^^^^^^^^^^^^^^^

Empty **.report.json** and **.versions** dotfiles are automatically generated
by the ``{% include "post.txt" ignore missing %}`` placeholder, specified in the
:ref:`create-process` section. Using this placeholder in your processes is all
that is needed.

Collection of dotfiles
^^^^^^^^^^^^^^^^^^^^^^

The **.report.json**, **.versions** and **.command.trace** files are automatically
collected and sent to dedicated report channels in the pipeline by the
``{%- include "compiler_channels.txt" ignore missing -%}`` placeholder, specified
in the :ref:`process creation <compiler>` section. Placing this placeholder in your
processes will generate the following line in the output channel specification::

set {{ sample_id|default("sample_id") }}, val("{{ task_name }}_{{ pid }}"), val("{{ pid }}"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_{{task_name}}_{{ pid }}

This line collects several pieces of metadata associated with the process, along
with the three dotfiles.
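
For illustration, assuming a ``fastqc`` component with a hypothetical process ID
of ``1_2`` and the default sample ID variable, the rendered line would look
roughly like this::

    set sample_id, val("fastqc_1_2"), val("1_2"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_fastqc_1_2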

Compilation of dotfiles
^^^^^^^^^^^^^^^^^^^^^^^

As mentioned in the previous section, the dotfiles and other relevant metadata
are sent through special report channels to a FlowCraft component that is
responsible for compiling all the information and generating a single report
file at the end of each pipeline run.

This component is specified in ``flowcraft.generator.templates.report_compiler.nf``
and it consists of two nextflow processes:

- First, the **report** process receives the data from each executed process that
  sends report data and runs the ``flowcraft/bin/prepare_reports.py`` script
  on that data. This script simply merges the metadata and dotfile information
  into a single JSON file. This file contains the following keys:

  - ``reportJson``: The data in the **.report.json** file.
  - ``versions``: The data in the **.versions** file.
  - ``trace``: The data in the **.command.trace** file.
  - ``processId``: The process ID.
  - ``pipelineId``: The pipeline ID, which defaults to one unless specified in
    the parameters.
  - ``projectid``: The project ID, which defaults to one unless specified in
    the parameters.
  - ``userId``: The user ID, which defaults to one unless specified in
    the parameters.
  - ``username``: The user name, which defaults to *user* unless specified in
    the parameters.
  - ``processName``: The name of the FlowCraft component.
  - ``workdir``: The work directory where the process was executed.

- Second, all JSON files created in the process above are merged into a single
  reports JSON file. This file has the following structure (a minimal sketch of
  both steps is shown after it)::

    reportJSON = {
        "data": {
            "results": [<array of report JSONs>]
        }
    }
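
The following is a minimal, illustrative Python sketch of these two steps. It is
not the actual ``prepare_reports.py`` or ``report_compiler.nf`` code, and the
function and output file names are only assumptions, but it shows how the
dotfiles and metadata of each process can be merged into the final structure
shown above::

    import json
    from pathlib import Path


    def prepare_report(workdir, process_id, process_name, pipeline_id=1,
                       project_id=1, user_id=1, username="user"):
        """Merge the dotfiles and metadata of one work directory into one entry."""
        wd = Path(workdir)
        return {
            # Empty dotfiles are treated as empty JSON/text.
            "reportJson": json.loads((wd / ".report.json").read_text() or "{}"),
            "versions": (wd / ".versions").read_text(),
            "trace": (wd / ".command.trace").read_text(),
            "processId": process_id,
            "pipelineId": pipeline_id,
            "projectid": project_id,
            "userId": user_id,
            "username": username,
            "processName": process_name,
            "workdir": str(wd),
        }


    def compile_reports(entries, output="pipeline_report.json"):
        """Merge all per-process entries into a single reports JSON file."""
        with open(output, "w") as fh:
            json.dump({"data": {"results": entries}}, fh, indent=4)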
63 changes: 46 additions & 17 deletions docs/dev/process_dotfiles.rst
Original file line number Diff line number Diff line change
@@ -44,15 +44,22 @@ execution of the process. When this occurs, the ``.status`` channel must have
the ``fail`` string as well. As in the warning dotfile, there is no
particular format for the fail message.

.. _report-json:

Report JSON
-----------

.. important::
The general specification of the report JSON changed in version 1.2.2.
See the `issue tracker <https://github.com/assemblerflow/flowcraft/issues/96>`_
for details.

The ``.report.json`` file stores any information from a given process that is
deemed worthy of being reported and displayed at the end of the pipeline.
Any information can be stored in this file, as long as it is in JSON format,
but there are a couple of recommendations that must be followed for the data
to be processed by the reporting web app (currently hosted at
`flowcraft-webapp <https://github.com/assemblerflow/flowcraft-webapp>`_). However, if
data processing will be performed with custom scripts, feel free to specify
your own format.

@@ -63,33 +70,53 @@ Information meant to be displayed in tables should be in the following
format::

    json_dic = {
        "tableRow": [{
            "sample": "A",
            "data": [{
                "header": "Raw BP",
                "value": 123,
                "table": "qc"
            }, {
                "header": "Coverage",
                "value": 32,
                "table": "qc"
            }]
        }, {
            "sample": "B",
            "data": [{
                "header": "Coverage",
                "value": 35,
                "table": "qc"
            }]
        }]
    }

This provides table information for multiple samples in the same process. In
this case, data for two samples is provided. For each sample, values for
one or more headers can be provided. For instance, this report provides
information about the **Raw BP** and **Coverage** of sample **A**, and this
information should go to the **qc** table. If any other information is relevant
to build the table, feel free to add more elements to the JSON.
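
As a minimal sketch (the sample ID and values are made up, and writing the file
with Python's ``json`` module is just one way to produce it), a process script
could assemble and write such a table entry like this::

    import json

    # Metrics computed earlier in the process (illustrative values only).
    sample_id = "A"
    raw_bp = 123
    coverage = 32

    report = {
        "tableRow": [{
            "sample": sample_id,
            "data": [
                {"header": "Raw BP", "value": raw_bp, "table": "qc"},
                {"header": "Coverage", "value": coverage, "table": "qc"},
            ]
        }]
    }

    # Write to the dotfile that is collected by the report compiler.
    with open(".report.json", "w") as fh:
        json.dump(report, fh)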

Information for plots
^^^^^^^^^^^^^^^^^^^^^

Information meant to be displayed in plots should be in the following format::

    json_dic = {
        "plotData": [{
            "sample": "strainA",
            "data": {
                "sparkline": 23123,
                "otherplot": [1,2,3]
            }
        }]
    }

As in the table JSON, *plotData* should be an array with an entry for each
sample. The data for each sample should be another JSON where the keys are
the *plot signatures*, so that we know to which plot the data belongs. The
corresponding values are whatever data object you need.
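
A similarly hedged sketch for plot data (the plot signatures and values below
are purely illustrative; use whatever signatures the corresponding web
application components expect)::

    import json

    report = {
        "plotData": [{
            "sample": "strainA",
            "data": {
                # One entry per plot signature; the value is whatever data
                # object the corresponding plot needs.
                "sparkline": 23123,
                "size_dist": [110, 250, 250, 300],
            }
        }]
    }

    with open(".report.json", "w") as fh:
        json.dump(report, fh)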

Other information
^^^^^^^^^^^^^^^^^
@@ -99,6 +126,8 @@ is no particular format for other information. They will simply store the
data of interest to report, and it will be the job of a downstream report app
to process that data into an actual visual report.

.. _versions:

Versions
--------

