analysis docs #681
Changes from 7 commits
884a97e
6557eb5
dccfa08
2848d07
89e1b9b
b7d68ef
8bdc984
ba0cb5a
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
# Analysing runs | ||
|
||
HADDOCK3 lets you analyse individual steps of a workflow, even after the run has completed. | ||
|
||
## haddock3-analyse | ||
|
||
The `haddock3-analyse` command is the main tool for the analysis of one or more workflow steps. | ||
|
||
``` | ||
haddock3-analyse -r my-run-folder -m 2 5 6 | ||
``` | ||
|
||
Here `my-run-folder` is the run directory and 2, 5, and 6 are the steps that you want to analyse. | ||
|
||
The command will inspect the folder, looking for the existing models. If the selected module is | ||
a `caprieval` module, `haddock3-analyse` simply loads the `capri_ss.tsv` and `capri_clt.tsv` files | ||
produced by the `caprieval` module. Otherwise, `haddock3-analyse` runs a `caprieval` analysis of the models. | ||
You can provide some [caprieval-specific parameters](https://github.com/haddocking/haddock3/blob/main/src/haddock/modules/analysis/caprieval/defaults.yaml) | ||
using the following syntax: | ||
|
||
``` | ||
haddock3-analyse -r my-run-folder -m 2 5 6 -p reference_fname my_ref.pdb receptor_chain F | ||
``` | ||
|
||
Here the `-p` flag tells the command that the arguments that follow are `caprieval` parameters. Each name must match the parameter name defined by the `caprieval` module. | ||
The parameter name and the corresponding value must be separated by a space. |
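The `-p` syntax above alternates parameter names and values, separated by spaces. As a minimal sketch of how such a flat token list can be grouped into name/value pairs (this is not the HADDOCK3 implementation; `pair_tokens` is a hypothetical helper written only for illustration):

```python
def pair_tokens(tokens):
    """Group a flat [name, value, name, value, ...] list into a dict."""
    if len(tokens) % 2 != 0:
        raise ValueError("each parameter name needs exactly one value")
    # Even positions hold names, odd positions hold the matching values.
    return dict(zip(tokens[0::2], tokens[1::2]))


# Tokens as they appear after `-p` in the command above.
params = pair_tokens("reference_fname my_ref.pdb receptor_chain F".split())
print(params)
```

A name without a value (an odd number of tokens) is rejected, which mirrors the requirement that every parameter name is followed by exactly one value.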
||
|
||
Another parameter that can be specified is `top_cluster`, which sets how many of the top-ranked clusters are considered in the analysis. | ||
This value is set to 10 by default. | ||
|
||
``` | ||
haddock3-analyse -r my-run-folder -m 2 5 6 --top_cluster 12 | ||
``` | ||
|
||
This value has no effect on models without cluster information, that is, models that have never been clustered. | ||
|
||
By default `haddock3-analyse` produces [plotly](https://plotly.com/python/) plots in HTML format. With the `format` parameter you can select | ||
one of the formats available [here](https://plotly.github.io/plotly.py-docs/generated/plotly.io.write_image.html), | ||
and with the `scale` parameter you can adjust the resolution: | ||
|
||
``` | ||
haddock3-analyse -r my-run-folder -m 2 5 6 --format pdf --scale 2.0 | ||
``` | ||
|
||
### The analysis folder | ||
|
||
After running `haddock3-analyse` you can check the content of the `analysis` directory in your run folder. | ||
If everything ran successfully, one of the above commands produces an analysis folder structured as follows: | ||
|
||
``` | ||
my-run-folder/ | ||
|--- analysis/ | ||
|--- 2_caprieval_analysis | ||
|--- 5_seletopclusts_analysis | ||
|--- 6_flexref_analysis | ||
``` | ||
|
||
Each subfolder contains all the analysis plots related to that specific step of the workflow. | ||
|
||
By default `haddock3-analyse` produces a set of scatter plots that compare each HADDOCK energy term | ||
(i.e., the HADDOCK score and its components) to the different metrics used to evaluate the quality of a model, | ||
such as the interface-RMSD, Fnat, DockQ, and so on. An example is available [here](../figs/irmsd_score.png). | ||
|
||
For each energy component and each metric mentioned above, `haddock3-analyse` also produces a box plot in which each cluster | ||
is considered separately. An example is available [here](../figs/score_clt.png). | ||
|
||
### The report | ||
|
||
Scatter plots, box plots, CAPRI statistics and an interactive visualization of the models are available in the `report.html` file, present | ||
in each analysis subfolder. To visualize the models you need to start a local server once the `haddock3-analyse` run has finished, | ||
following the instructions provided in the log file: | ||
|
||
``` | ||
[2023-08-24 10:09:09,552 cli_analyse INFO] View the results in analysis/12_caprieval_analysis/report.html | ||
[2023-08-24 10:09:09,552 cli_analyse INFO] To view structures or download the structure files, in a terminal run the command | ||
`python -m http.server --directory /haddock3/examples/docking-antibody-antigen/run1-CDR-acc-cltsel-test`. | ||
By default, http server runs on `http://0.0.0.0:8000/`. Open the link | ||
http://0.0.0.0:8000/analysis/12_caprieval_analysis/report.html in a web browser. | ||
``` | ||
|
||
Run this command to start the local server, replacing `path-to-my-run` with the path to your run directory: | ||
``` | ||
python -m http.server --directory path-to-my-run | ||
``` | ||
|
||
In the browser you can then navigate to each analysis subfolder and open its `report.html` file. If you are not interested in | ||
visualizing the models, you can open the `report.html` file directly in a standard browser, without starting the server. | ||
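To see which reports a run contains, you could list the `report.html` files under the `analysis` directory and print the address each one would have on the local server. The following is a minimal sketch, assuming the folder layout shown earlier and the default port 8000 of `python -m http.server`; `list_reports` is a hypothetical helper and not part of HADDOCK3:

```python
from pathlib import Path


def list_reports(run_dir, base_url="http://localhost:8000"):
    """Return the URL of every analysis/*/report.html inside run_dir."""
    run_dir = Path(run_dir)
    urls = []
    # Each analysed step has its own subfolder, e.g. 2_caprieval_analysis.
    for report in sorted(run_dir.glob("analysis/*/report.html")):
        # The server root is the run folder, so URLs are relative to it.
        urls.append(f"{base_url}/{report.relative_to(run_dir).as_posix()}")
    return urls


if __name__ == "__main__":
    # "my-run-folder" is the example run directory used in this document.
    for url in list_reports("my-run-folder"):
        print(url)
```

The URLs are only reachable while the server started with `--directory` pointing at the same run folder is running.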
|
||
## haddock3-traceback | ||
|
||
HADDOCK3 is highly customisable and modular: you can introduce several refinement, clustering, and scoring steps in a workflow. | ||
Quantifying the impact of the different modules is important when developing a novel docking protocol. The `haddock3-traceback` command | ||
was developed to assist with this task, as it "connects" all the models generated in a HADDOCK3 workflow across the different steps: | ||
|
||
``` | ||
haddock3-traceback my-run-folder | ||
``` | ||
|
||
`haddock3-traceback` creates a traceback subfolder within the `my-run-folder` directory, containing a `traceback.tsv` table: | ||
|
||
``` | ||
00_topo1 00_topo2 01_rigidbody 01_rigidbody_rank 04_seletopclusts 04_seletopclusts_rank 06_flexref 06_flexref_rank | ||
4G6K.psf 4I1B.psf rigidbody_10.pdb 3 cluster_1_model_1.pdb 1 flexref_1.pdb 2 | ||
4G6K.psf 4I1B.psf rigidbody_11.pdb 10 cluster_1_model_2.pdb 3 flexref_3.pdb 1 | ||
4G6K.psf 4I1B.psf rigidbody_18.pdb 4 cluster_2_model_1.pdb 2 flexref_2.pdb 4 | ||
4G6K.psf 4I1B.psf rigidbody_20.pdb 15 cluster_2_model_2.pdb 4 flexref_4.pdb 3 | ||
``` | ||
|
||
In this table each row represents a model that has been produced by the workflow. The topologies used (typically two) are reported first, | ||
and then each module has its own pair of columns, containing the name and the rank of the model at that stage. As an example, in the first row of the | ||
table above `rigidbody_10.pdb` is ranked 3rd at the `rigidbody` stage. Then, it becomes `cluster_1_model_1.pdb` (ranked 1st) after | ||
the `seletopclusts` module. This model is then refined into `flexref_1.pdb`, which turns out to be the 2nd best model at the end of the workflow. | ||
|
||
The table can be easily parsed and used to evaluate the impact of different refinement steps on the different models. | ||
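As an illustration of such parsing, the sketch below reads a traceback table and reports how many positions each model gained between two stages. It assumes the layout of the example above (one header line, whitespace- or tab-separated columns, a `<step>_rank` column for each step, and a value in every cell); `read_traceback` and `rank_change` are hypothetical helpers, not part of HADDOCK3:

```python
import io


def read_traceback(handle):
    """Parse a traceback table into a list of {column: value} dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append(dict(zip(header, fields)))
    return rows


def rank_change(row, start, end):
    """Positive result: the model climbed the ranking between two stages."""
    return int(row[f"{start}_rank"]) - int(row[f"{end}_rank"])


# Two rows copied from the example table above.
example = io.StringIO(
    "00_topo1\t00_topo2\t01_rigidbody\t01_rigidbody_rank\t"
    "04_seletopclusts\t04_seletopclusts_rank\t06_flexref\t06_flexref_rank\n"
    "4G6K.psf\t4I1B.psf\trigidbody_10.pdb\t3\t"
    "cluster_1_model_1.pdb\t1\tflexref_1.pdb\t2\n"
    "4G6K.psf\t4I1B.psf\trigidbody_11.pdb\t10\t"
    "cluster_1_model_2.pdb\t3\tflexref_3.pdb\t1\n"
)
for row in read_traceback(example):
    print(row["06_flexref"], rank_change(row, "01_rigidbody", "06_flexref"))
```

To apply the same functions to a real run, replace the `io.StringIO` object with an open file handle on the `traceback.tsv` file in the traceback subfolder.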
|
||
## The postprocess option | ||
|
||
You may want to run the `haddock3-analyse` and `haddock3-traceback` commands by default at the end of the workflow. | ||
The `postprocess` option of a standard HADDOCK3 configuration (.cfg) file is devoted to this task. First, it makes HADDOCK3 | ||
execute `haddock3-analyse` on all the `caprieval` folders found in the workflow, loading the data already present in the CAPRI tables. | ||
Second, it executes the `haddock3-traceback` command. | ||
|
||
To activate this, just set `postprocess` to `true` at the beginning of your configuration file: | ||
|
||
``` | ||
# ==================================================================== | ||
# This is a HADDOCK3 configuration file | ||
|
||
# directory in which the docking will be done | ||
run_dir = "my-run-folder" | ||
|
||
# postprocess the run | ||
postprocess = true | ||
Open question (that should be discussed):
I agree. I think we discussed this with @amjjbonvin in the past and decided to set it to false by default, but I am more than open to re-discuss it! |
||
|
||
... | ||
``` | ||
|
||
For additional help, run `haddock3-analyse -h` or `haddock3-traceback -h` and read | ||
the parameter descriptions. Otherwise, ask us in the ["issues" forum](https://github.com/haddocking/haddock3/issues). |
What about placing the `post-processing` section first, and then explaining the `haddock3-analyse` and `haddock3-traceback` commands? Also adding internal links to the 3 sections that will be later presented?
I agree about the bindings, but I would not change the order: it's complicated to describe the postprocess option without saying anything about the commands it executes.
actually I would not modify the titles, I think that it's beneficial to have "haddock3-analyse" as a title for the user