Commit

Merge pull request #8 from maabuu/develop
Command line interface improvements and two bug fixes
maabuu authored Aug 24, 2023
2 parents e35ff64 + 85d6a03 commit d01dd1a
Showing 20 changed files with 360 additions and 222 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -54,7 +54,8 @@ DockBuster().bust(ligand_pred_file, protein_crystal_file)

Documentation is available at [https://posebusters.readthedocs.io](https://posebusters.readthedocs.io).

For more information about the tests and for a case study using PoseBusters to compare docking methods, refer to our preprint:
For more information about the tests and for a case study using PoseBusters to compare docking methods, refer to our [preprint](https://arxiv.org/abs/2308.05777):

```
@online{buttenschoen2023posebusters,
title = {{{PoseBusters}}: {{AI-based}} Docking Methods Fail to Generate Physically Valid Poses or Generalise to Novel Sequences},
@@ -68,7 +69,7 @@ For more information about the tests and for a case study using PoseBusters to c

## Feedback & Contact

We welcome all feedback. For code issues, please open an issue. For other inquiries contact us by email.
We welcome all feedback. For code issues, please open an issue. For other inquiries contact us by email.

## Thanks

21 changes: 21 additions & 0 deletions docs/source/checks.rst
@@ -4,6 +4,9 @@
Checks
====================================

Example failure modes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. |tetrahedral_stereo_fail| image:: images/tankbind_astex_1hww.png
   :height: 500 px
   :width: 650 px
@@ -173,3 +176,21 @@ with in a receptor's binding pocket without any steric clash.
| | |
| Ligand and receptor clash | |
+---------------------------------------------+----------------------------------------+

More details on tests and docking case study
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _preprint: https://arxiv.org/abs/2308.05777

For more detailed information about the tests and for a case study using PoseBusters to compare docking methods, refer to our `preprint`_:

.. code-block:: bibtex

    @online{buttenschoen2023posebusters,
      title = {{{PoseBusters}}: {{AI-based}} Docking Methods Fail to Generate Physically Valid Poses or Generalise to Novel Sequences},
      shorttitle = {{{PoseBusters}}},
      author = {Buttenschoen, Martin and Morris, Garrett M. and Deane, Charlotte M.},
      date = {2023-08-10},
      eprint = {2308.05777},
      eprinttype = {arxiv}
    }
26 changes: 13 additions & 13 deletions docs/source/cli.rst
@@ -74,25 +74,25 @@ The ``--out`` option can be used to save the output to a file::
Help
====================================

Run ``--help`` options prints information about the command line options.
Running with the ``--help`` option prints information about the command line options.

.. command-output:: bust --help

.. command-output:: bust --version


Configuration settings
====================================
.. Configuration settings
.. ====================================
PoseBusters will look for configuration parameters in a yaml file ``posebusters.yml``
in standard locations:
.. PoseBusters will look for configuration parameters in a yaml file ``posebusters.yml``
.. in standard locations:
1. ``/etc/posebusters.cfg`` or ``c:\posebusters\posebusters.cfg`` (system-wide),
2. ``~/.config/posebusters.cfg`` (``$XDG_CONFIG_HOME``) and ``~/.posebusters.cfg`` (``$HOME``)
for global (user-wide) settings, and
3. ``posebusters.cfg`` inside the working directory.
4. File location provided by the ``--config`` command line option.
.. 1. ``/etc/posebusters.cfg`` or ``c:\posebusters\posebusters.cfg`` (system-wide),
.. 2. ``~/.config/posebusters.cfg`` (``$XDG_CONFIG_HOME``) and ``~/.posebusters.cfg`` (``$HOME``)
.. for global (user-wide) settings, and
.. 3. ``posebusters.cfg`` inside the working directory.
.. 4. File location provided by the ``--config`` command line option.
Settings from these files are merged in the listed order of preference:
user-defined values have higher priority than system-wide defaults
and project-wide settings will override all others, when defined.
.. Settings from these files are merged in the listed order of preference:
.. user-defined values have higher priority than system-wide defaults
.. and project-wide settings will override all others, when defined.
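The lookup order described in the (now commented-out) text above can be illustrated with a small sketch. It assumes that each configuration file has already been parsed into a plain dictionary and that later layers override earlier ones. The function `merge_settings` and the example keys are hypothetical and are not part of PoseBusters.

```python
from __future__ import annotations

from typing import Any


def merge_settings(*layers: dict[str, Any] | None) -> dict[str, Any]:
    """Merge settings layers; later layers take priority over earlier ones."""
    merged: dict[str, Any] = {}
    for layer in layers:
        if layer:  # skip locations where no file was found
            merged.update(layer)
    return merged


# Hypothetical contents of the four locations, lowest priority first.
system_wide = {"top_n": None, "outfmt": "short"}
user_wide = {"outfmt": "long"}
project = {"top_n": 5}
from_cli = None  # no --config option given

settings = merge_settings(system_wide, user_wide, project, from_cli)
print(settings)
```

Passing the layers from lowest to highest priority keeps the precedence rule in one place: whichever layer is listed last wins for any key it defines.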
7 changes: 3 additions & 4 deletions docs/source/example_dock_bust.ipynb
@@ -6,11 +6,10 @@
"metadata": {},
"outputs": [],
"source": [
"# install with\n",
"# conda create -n posebusters python=3.9\n",
"### install with\n",
"# conda create -n posebusters python=3.10\n",
"# conda activate posebusters\n",
"# pip install pandas click tqdm pyyaml rdkit jupyter notebook\n",
"# pip install -i https://www.stats.ox.ac.uk/\\~buttensc/dist posebusters --upgrade"
"# pip install posebusters --upgrade"
]
},
{
7 changes: 3 additions & 4 deletions docs/source/example_mol_bust.ipynb
@@ -6,11 +6,10 @@
"metadata": {},
"outputs": [],
"source": [
"# install with\n",
"# conda create -n posebusters python=3.9\n",
"### install with\n",
"# conda create -n posebusters python=3.10\n",
"# conda activate posebusters\n",
"# pip install pandas click tqdm pyyaml rdkit jupyter notebook\n",
"# pip install -i https://www.stats.ox.ac.uk/\\~buttensc/dist posebusters --upgrade"
"# pip install posebusters --upgrade"
]
},
{
20 changes: 20 additions & 0 deletions docs/source/index.rst
@@ -130,6 +130,24 @@ For more usage examples, bulk processing, and the Python API see the documentati
and the `Python library <python_library.ipynb>`_.


Docking case study
====================================

.. _preprint: https://arxiv.org/abs/2308.05777

For more detailed information about the tests and for a case study using PoseBusters to compare docking methods, refer to our `preprint`_:

.. code-block:: bibtex

    @online{buttenschoen2023posebusters,
      title = {{{PoseBusters}}: {{AI-based}} Docking Methods Fail to Generate Physically Valid Poses or Generalise to Novel Sequences},
      shorttitle = {{{PoseBusters}}},
      author = {Buttenschoen, Martin and Morris, Garrett M. and Deane, Charlotte M.},
      date = {2023-08-10},
      eprint = {2308.05777},
      eprinttype = {arxiv}
    }

Sample checks
====================================

@@ -158,6 +176,8 @@ For more information on the checks, see :ref:`checks`.
+---------------------------------------------+----------------------------------------+
| Volume overlap | |
+---------------------------------------------+----------------------------------------+
| Bad: | Good: |
| | |
| |volume_overlap_fail| | |volume_overlap_true| |
| | |
| Ligand and receptor clash | |
2 changes: 1 addition & 1 deletion posebusters/__init__.py
@@ -24,4 +24,4 @@
"check_volume_overlap",
]

__version__ = "0.2.2"
__version__ = "0.2.3"
151 changes: 104 additions & 47 deletions posebusters/cli.py
@@ -1,12 +1,17 @@
"""Command line interface for PoseBusters."""
from __future__ import annotations

import argparse
import logging
import sys
from pathlib import Path
from typing import Any, Iterable

import click
import pandas as pd
from rdkit.Chem.rdchem import Mol
from yaml import safe_load

from . import __version__
from .posebusters import PoseBusters
from .tools.formatting import create_long_output, create_short_output

@@ -15,63 +20,95 @@

def main():
"""Safe entry point for PoseBusters from the command line."""
parser = _parse_args(sys.argv[1:])
try:
bust()
bust(**vars(parser))
except Exception as e:
click.echo(e)


_path = click.Path(exists=True, path_type=Path)


@click.command(name="bust")
@click.argument("mol_pred", type=_path, required=True, default=None, nargs=-1)
@click.option("-l", "--mol_true", type=_path, required=False, default=None, help="True molecule, e.g. crystal ligand.")
@click.option("-p", "--mol_cond", type=_path, required=False, default=None, help="Conditioning molecule, e.g. protein.")
@click.option("-t", "--table", type=_path, help="Run multiple inputs listed in a .csv file.")
@click.option("-f", "--outfmt", type=click.Choice(["short", "long", "csv"]), default="short", help="Output format.")
@click.option(
"-o", "--out", "output", type=click.File("w"), default="-", help="Output file. Prints to stdout by default."
)
@click.option("-c", "--config", type=click.File("r"), default=None, help="Configuration file.")
@click.option("--full-report", type=bool, default=False, is_flag=True, help="Print full report.")
@click.option("--no-header", type=bool, default=False, is_flag=True, help="Print without header.")
# @click.option("--print-header", type=bool, default=False, is_flag=True, help="Print header only.")
@click.option("--top-n", type=int, default=None, help="Run on top N results in MOL_PRED only.")
@click.option("--debug", type=bool, default=False, is_flag=True, help="Enable debug output.")
@click.version_option()
def bust(table, outfmt, output, config, debug, no_header, full_report, top_n, **mol_args):
logger.error(e)


def bust(
mol_pred: list[Path | Mol] = [],
mol_true: Path | Mol | None = None,
mol_cond: Path | Mol | None = None,
table: Path | None = None,
outfmt: str = "short",
output=sys.stdout,
config: Path | None = None,
no_header: bool = False,
full_report: bool = False,
top_n: int | None = None,
):
"""PoseBusters: Plausibility checks for generated molecule poses."""
if debug:
click.echo("Debug mode is on.")

if outfmt == "short":
if full_report:
logger.warning("Full report is not available in short output format. Ignoring --full-report option.")
full_report = False

if table is None and mol_args.get("mol_pred") is None:
# check that an input was provided
click.echo("Provide either MOL_PRED or a table using the --table option.\n")
click.echo(bust.get_help(click.Context(bust)))
return None
if table is None and len(mol_pred) == 0:
raise ValueError("Provide either MOLS_PRED or TABLE.")
elif table is not None:
# run on table
file_paths = pd.read_csv(table, index_col=None)
mode = _select_mode(file_paths.columns.tolist()) if config is None else config
posebusters = PoseBusters(mode, top_n=top_n, debug=debug)
mode = _select_mode(config, file_paths.columns.tolist())
posebusters = PoseBusters(mode, top_n=top_n)
posebusters_results = posebusters.bust_table(file_paths)
else:
# run on file inputs
mode = _select_mode([m for m, v in mol_args.items() if v is not None]) if config is None else config
posebusters = PoseBusters(mode, top_n=top_n, debug=debug)
posebusters_results = posebusters.bust(**mol_args)
# run on single input
d = {k for k, v in dict(mol_pred=mol_pred, mol_true=mol_true, mol_cond=mol_cond).items() if v}
mode = _select_mode(config, d)
posebusters = PoseBusters(mode, top_n=top_n)
posebusters_results = posebusters.bust(mol_pred, mol_true, mol_cond)

for i, results_dict in enumerate(posebusters_results):
results = _dataframe_from_output(results_dict, posebusters.config, full_report)
output.write(_format_results(results, outfmt, no_header, i))


def _parse_args(args):
    desc = "PoseBusters: Plausibility checks for generated molecule poses."
    parser = argparse.ArgumentParser(description=desc, add_help=False)

    # Create argument groups
    in_group = parser.add_argument_group(title="Input")
    out_group = parser.add_argument_group(title="Output")
    cfg_group = parser.add_argument_group(title="Configuration")
    inf_group = parser.add_argument_group(title="Information")

    # input
    help = "molecule(s) to check"
    in_group.add_argument("mol_pred", default=[], type=_path, nargs="*", help=help)
    in_group.add_argument("-l", dest="mol_true", type=_path, help="true molecule, e.g. crystal ligand")
    in_group.add_argument("-p", dest="mol_cond", type=_path, help="conditioning molecule, e.g. protein")
    help = "run multiple inputs listed in a .csv file"
    in_group.add_argument("-t", dest="table", type=_path, help=help)

    # output options
    out_group.add_argument("--outfmt", choices=["short", "long", "csv"], default="short", help="output format")
    out_group.add_argument("--output", type=Path, default=sys.stdout, help="output file (default: stdout)")
    # out_group.add_argument("--snake_case", action="store_false", help="use snake case for output columns")
    out_group.add_argument("--full-report", action="store_true", help="print details for each test")
    out_group.add_argument("--no-header", action="store_true", help="print output without header")

    # config
    cfg_group.add_argument("--config", type=_path, default=None, help="configuration file")
    cfg_group.add_argument(
        "--top-n", type=int, default=None, help="run on TOP_N results in MOL_PRED only (default: all)"
    )

    # other
    inf_group.add_argument("-v", "--version", action="version", version=f"%(prog)s {__version__}")
    inf_group.add_argument("-h", "--help", action="help", help="show this help message and exit")

    namespace = parser.parse_args(args)

    # check that either mol_pred or table was provided
    if namespace.table is None and len(namespace.mol_pred) == 0:
        parser.print_help()
        parser.exit(status=1, message="\nProvide either MOL_PRED or TABLE as input.\n")

    # full report only works with long and csv output
    if namespace.full_report and namespace.outfmt == "short":
        logger.warning("Option --full-report ignored. Please use --outfmt long or csv for --full-report.")
        namespace.full_report = False
    return namespace


def _dataframe_from_output(results_dict, config, full_report: bool = False) -> pd.DataFrame:
d = {id: {(module, output): value for module, output, value in results} for id, results in results_dict.items()}
df = pd.DataFrame.from_dict(d, orient="index")
@@ -97,21 +134,41 @@ def _format_results(df: pd.DataFrame, outfmt: str = "short", no_header: bool = F
return create_long_output(df)
elif outfmt == "csv":
header = (not no_header) and (index == 0)
df.index.names = ["file", "molecule"]
df.columns = [c.lower().replace(" ", "_") for c in df.columns]
return df.to_csv(index=True, header=header)
elif outfmt == "short":
return create_short_output(df)
else:
raise ValueError(f"Unknown output format {outfmt}")


def _select_mode(columns: list[str]) -> str:
# decide on mode to run for provided input table
def _select_mode(config, columns: Iterable[str]) -> str | dict[str, Any]:
# decide on mode to run

# load config if provided
if type(config) == Path:
return dict(safe_load(open(config)))

# forward string if config provided
if type(config) == str:
return str(config)

# select mode based on inputs
if "mol_pred" in columns and "mol_true" in columns and "mol_cond" in columns:
mode = "redock"
elif "mol_pred" in columns and ("protein" in columns) or ("mol_cond" in columns):
mode = "dock"
elif any(column in columns for column in ("mol_pred", "molecule", "molecules", "molecule")):
elif any(column in columns for column in ("mol_pred", "mols_pred", "molecule", "molecules", "molecule")):
mode = "mol"
else:
raise NotImplementedError(f"No supported columns found in csv. Columns found are {columns}")

return mode
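
A minimal sketch of the input-based selection above, written as a standalone function. The name `select_mode` is hypothetical and the config handling is left out. Because `and` binds more tightly than `or` in Python, the sketch groups the "dock" condition explicitly. It assumes the intent is "a prediction plus a conditioning molecule under either column name", which is a stricter reading than the ungrouped expression in the diff.

```python
from __future__ import annotations

from typing import Iterable


def select_mode(columns: Iterable[str]) -> str:
    """Pick a run mode from the names of the inputs that were provided."""
    names = set(columns)
    has_pred = "mol_pred" in names
    # Assumption: "protein" is an alternative column name for "mol_cond".
    has_cond = "mol_cond" in names or "protein" in names

    if has_pred and "mol_true" in names and "mol_cond" in names:
        return "redock"
    # Explicit grouping: prediction AND (either name for the conditioning molecule).
    if has_pred and has_cond:
        return "dock"
    if names & {"mol_pred", "mols_pred", "molecule", "molecules"}:
        return "mol"
    raise NotImplementedError(f"No supported columns found: {sorted(names)}")


print(select_mode(["mol_pred", "mol_true", "mol_cond"]))
print(select_mode(["mol_pred", "protein"]))
print(select_mode(["molecule"]))
```

Under this reading, a table that contains only a `mol_cond` column raises `NotImplementedError` instead of selecting `"dock"`.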


def _path(path_str: str):
    path = Path(path_str)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"File {path} not found!")
    return path
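
The validator above relies on the `type=` hook of `argparse`: when the callable raises `argparse.ArgumentTypeError`, the parser reports the message and exits. The following self-contained sketch shows that pattern with a reduced, hypothetical parser. The names `existing_path`, `build_parser`, and `bust-demo` are illustrative only.

```python
from __future__ import annotations

import argparse
import tempfile
from pathlib import Path


def existing_path(path_str: str) -> Path:
    """argparse ``type=`` callable: convert to Path and reject missing files."""
    path = Path(path_str)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"File {path} not found!")
    return path


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with one positional input and one table option."""
    parser = argparse.ArgumentParser(prog="bust-demo", add_help=False)
    in_group = parser.add_argument_group(title="Input")
    in_group.add_argument("mol_pred", nargs="*", default=[], type=existing_path)
    in_group.add_argument("-t", dest="table", type=existing_path)
    inf_group = parser.add_argument_group(title="Information")
    inf_group.add_argument("-h", "--help", action="help", help="show this help message and exit")
    return parser


# Parse a path that exists; each positional value passes through existing_path.
with tempfile.NamedTemporaryFile(suffix=".sdf") as tmp:
    args = build_parser().parse_args([tmp.name])
    print(args.mol_pred, args.table)
```

Note that `Path(...)` returns a platform-specific subclass (`PosixPath` or `WindowsPath`), so later checks on a parsed value are more robust with `isinstance(value, Path)` than with an exact type comparison.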