MolecularAI/route-distances

route-distances

This repository contains tools and routines to calculate distances between synthesis routes and to cluster them.

This repository is mainly intended for developers and researchers. If you want a fully functional tool that is easy to use, please consider looking into the AiZynthFinder project.

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Linux, Windows or macOS platforms are supported - as long as the dependencies are supported on these platforms.

  • You have installed Anaconda or Miniconda with Python 3.9 to 3.11.

The tool has been developed on a Linux platform, but the software has also been tested on Windows 10 and macOS Catalina.
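As a quick way to confirm that the active interpreter falls within the supported range, the check can be sketched in a few lines of Python. The function name is_supported_python is hypothetical and not part of this package; the version bounds are taken from the prerequisites above.

```python
import sys

# Supported range taken from the prerequisites above (inclusive on both ends).
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range.

    Hypothetical helper for illustration; not part of route_distances.
    """
    major_minor = tuple(version or sys.version_info)[:2]
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```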

Installation

For users

Set up your Python environment and then run:

pip install route-distances

For developers

First clone the repository using Git.

Then execute the following commands in the root of the repository:

conda env create -f conda-env.yml
conda activate routes-env
poetry install

The route_distances package is now installed in editable mode.
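Either installation route can be sanity-checked from Python. The sketch below only uses the standard library to test whether the route_distances package can be found and whether the cluster_aizynth_output command (described under Usage) is on the PATH. The helper name check_installation is hypothetical and not provided by the package.

```python
import importlib.util
import shutil


def check_installation(package="route_distances", command="cluster_aizynth_output"):
    """Report whether the package is importable and the command is on PATH.

    Hypothetical helper for illustration; not part of route_distances.
    """
    return {
        "package_found": importlib.util.find_spec(package) is not None,
        "command_found": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name}: {found}")
```

If either value is False, the most likely cause is that the wrong environment is active.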

Usage

Installing the package provides the cluster_aizynth_output command-line tool, which calculates distances and clusters for AiZynthFinder output:

cluster_aizynth_output --files finder_output1.hdf5 finder_output2.hdf5 --output finder_distances.hdf5 --nclusters 0 --model ted

This will perform tree edit distance (TED) calculations and add two columns to the output file: distance_matrix, with the distances, and cluster_labels, with the cluster label of each route.
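To illustrate how the two columns relate to each other, here is a minimal sketch that groups routes by cluster label and computes the mean pairwise distance inside each cluster. It assumes that one distance matrix is available as a square, symmetric nested list and that the labels are one integer per route in the same order; how the columns are laid out in the HDF5 file is not covered here, and loading the file is left out. The function summarize_clusters is hypothetical and the numbers are made up.

```python
from collections import defaultdict
from itertools import combinations


def summarize_clusters(distance_matrix, cluster_labels):
    """Group route indices by label and compute the mean pairwise distance per cluster.

    Hypothetical helper for illustration; not part of route_distances.
    """
    members = defaultdict(list)
    for route_index, label in enumerate(cluster_labels):
        members[label].append(route_index)

    summary = {}
    for label, indices in sorted(members.items()):
        pairs = list(combinations(indices, 2))
        # A single-route cluster has no pairs; its mean is set to 0.0 by convention here.
        mean = sum(distance_matrix[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
        summary[label] = {"routes": indices, "mean_distance": mean}
    return summary


if __name__ == "__main__":
    # Made-up values for illustration only, not output from the tool.
    distances = [
        [0.0, 2.0, 8.0, 9.0],
        [2.0, 0.0, 7.0, 6.0],
        [8.0, 7.0, 0.0, 4.0],
        [9.0, 6.0, 4.0, 0.0],
    ]
    labels = [0, 0, 1, 1]
    for label, info in summarize_clusters(distances, labels).items():
        print(label, info["routes"], info["mean_distance"])
```

A low mean distance within a cluster, compared with the distances between clusters, suggests that the clustering has grouped similar routes together.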

An ML model for fast predictions can be found here: https://zenodo.org/record/4925903.

The downloaded model checkpoint can be used with the cluster_aizynth_output tool by passing it to the --model argument:

cluster_aizynth_output --files finder_output1.hdf5 finder_output2.hdf5 --output finder_distances.hdf5 --nclusters 0 --model chembl_10k_route_distance_model.ckpt
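When the tool is called from a script, for example over many sets of AiZynthFinder output, the command line can be assembled in Python. The sketch below only uses the flags shown in the two commands above; the helper build_cluster_command is hypothetical, and its defaults simply mirror those examples rather than any documented defaults of the tool.

```python
import shlex


def build_cluster_command(files, output, model="ted", nclusters=0):
    """Assemble the argument list for cluster_aizynth_output.

    Hypothetical helper for illustration; only flags shown in this README are used.
    """
    if not files:
        raise ValueError("at least one AiZynthFinder output file is required")
    return [
        "cluster_aizynth_output",
        "--files", *files,
        "--output", output,
        "--nclusters", str(nclusters),
        "--model", model,
    ]


if __name__ == "__main__":
    cmd = build_cluster_command(
        ["finder_output1.hdf5", "finder_output2.hdf5"],
        "finder_distances.hdf5",
        model="chembl_10k_route_distance_model.ckpt",
    )
    # Only print the command here; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems with file names that contain spaces.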

For further details, please consult the documentation.

Development

Testing

The tests use the pytest package, which is installed by poetry.

Run the tests using:

pytest -v

Documentation generation

The documentation is generated by Sphinx from hand-written tutorials and docstrings.

The HTML documentation can be generated with:

invoke build-docs

Contributing

We welcome contributions, in the form of issues or pull requests.

If you have a question or want to report a bug, please submit an issue.

To contribute with code to the project, follow these steps:

  1. Fork this repository.
  2. Create a branch: git checkout -b <branch_name>.
  3. Make your changes and commit them: git commit -m '<commit_message>'
  4. Push to the remote branch: git push
  5. Create the pull request.

Please use the black package for formatting, and follow the PEP 8 style guide.

Contributors

  • Samuel Genheden

The contributors have limited time for support questions, but please do not hesitate to submit an issue (see above).

License

The software is licensed under the MIT license (see LICENSE file), and is free and provided as-is.

References

  1. Genheden S, Engkvist O, Bjerrum E (2021) Clustering of synthetic routes using tree edit distance. J. Chem. Inf. Model. 61:3899–3907 https://doi.org/10.1021/acs.jcim.1c00232
  2. Genheden S, Engkvist O, Bjerrum E (2022) Fast prediction of distances between synthetic routes with deep learning. Mach. Learn. Sci. Technol. 3:015018 https://doi.org/10.1088/2632-2153/ac4a91