DLOmix is a Python framework for Deep Learning in Proteomics. It is initially built on top of TensorFlow/Keras; support for PyTorch can be integrated once the main API is established.
Experiment with a simple retention time prediction use-case using Google Colab
A version that includes experiment tracking with Weights and Biases is available here
Resources Repository
More learning resources can be found in the dlomix-resources repository.
Run the following to install:
$ pip install dlomix
If you would like to use Weights & Biases for experiment tracking and use the available reports for Retention Time under /notebooks, please install the optional wandb Python dependency with dlomix by running:
$ pip install dlomix[wandb]
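After installing, it can be handy to confirm from Python whether the package and the optional tracking dependency are actually available in the current environment. The following is a minimal sketch using only the standard library; the helper name describe_install is hypothetical and not part of dlomix.

```python
from importlib import metadata, util


def describe_install(package: str = "dlomix", extra_module: str = "wandb") -> dict:
    """Report whether a package is installed and an optional module is importable.

    Hypothetical helper for illustration; it is not part of dlomix.
    """
    info = {"package_installed": False, "version": None, "extra_available": False}
    try:
        # Looks up the installed distribution's metadata without importing it.
        info["version"] = metadata.version(package)
        info["package_installed"] = True
    except metadata.PackageNotFoundError:
        pass
    # find_spec returns None when a top-level module cannot be located.
    info["extra_available"] = util.find_spec(extra_module) is not None
    return info


if __name__ == "__main__":
    print(describe_install())
```

If extra_available is reported as False, the W&B-based reports will likely not work until the wandb extra is installed.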
General Overview
- data: structures for modeling the input data, processing functions, and feature extractions based on the Hugging Face datasets classes Dataset and DatasetDict
- eval: classes for evaluating models and reporting results
- layers: custom layers used for building models, based on tf.keras.layers.Layer
- losses: custom losses to be used for training with model.fit()
- models: common model architectures for the relevant use-cases, based on tf.keras.Model to allow for using the Keras training API
- pipelines: an exemplary high-level pipeline implementation
- reports: classes for generating reports related to the different tasks
- constants.py: constants and configuration values
Use-cases
- Retention Time Prediction: a regression problem where the retention time of a peptide sequence is to be predicted.
- Fragment Ion Intensity Prediction: a multi-output regression problem where the intensity values for fragment ions are predicted given a peptide sequence along with some additional features.
- Peptide Detectability (Pfly) [4]: a multi-class classification problem where the detectability of a peptide is predicted given the peptide sequence.
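All three use-cases take a peptide sequence as model input. One common way to turn such a sequence into something a neural network can consume is to map each residue to an integer index and pad to a fixed length. The sketch below illustrates that idea in plain Python; the alphabet (the 20 standard amino acids, no modifications), the padding index, the default max_length, and the function name encode_peptide are all assumptions for illustration and do not describe the actual DLOmix data API.

```python
from typing import Dict, List

# Assumption: only the 20 standard amino acids, unmodified.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding, so residues start at 1.
PAD_INDEX = 0
VOCAB: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(sequence: str, max_length: int = 30) -> List[int]:
    """Integer-encode a peptide sequence and right-pad it to max_length."""
    if len(sequence) > max_length:
        raise ValueError(
            f"sequence length {len(sequence)} exceeds max_length {max_length}"
        )
    encoded = []
    for residue in sequence:
        if residue not in VOCAB:
            raise ValueError(f"unknown residue {residue!r} in {sequence!r}")
        encoded.append(VOCAB[residue])
    # Pad on the right so every encoded sequence has the same length.
    return encoded + [PAD_INDEX] * (max_length - len(encoded))


if __name__ == "__main__":
    print(encode_peptide("ACD", max_length=5))
```

For example, encode_peptide("ACD", max_length=5) returns [1, 2, 3, 0, 0]. A retention time model would regress a single value from such a vector, while the detectability task would predict a class label from the same kind of input.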
To-Do
Functionality:
- integrate prosit
- integrate hugging face datasets
- extend data representation to include modifications
- add PTM features
- add residual plots to reporting, possibly other regression analysis tools
- output reporting results as PDF
- refactor reporting module to use W&B Report API (Retention Time)
- add additional detectability task
- extend pipeline for different types of models and backbones
- extend pipeline to allow for fine-tuning with custom datasets
Package structure:
- integrate deeplc.py into models.py, preferably introduce a package structure (e.g. models.retention_time)
- add references for implemented models in the ReadMe
- introduce formatting and precommit hooks
- plan documentation (sphinx and readthedocs)
- refactor following best practices for cleaner install
To install dlomix, along with the tools needed to develop and run tests, run the following command in your virtualenv:
$ pip install -e .[dev]
References:
[Prosit]
[1] Gessulat, S., Schmidt, T., Zolg, D. P., Samaras, P., Schnatbaum, K., Zerweck, J., ... & Wilhelm, M. (2019). Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nature methods, 16(6), 509-518.
[DeepLC]
[2] Bouwmeester, R., Gabriels, R., Hulstaert, N., Martens, L., & Degroeve, S. (2020). DeepLC can predict retention times for peptides that carry as-yet unseen modifications. bioRxiv, 2020.03.28.013003. https://doi.org/10.1101/2020.03.28.013003
[3] Bouwmeester, R., Gabriels, R., Hulstaert, N., et al. (2021). DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nature methods, 18, 1363–1369. https://doi.org/10.1038/s41592-021-01301-5
[Detectability - Pfly]
[4] Abdul-Khalek, N., Picciani, M., Wimmer, R., Overgaard, M. T., Wilhelm, M., & Gregersen Echers, S. (2024). To fly, or not to fly, that is the question: A deep learning model for peptide detectability prediction in mass spectrometry. bioRxiv, 2024-10.