samuelmurail/af_analysis

About AlphaFold Analysis

af-analysis is a Python package for the analysis of AlphaFold protein structure predictions. It is designed to simplify and streamline work with protein structures generated by AlphaFold and its derivatives, such as ColabFold.

Source code repository: https://github.com/samuelmurail/af_analysis

Statement of Need

AlphaFold 2 and its derivatives have revolutionized protein structure prediction, achieving remarkable accuracy. Analyzing the abundance of resulting structural models can be challenging and time-consuming. Existing tools often require separate scripts for calculating various quality metrics (pDockQ, pDockQ2, LIS score) and assessing model diversity. af-analysis addresses these challenges by providing a unified and user-friendly framework for in-depth analysis of AlphaFold 2 results.

Main features:

  • Import AlphaFold or ColabFold prediction directories as pandas DataFrames for efficient data handling.
  • Calculate and add additional structural quality metrics to the DataFrame, including:
    • pDockQ
    • pDockQ2
    • LIS score
  • Visualize predicted protein models.
  • Cluster generated models to identify diverse conformations.
  • Select the best models based on defined criteria.
  • Add your custom metrics to the DataFrame for further analysis.

Installation

  • af-analysis is available on PyPI and can be installed using pip:
pip install af_analysis
  • You can install the latest version from the GitHub repository:
pip install git+https://github.com/samuelmurail/af_analysis.git@main
  • af-analysis can also be installed from a local clone of the GitHub repository:
git clone https://github.com/samuelmurail/af_analysis
cd af_analysis
pip install .
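
After any of the installation routes above, one way to confirm that the package is visible to the current interpreter is to query its installed distribution metadata. The sketch below uses only the Python standard library. The helper name installed_version is hypothetical, and the code assumes the distribution is registered under the name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="af_analysis"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("af_analysis is not installed in this environment")
else:
    print(f"af_analysis {version} is installed")
```

If the first message is printed inside a virtual environment or a notebook kernel, pip most likely installed the package into a different interpreter.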

Documentation

The full documentation is available at ReadTheDocs.

Usage

Importing data

Create a Data object by giving the path of the directory that contains the results of the AlphaFold 2 or ColabFold run.

import af_analysis
my_data = af_analysis.Data('MY_AF_RESULTS_DIR')

Extracted data are available in the df attribute of the Data object.

my_data.df
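
Because df is a pandas DataFrame, custom metrics and model selection can be expressed with ordinary pandas operations. The sketch below works on a small stand-in DataFrame rather than on real prediction output: the values are invented, the "query" column and the "rank_in_query" metric are hypothetical, and only the "ranking_confidence" column name is taken from the examples in this README. With real data, my_data.df would take the place of df.

```python
import pandas as pd

# Stand-in for my_data.df; the values and the "query" column are invented.
df = pd.DataFrame(
    {
        "query": ["complex_A", "complex_A", "complex_A", "complex_B", "complex_B"],
        "ranking_confidence": [0.81, 0.64, 0.77, 0.42, 0.55],
    }
)

# Add a custom column: rank of each model within its query (1 = best).
df["rank_in_query"] = (
    df.groupby("query")["ranking_confidence"].rank(ascending=False).astype(int)
)

# Keep the top-scoring model of each query.
best_idx = df.groupby("query")["ranking_confidence"].idxmax()
best_models = df.loc[best_idx]

# Keep only models above a confidence threshold (0.6 is an arbitrary choice).
confident = df[df["ranking_confidence"] >= 0.6]

print(best_models)
print(confident)
```

Note that idxmax() returns index labels, so the selected rows are retrieved with .loc rather than .iloc.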

Analysis

  • The analysis module contains several functions to add metrics such as pDockQ and pDockQ2:
from af_analysis import analysis
analysis.pdockq(my_data)
analysis.pdockq2(my_data)
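
For orientation, pDockQ is a sigmoid function of the mean interface pLDDT multiplied by the base-10 logarithm of the number of interface contacts. The sketch below illustrates that formula only and is not the af_analysis implementation: the function name pdockq_from_interface is hypothetical, the sigmoid parameters are the ones I recall from the original pDockQ publication and should be checked against it, and returning 0 when there are no contacts is an assumption.

```python
import math

# Sigmoid parameters as recalled from the pDockQ publication
# (an assumption here, not read from af_analysis).
L, X0, K, B = 0.724, 152.611, 0.052, 0.018


def pdockq_from_interface(mean_interface_plddt, n_interface_contacts):
    """Illustrative pDockQ value from interface pLDDT and contact count."""
    if n_interface_contacts <= 0:
        # Assumption: no interface contacts means no predicted interface.
        return 0.0
    x = mean_interface_plddt * math.log10(n_interface_contacts)
    return L / (1.0 + math.exp(-K * (x - X0))) + B


# Invented inputs: a confident, well-packed interface versus a weak one.
print(round(pdockq_from_interface(85.0, 120), 3))
print(round(pdockq_from_interface(55.0, 15), 3))
```

The score grows with both the interface confidence and the number of contacts, and it is bounded above by L + B.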

Docking Analysis

  • The docking module contains several functions to add metrics such as the LIS score:
from af_analysis import docking
docking.LIS_pep(my_data)
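
As background, the Local Interaction Score (LIS) is derived from the inter-chain part of the PAE matrix: values below a cutoff are rescaled to the range 0 to 1 and averaged. The sketch below illustrates that idea on invented numbers and is not the code behind docking.LIS_pep, which may differ in detail. The function name local_interaction_score is hypothetical, and the 12 Å cutoff is assumed from the LIS publication.

```python
PAE_CUTOFF = 12.0  # in angstroms; cutoff assumed from the LIS publication


def local_interaction_score(pae_block, cutoff=PAE_CUTOFF):
    """Mean of rescaled PAE values below the cutoff in an inter-chain block."""
    scaled = [
        1.0 - pae / cutoff  # low PAE gives a value close to 1
        for row in pae_block
        for pae in row
        if pae < cutoff
    ]
    # Assumption: a block with no value below the cutoff scores 0.
    return sum(scaled) / len(scaled) if scaled else 0.0


# Invented PAE values: rows are residues of chain A, columns residues of chain B.
pae_a_vs_b = [
    [3.0, 6.0, 20.0],
    [9.0, 25.0, 30.0],
]
print(local_interaction_score(pae_a_vs_b))
```

Only residue pairs with a PAE below the cutoff contribute, so a few confident contacts are not diluted by the many uncertain pairs elsewhere in the block.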

Plots

  • As a first step, the user can visualize the pLDDT, the PAE matrix and the model scores. The show_info() function displays the model scores together with the pLDDT plot and the PAE matrix in an interactive way.

  • Plot the MSA, the pLDDT and the PAE:
my_data.plot_msa()
my_data.plot_plddt([0,1])
best_model_index = my_data.df['ranking_confidence'].idxmax()
my_data.plot_pae(best_model_index)
  • Show the 3D structure (the nglview package is required):
my_data.show_3d(my_data.df['ranking_confidence'].idxmax())

Dependencies

af_analysis requires the following dependencies:

  • pdb_numpy
  • pandas
  • numpy
  • tqdm
  • seaborn
  • cmcrameri
  • nglview
  • ipywidgets
  • mdanalysis

Contributing

af-analysis is an open-source project and contributions are welcome. If you find a bug or have a feature request, please open an issue on the GitHub repository at https://github.com/samuelmurail/af_analysis. If you would like to contribute code, please fork the repository and submit a pull request.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the GNU General Public License version 2 - see the LICENSE file for details.
