Python implementation of GaussDCA using Cython. Adapted from here.
For the original paper please refer to "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" by Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt and Andrea Pagnani, (2014) PLoS ONE 9(3): e92721.
This version implements what is called the "slow fallback" in the original Julia implementation.
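For orientation, the core computation described in the paper can be sketched in plain NumPy. The steps are: reweight near-duplicate sequences, estimate pseudocount-regularized single-site and pair frequencies, build the N(q−1) × N(q−1) covariance matrix, invert it, and score each column pair by the Frobenius norm of its coupling block (zero-sum gauge) followed by the average product correction (APC). This is a simplified sketch, not the Cython code in src/. The integer encoding (q = 21 states with gap = 20), the fixed identity threshold theta = 0.2 and the pseudocount 0.8 are assumptions chosen for illustration. Because the matrix has 20 rows and columns per alignment column, its inversion is the expensive step that the runtime comparison further down focuses on.

```python
import numpy as np

Q = 21  # assumed encoding: 20 amino acids as 0..19, gap as 20


def sequence_weights(msa, theta=0.2):
    """w_a = 1 / #{b : identity(a, b) >= 1 - theta}; msa is an (M, N) int array."""
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)  # (M, M)
    return 1.0 / (identity >= 1.0 - theta).sum(axis=1)


def covariance(msa, weights, pseudocount=0.8):
    """Pseudocount-regularized covariance matrix of size N(q-1) x N(q-1)."""
    M, N = msa.shape
    s = Q - 1  # drop the last state (gap) per column so C is invertible
    onehot = np.zeros((M, N, Q))
    onehot[np.arange(M)[:, None], np.arange(N)[None, :], msa] = 1.0
    X = onehot[:, :, :s].reshape(M, N * s)
    w = weights / weights.sum()
    fi = (1 - pseudocount) * (w @ X) + pseudocount / Q
    fij = (1 - pseudocount) * ((X.T * w) @ X) + pseudocount / Q**2
    # within-column blocks are diagonal: f_ii(a, b) = f_i(a) * delta_ab
    for i in range(N):
        blk = slice(i * s, (i + 1) * s)
        fij[blk, blk] = np.diag(fi[blk])
    return fij - np.outer(fi, fi)


def gauss_dca_scores(msa, theta=0.2, pseudocount=0.8):
    """APC-corrected Frobenius-norm scores for all column pairs (N x N)."""
    M, N = msa.shape
    s = Q - 1
    C = covariance(msa, sequence_weights(msa, theta), pseudocount)
    J = -np.linalg.inv(C)  # couplings = minus the inverse covariance
    fn = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            blk = np.zeros((Q, Q))  # gap row/column fixed to zero
            blk[:s, :s] = J[i * s:(i + 1) * s, j * s:(j + 1) * s]
            # zero-sum gauge: subtract column and row means, add back the overall mean
            blk = (blk - blk.mean(axis=0, keepdims=True)
                   - blk.mean(axis=1, keepdims=True) + blk.mean())
            fn[i, j] = fn[j, i] = np.linalg.norm(blk)  # Frobenius norm
    # average product correction; the zero diagonal is excluded from the means
    row_mean = fn.sum(axis=1) / (N - 1)
    total_mean = fn.sum() / (N * (N - 1))
    apc = fn - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(apc, 0.0)
    return apc
```

Larger scores indicate more likely contacts. Pairs (i, j) whose separation |i − j| is at least the chosen minimum can then be ranked by decreasing score.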
Runs on Python 3.6.
- Make sure Cython and NumPy are installed and up to date:
pip install Cython
pip install numpy
- Compile the Cython source code:
cd src; python setup.py build_ext -i; cd ..
python src/gaussdca.py [-h] [-o OUTPUT] [-s SEPARATION] [-t THREADS] alignment_file
Currently, the alignment file must be in A3M format (with or without insertions). The output is printed to standard output, or saved to a file if one is given with -o. The sequence separation (-s) and the number of threads used for multiprocessing (-t) can also be specified.
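In A3M files, uppercase letters and '-' are match columns, while lowercase letters and '.' mark insertions relative to the query. Dropping the insertions therefore yields an aligned matrix. The sketch below is a hypothetical reader written for illustration, not the parser used in src/gaussdca.py. The residue order in ALPHABET and the mapping of unknown residues (such as X) to the gap state are assumptions.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed order: 20 amino acids, gap last
CODE = {aa: k for k, aa in enumerate(ALPHABET)}


def read_a3m(lines):
    """Parse A3M lines into an (M, N) integer array of match columns."""
    seqs, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    # insertions are lowercase letters and '.'; keep only match states
    aligned = ["".join(c for c in s if not (c.islower() or c == ".")) for s in seqs]
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("sequences have different numbers of match columns")
    # unknown residues (e.g. X, B, Z) are mapped to the gap state (assumption)
    return np.array([[CODE.get(c, CODE["-"]) for c in s] for s in aligned],
                    dtype=np.int64)
```

For example, the lines ">query", "ACDeF", ">hit1", "A-DgF" give two rows of four match columns each, because the lowercase e and g are removed.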
The following chart shows the elapsed runtime in minutes for a large test alignment (test/large.a3m) using 8 cores.
The first three bars show the effect of different methods for the matrix inversion:
- pinv: pseudoinverse from numpy.linalg (uses SVD)
- inv: standard matrix inverse from numpy.linalg
- inv(chol): computes the Cholesky decomposition first and then uses it to invert the matrix
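The three inversion strategies can be compared on a toy symmetric positive-definite matrix with NumPy alone. This sketch only illustrates the methods; it is not the repository's code, and timings from a small random matrix will not reproduce the chart. The Cholesky route uses the identity C = L Lᵀ, so C⁻¹ = L⁻ᵀ L⁻¹. Here np.linalg.solve is used for L⁻¹ for simplicity, although a dedicated triangular solver such as scipy.linalg.solve_triangular would exploit the structure better.

```python
import time

import numpy as np


def inverse_pinv(C):
    """Pseudoinverse via SVD (numpy.linalg.pinv)."""
    return np.linalg.pinv(C)


def inverse_inv(C):
    """General matrix inverse (numpy.linalg.inv)."""
    return np.linalg.inv(C)


def inverse_chol(C):
    """Inverse of a symmetric positive-definite C via its Cholesky factor L."""
    L = np.linalg.cholesky(C)  # C = L @ L.T, L lower triangular
    L_inv = np.linalg.solve(L, np.eye(C.shape[0]))  # does not exploit triangularity
    return L_inv.T @ L_inv  # C^-1 = L^-T @ L^-1


def compare(n=300, seed=0):
    """Time each method on a random SPD matrix and check C^-1 @ C == I."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    C = A @ A.T + n * np.eye(n)  # well-conditioned SPD test matrix
    results = {}
    for f in (inverse_pinv, inverse_inv, inverse_chol):
        start = time.perf_counter()
        C_inv = f(C)
        elapsed = time.perf_counter() - start
        results[f.__name__] = (elapsed, np.allclose(C_inv @ C, np.eye(n)))
    return results


if __name__ == "__main__":
    for name, (sec, ok) in compare().items():
        print(f"{name}: {sec:.4f} s, correct={ok}")
```

The Cholesky route applies only because a covariance matrix is symmetric positive definite. The SVD-based pseudoinverse is the most robust of the three but also does the most work.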
The next bar, "inv(chol) opt", uses the same inversion as above but with some additional technical optimizations.
The last bar, "julia", shows the runtime of the original Julia implementation on 8 cores, with alignment compression.
Alignment compression has not been implemented in this Python version yet.