A viral identification tool using machine learning with nucleotide and protein features
Phybrid identifies viral contigs in metagenomic data. It combines gene content features with k-mer features to select viral contiguous sequences.
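To illustrate what a k-mer feature is, here is a minimal sketch that turns one nucleotide sequence into a normalized k-mer frequency vector. It is not Phybrid's implementation (Phybrid relies on a compiled k-mer counting program); the function name `kmer_frequencies` and the default `k=4` are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for one nucleotide sequence.

    Hypothetical helper for illustration only; not part of Phybrid.
    """
    sequence = sequence.upper()
    # Enumerate every possible k-mer over A, C, G, T so that each contig
    # yields a feature vector with the same keys.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing ambiguous bases (for example N) are left out of the total.
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}
```

Each contig then maps to a fixed-length vector (4^k values) that a classifier can consume alongside gene content features.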
To get a local copy up and running, follow these steps.
Phybrid requires Python 3 and the following libraries (when installing through pip, the libraries are installed automatically):
- pandas
- scikit-learn
- biopython
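A quick way to confirm that the libraries above are importable is sketched below. The helper name `missing_packages` is an assumption for this example; the import names (`sklearn` for scikit-learn, `Bio` for biopython) are the standard ones for those packages.

```python
from importlib.util import find_spec

# pip package name -> module name used in import statements.
REQUIRED = {"pandas": "pandas", "scikit-learn": "sklearn", "biopython": "Bio"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required libraries that cannot be imported."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```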
Phybrid is currently available only from this repository. After forking it, enter the scripts directory and compile the k-mer counting program:
## git download here
cd Phybrid/scripts
tar xvf kmer-counter-master.zip
cd kmer-counter-master
make
Phybrid runs as a Python script. Once installed via pip, the Phybrid command becomes available. To display the help screen, type:
cd Phybrid
scripts/Phybrid.py -h
The parameters of Phybrid are:
- -i: Input Fasta [required]
- -o: Output Directory [optional]
cd Phybrid
scripts/Phybrid.py -i data/Test/Viral_contigs.fasta -o data/Test/Output
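If you want to call Phybrid from another Python program, the command above can be wrapped with `subprocess`. This is a sketch under assumptions: the helper names `build_command` and `run_phybrid` are hypothetical, the repository is assumed to be checked out in a directory named `Phybrid`, and only the documented `-i` and `-o` options are used.

```python
import subprocess
from pathlib import Path


def build_command(fasta, output_dir=None, script="scripts/Phybrid.py"):
    """Build the Phybrid command line from the documented -i and -o options."""
    command = [script, "-i", str(fasta)]
    if output_dir is not None:
        # -o is optional, so it is only added when a directory is given.
        command += ["-o", str(output_dir)]
    return command


def run_phybrid(fasta, output_dir=None, repo_dir="Phybrid"):
    """Run Phybrid from the repository root (assumed location: repo_dir).

    Raises FileNotFoundError if the input FASTA is missing and
    subprocess.CalledProcessError if the script exits with an error.
    """
    if not (Path(repo_dir) / fasta).is_file():
        raise FileNotFoundError(f"Input FASTA not found: {fasta}")
    return subprocess.run(build_command(fasta, output_dir), cwd=repo_dir, check=True)
```

For example, `run_phybrid("data/Test/Viral_contigs.fasta", "data/Test/Output")` mirrors the command shown above.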
Current Version: 0.0.1
Improvements to be made:
- Reduce feature space to allow for smaller file processing
- Grid search hyperparameters for models
- Build into python package
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Distributed under the MIT License. See LICENSE for more information.
Cody Glickman - @glickman_Cody - [email protected]
Project Link: https://github.com/Strong-Lab/Phybrid
- James Costello
- Michael Strong
- Jo Hendrix