Pysarg

Python implementation of ARGs_OAP.

Warning

This repo is only for testing, please don't use!

Installation

Pre-compiled conda packages (osx-64/linux-64, python>=3.8).

conda install -c bioconda -c conda-forge xinehc::pysarg

If you encounter dependency conflicts or your current Python version is lower than 3.8, create a new conda environment with a name of your choice (here -n pysarg is used as an example), then switch to it with conda activate.

conda create -n pysarg -c bioconda -c conda-forge xinehc::pysarg
conda activate pysarg

Pysarg depends on python>=3.8, diamond>=2.0.15, bwa>=0.7.17, blast>=2.12, samtools>=1.15, pandas>=1.4. If your system already has all of these dependencies, you can build Pysarg from source.

git clone https://github.com/xinehc/pysarg
cd pysarg
python setup.py install # use python3 if needed
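
Before building from source, it may help to confirm that the external tools listed above are reachable on your PATH. The following is a minimal sketch, not part of Pysarg: the function name is hypothetical, and the executable names (in particular blastn as a stand-in for the blast dependency) are assumptions, since this README does not say which binaries Pysarg calls. It only checks presence, not versions.

```python
import shutil
import sys

# Assumed executable names for the dependencies listed above.
# "blastn" is only one of the programs shipped with BLAST+; adjust as needed.
REQUIRED_TOOLS = ("diamond", "bwa", "blastn", "samtools")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python >= 3.8 is required, found", sys.version.split()[0])
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH (versions not checked).")
```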

Example

Download the example files

Two examples (100k paired-end reads, 100 bp each) can be found here. The zipped file can be downloaded using wget:

wget https://dl.dropboxusercontent.com/s/054ufvfahchfk7f/example.tar.gz
tar -xvf example.tar.gz
cd example

Step 1: Make database

Pysarg supports both protein (prot) and nucleotide (nucl) databases. By default, it uses the full version of the SARG v3.0 database and the corresponding structure files as input to build a database named sarg. If you are interested in a customized database or structure (e.g. SARG without the multidrug resistance type), change the default parameters --input, --struc and --db (see pysarg makedb --help for more details). The type of a customized database (prot or nucl) is detected automatically.

Please note that the structure file given to --struc needs to be tab-separated and must not have a header. The first column of --struc needs to match the sequence IDs of --input and cannot contain any white space (blast considers everything before the first white space to be the sequence ID). For SARG, the three columns of the hierarchical structure file are gene, subtype and type, e.g. AAB20441 AAC(3)-Ia aminoglycoside.
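
The constraints above can be checked before running makedb. The sketch below is an illustration only and not part of Pysarg; the function names are hypothetical. It assumes the --input file is a FASTA file whose sequence ID is the text between ">" and the first white space, as described above. A header row shows up as an ID that has no match in the FASTA file.

```python
def read_fasta_ids(fasta_path):
    """Collect sequence IDs: the text after '>' up to the first white space."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def check_structure_file(struc_path, fasta_path):
    """Return a list of problems found; an empty list means none were found."""
    fasta_ids = read_fasta_ids(fasta_path)
    problems = []
    with open(struc_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            columns = line.split("\t")
            seq_id = columns[0]
            if len(columns) < 2:
                problems.append(f"line {lineno}: no tab separator found")
            elif any(ch.isspace() for ch in seq_id):
                problems.append(f"line {lineno}: white space in first column {seq_id!r}")
            elif seq_id not in fasta_ids:
                # Also triggered by a header row such as "gene<TAB>subtype<TAB>type".
                problems.append(f"line {lineno}: {seq_id!r} is not a sequence ID in the FASTA file")
    return problems
```

For example, check_structure_file("structure.txt", "database.fa") (both file names are placeholders) returns an empty list when no problem is detected.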

pysarg makedb
# pysarg makedb --input *.fa --struc *.txt --db yourfavdb

After pysarg makedb finishes, information about the available databases is printed to the screen, for example:

db	size	type	directory
sarg	28517690	prot	.../pysarg/DB/sarg
gg85	15295160	nucl	.../pysarg/DB/gg85
ko30	7159023	prot	.../pysarg/DB/ko30

Databases gg85 and ko30 are the default databases for quantifying 16S rRNA and cell numbers in samples; they are used in stageone (see below). Database sarg (or a customized yourfavdb) is used in stagetwo (see below). If you no longer need a database, delete it (rm -rf) using the absolute file path given in the table.

Run stageone

Stageone estimates the 16S rRNA and cell numbers of each file in inputfqs. Compressed .gz files are not supported by default, as they may slow down the overall I/O; please decompress them first, e.g. with gzip -d.
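
If gzip is not available, the decompression step can be sketched with Python's standard library. This helper is hypothetical and not part of Pysarg. Unlike gzip -d, it keeps the compressed originals unless remove_original=True is passed; how stageone treats leftover .gz files in the input directory is not stated here, so removing or moving them may be the safer choice.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz_files(input_dir, remove_original=False):
    """Decompress every *.gz file in input_dir; return the paths written."""
    written = []
    for gz_path in sorted(Path(input_dir).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # sample_1.fq.gz -> sample_1.fq
        if out_path.exists():
            continue  # do not overwrite an existing decompressed file
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files fit in memory
        if remove_original:
            gz_path.unlink()
        written.append(out_path)
    return written
```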

If reads are paired, the names of the forward/reverse files need to follow the pattern *{_1, _2}{.fa, .fq, .fasta, .fastq}, i.e. end in _1 or _2 followed by one of the listed extensions. The file type (.fa/.fasta or .fq/.fastq) is detected automatically.

pysarg stageone -i inputfqs -o outputdir --clean
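
To see how the files in an input directory would pair up under the naming pattern above, a small check such as the following can be used. It is a sketch based only on the pattern stated in this README, not on Pysarg's actual detection code, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# <sample>_1 or <sample>_2, followed by one of the supported extensions.
READ_FILE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fa|fq|fasta|fastq)$")


def group_paired_files(input_dir):
    """Map each sample name to its mate files, e.g. {"STAS": {"1": ..., "2": ...}}."""
    pairs = defaultdict(dict)
    for path in sorted(Path(input_dir).iterdir()):
        match = READ_FILE.match(path.name)
        if match:
            pairs[match["sample"]][match["mate"]] = path.name
    return dict(pairs)


def incomplete_pairs(pairs):
    """Return the samples for which only one of the two mate files was found."""
    return sorted(s for s, mates in pairs.items() if set(mates) != {"1", "2"})
```

Files that do not match the pattern are simply ignored by this check, so they do not appear in either result.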

After stageone, a metadata.txt file can be found in outputdir (if the --clean flag is given, all temporary files are removed and the metadata file should be the only one left in outputdir). It summarizes the 16S and cell numbers of each sample, for example:

filename	n_reads	n_16s	n_cells	filepath
STAS_1	100000	4.174475545952642	1.6004431914056478	.../example/inputdir/STAS_1.fa
STAS_2	100000	4.054822333841412	1.545260664720151	.../example/inputdir/STAS_2.fa
SWHAS104_1	100000	3.4944371029452244	1.7516589512278578	.../example/inputdir/SWHAS104_1.fa
SWHAS104_2	100000	3.515110704179947	1.7600058056139576	.../example/inputdir/SWHAS104_2.fa

Please check whether the filepath column contains all files in your input directory. If not, make sure that the extension of each missing file is one of {.fa, .fq, .fasta, .fastq}.
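
This check can be automated along the following lines. The sketch assumes that metadata.txt is tab-separated with a header row containing a filepath column, as in the example above, and it compares files by name only. The function name is hypothetical and not part of Pysarg.

```python
import csv
from pathlib import Path

SUPPORTED_EXTENSIONS = {".fa", ".fq", ".fasta", ".fastq"}


def files_missing_from_metadata(metadata_path, input_dir):
    """List files in input_dir that are absent from the filepath column.

    Returns (file name, has_supported_extension) tuples.
    """
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        listed = {Path(row["filepath"]).name for row in reader}
    report = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.name not in listed:
            report.append((path.name, path.suffix in SUPPORTED_EXTENSIONS))
    return report
```

An entry whose second element is False points to a file with an extension outside the supported set, for example a file that is still compressed.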

Run stagetwo

The number of ARGs (or other types of genes if you use a different database) is estimated in stagetwo. The input directory of stagetwo (parameter -i) needs to contain the metadata.txt produced by stageone (outputdir in the example above). If -o (output) is not given, stagetwo saves everything to the -i directory. By default stagetwo uses the sarg database; to use a customized database, change parameter -d.

pysarg stagetwo -i outputdir --clean
# pysarg stagetwo -i outputdir -o otheroutputdir -d yourfavdb --clean

After stagetwo, the normalized ARG copies (or copies of other types of genes) per 16S/cell, or hits per read, are written to several *_normalized_*.txt files. For example, sarg_normalized_16s_struc2.txt means:

sarg - the database name (default sarg)
normalized_16s - hits are normalized against 16S rRNA
struc2 - the coarsest level of the hierarchical structure file. In the sarg case, struc0 means genes, struc1 means subtypes and struc2 means types.

struc2	STAS	SWHAS104
MLS	0.0	0.02062257391617481
aminoglycoside	0.016202273302162947	0.0702348721175952
bacitracin	0.014243756749154238	0.029022199416685344
beta-lactam	0.0	0.07435959439262274
multidrug	0.014763429463243433	0.042417186801122636
mupirocin	0.0029667248478081566	0.004557467877130284
quinolone	0.14468645528642876	0.04399873193528583
sulfonamide	0.013452071929840085	0.06808763199694166
tetracycline	0.004659396079993178	0.04969500656817937

Please note that the forward/reverse files are merged in stagetwo, so only two columns (two samples) are available in the example.
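
The output tables can be post-processed with a few lines of Python. The sketch below is an illustration, not part of Pysarg: it splits a result file name into the three parts described above and reads a tab-separated table like the example into a dictionary keyed by sample. The function names and the exact file-name pattern are assumptions inferred from the single example sarg_normalized_16s_struc2.txt.

```python
import csv
import re

# Assumed pattern: <db>_normalized_<normalization>_<strucN>.txt
NAME_PATTERN = re.compile(
    r"^(?P<db>.+)_normalized_(?P<norm>.+)_(?P<level>struc\d+)\.txt$"
)


def parse_result_name(filename):
    """Split a result file name into database, normalization and level."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected result file name: {filename}")
    return match.groupdict()


def read_normalized_table(path):
    """Read a result table into {sample: {category: value}}."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]  # first header cell is the level name
        table = {sample: {} for sample in samples}
        for row in reader:
            if not row:
                continue
            for sample, value in zip(samples, row[1:]):
                table[sample][row[0]] = float(value)
    return table


def top_categories(table, sample, n=3):
    """Return the n categories with the highest values for one sample."""
    ranked = sorted(table[sample].items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

With the example table above, top_categories(table, "STAS", n=1) would select quinolone, the largest value in the STAS column.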
