README.md options #63
Thanks for the feedback on the README.
The subset alignments functionality changed during development and will not work on unique_alleles; I will update the README accordingly. All the best,
I would be grateful if you could provide examples for Subset alignments. Running the command printed the following messages and generated an empty output_directory.
This looks like a bug. I will try and fix it when I get a spare minute or two. Thanks,
https://github.com/SionBayliss/PIRATE/blob/master/README.md#output-files
In the following case, the feature_sequences/ directory contained amino acid and nucleotide sequences for only 104 of the 139 gene families; i.e., the amino acid and nucleotide sequences for 35 gene families (g0001, g0002, g0003, g0004, g0006, g0010, ...) are missing from the feature_sequences/ directory.
In another case, running the following command generated a feature_sequences/ directory containing amino acid and nucleotide sequences for only 189 of the 202 gene families.
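To see exactly which gene families have no sequences, the gene_family IDs in a PIRATE.*.tsv file can be compared with the file names in feature_sequences/. This is a minimal sketch, not a PIRATE script; it assumes the TSV has a "gene_family" header column and that each file in feature_sequences/ is named <gene_family>.<suffix> (e.g. g0005.aa.fasta). Both assumptions should be checked against your own output.

```python
import csv


def families_in_tsv(lines):
    """Return the set of gene_family IDs in a PIRATE.*.tsv (iterable of lines)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["gene_family"] for row in reader}


def families_with_sequences(filenames):
    """Gene family IDs that have a file, assuming names like 'g0005.aa.fasta'."""
    return {name.split(".", 1)[0] for name in filenames}


def missing_families(tsv_lines, filenames):
    """Gene families listed in the TSV with no file in feature_sequences/."""
    return sorted(families_in_tsv(tsv_lines) - families_with_sequences(filenames))
```

In practice, pass an open handle on PIRATE.gene_families.tsv and the list of names from feature_sequences/ (e.g. via os.listdir).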
That is most likely due to PIRATE excluding genes with high copy numbers. If genes are highly fragmented, alignment is likely to be problematic. This threshold can be adjusted using the --dosage option for align_feature_sequences.pl and create_pangenome_alignment.pl; the default is 1.25. You can raise the threshold, but I would question the utility of including these genes in an alignment for certain applications (e.g. core genome trees).
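A rough way to preview which gene families a dosage threshold would drop is to split them on a per-family dosage value. This sketch is hypothetical: it assumes the comparison is made against an "average_dose" column (loci per genome) and that families at or below the threshold are kept. The thread does not say which column --dosage actually checks, so treat the column name as a parameter to verify.

```python
import csv


def families_within_dosage(lines, threshold=1.25, column="average_dose"):
    """Split gene families into (kept, excluded) by a dosage threshold.

    Assumption (not confirmed by PIRATE docs): a family is kept when its value
    in `column` is <= threshold; 1.25 mirrors the stated --dosage default.
    """
    kept, excluded = [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        target = kept if float(row[column]) <= threshold else excluded
        target.append(row["gene_family"])
    return kept, excluded
```

Raising `threshold` moves families from `excluded` to `kept`, which is the trade-off described above: more families in the alignment, but more multi-copy or fragmented genes among them.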
Could descriptions of output files such as genome2loci.tab and loci_list.tab be added to the Output files section? Can the number of loci (CDS) be obtained from the output files? https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC6785682/
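As a stopgap, loci per genome can be tallied from the genome columns of a PIRATE.*.tsv file. This is a hypothetical sketch that counts only loci appearing in the TSV. It assumes genome columns start at column 23 (0-based index 22), i.e. directly after the cluster columns 21-22 discussed below; that an empty cell means absence; that multiple loci in one cell are separated by ";"; and that fission fragments are written as "(a:b)". Adjust `first_genome_col` for files without the cluster columns.

```python
import csv
from collections import Counter


def loci_per_genome(lines, first_genome_col=22):
    """Count loci per genome from a PIRATE.*.tsv (iterable of lines).

    Hypothetical cell format: 'x' (one locus), 'x;y' (two loci),
    '(x:y)' (one fissioned gene, counted here as two loci).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    genomes = header[first_genome_col:]
    counts = Counter({g: 0 for g in genomes})  # keep genomes with zero loci
    for row in reader:
        for genome, cell in zip(genomes, row[first_genome_col:]):
            for entry in filter(None, cell.split(";")):
                # each fission fragment is counted as a separate locus
                counts[genome] += len(entry.strip("()").split(":"))
    return dict(counts)
```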
I have some questions about the options described in https://github.com/SionBayliss/PIRATE/blob/master/README.md
Usage
Basic examples
-a|--align
-r|--rplots
Running PIRATE with the options -a -r generated the core_alignment.fasta and pangenome_alignment.fasta files and a PIRATE_plots.pdf file, in which the tree on page 8 (Number of gene families per sample) and page 9 (Pangenome cluster presence/absence) is drawn from the binary_presence_absence.nwk file. Is it possible to generate PIRATE_plots.pdf with the tree drawn from a Newick file obtained by running fasttree on core_alignment.fasta?
Also, the core_alignment.fasta file contained many "N" characters (e.g. "TNNNNNNNNNA"), although the input nucleotide sequences contained only the letters "A", "C", "G" and "T".
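To see which samples carry the "N" characters, and how many, a small FASTA scan is enough. This sketch only reports counts per record (keyed by the full header text); it does not explain why PIRATE writes "N" into core_alignment.fasta, which the thread does not state.

```python
def count_ambiguous(fasta_lines, symbols="Nn"):
    """Return {header: number of characters in `symbols`} for FASTA lines."""
    counts = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].strip()  # full header text used as the key
            counts[current] = 0
        elif current is not None:
            counts[current] += sum(line.count(s) for s in symbols)
    return counts
```

For example, with open("core_alignment.fasta") as f: print(count_ambiguous(f)).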
--pan-opt
should be changed to the following?
-s|--steps
-n|--nucl
Is there any difference between the options "Global: -s|--steps" and "Clustering options: -s|--steps"?
Advanced examples
-f|--flat
In this paper (https://academic.oup.com/gigascience/article/8/10/giz119/5584409), "A default MCL inflation value of 2" was used for intra-species clustering (Figure 3: complete Staphylococcus aureus genomes), while "an MCL inflation value of 6" was used for intra-species clustering (Figure 4: Pseudomonas complete genomes) and for inter-species comparisons (Figure 5: Prochlorococcus marinus draft genomes). Were the MCL inflation values of 2 and 6 chosen based on previous studies of these bacterial taxa?
I presume the default MCL inflation value was changed from 2 to 1.5.
-e|--evalue
should be changed to the following?
https://github.com/SionBayliss/PIRATE#piratetsv-file-format
PIRATE.*.tsv file format
21-22/ synteny_cluster/synteny_cluster_order - The syntenic cluster the gene_family has been assigned to and the corresponding order within the cluster. NOTE: these columns are only present in PIRATE.gene_families.tsv.
should be changed to the following?
21-22/ cluster/cluster_order - The syntenic cluster the gene_family has been assigned to and the corresponding order within the cluster. NOTE: these columns are only present in PIRATE.gene_families.ordered.tsv.
Support Scripts
Subset Outputs
Subsample PIRATE.gene_families.ordered.tsv file and rename loci in output. Allows for recalculation of number of genomes gene_families are present in PIRATE.gene_families.ordered.tsv
should be changed to the following?
Subsample PIRATE.*.tsv files and rename loci in output. Allows for recalculation of number of genomes gene_families are present in PIRATE.*.tsv
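The recalculation described here can be sketched as: keep only the selected genome columns, then recount in how many of them each row is present. This is a hypothetical illustration, not the PIRATE subsetting script. It assumes a "number_genomes" header column, genome columns starting at 0-based index 22 (column 23 onward), and that a non-empty cell means presence; dropping rows absent from every kept genome is a design choice of this sketch. Locus renaming is not covered.

```python
import csv
import io


def subsample_genomes(lines, keep, first_genome_col=22):
    """Return TSV text restricted to genomes in `keep`, with number_genomes recomputed."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    genome_idx = [i for i, name in enumerate(header)
                  if i >= first_genome_col and name in keep]
    n_col = header.index("number_genomes")  # assumed metadata column
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header[:first_genome_col] + [header[i] for i in genome_idx])
    for row in reader:
        cells = [row[i] for i in genome_idx]
        present = sum(1 for c in cells if c)
        if present == 0:
            continue  # gene family absent from all kept genomes
        meta = row[:first_genome_col]
        meta[n_col] = str(present)
        writer.writerow(meta + cells)
    return out.getvalue()
```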
Subset alignments
Running the command printed the following messages and generated an empty output_directory.
identify representative sequences for gene families/alleles
Identify the representative sequence for each cluster in a PIRATE.*.tsv file. The file can be found at PIRATE/scripts/representative_sequences.pl.
Should PIRATE/scripts/representative_sequences.pl be the following file?
Unique gene sequences
should be the following?
Convert to roary file
should be the following?
Convert to binary presence-absence or count
Can PIRATE.*.tsv be converted into both "gene/allele presence-absence" and "paralog presence-absence" tsv files with the same command?
should be the following?
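Conceptually, both tables can come from a single pass over a PIRATE.*.tsv file; whether the PIRATE conversion script does this in one command is the open question above and is not answered here. In this hypothetical sketch a non-empty cell counts as presence, and a cell holding more than one ";"-separated locus counts as a paralog. Genome columns are assumed to start at 0-based index 22, and rows are keyed by `key` ("gene_family" by default; "allele_name" would suit allele-level files, where one gene family spans several rows).

```python
import csv


def presence_tables(lines, first_genome_col=22, key="gene_family"):
    """Return (genomes, presence, paralog); each matrix maps key -> list of 0/1."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    genomes = header[first_genome_col:]
    key_col = header.index(key)
    presence, paralog = {}, {}
    for row in reader:
        cells = row[first_genome_col:]
        presence[row[key_col]] = [1 if c else 0 for c in cells]
        # more than one ';'-separated locus in a genome is treated as a paralog
        paralog[row[key_col]] = [
            1 if len([x for x in c.split(";") if x]) > 1 else 0 for c in cells
        ]
    return genomes, presence, paralog
```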