is it possible to adjust the core gene threshold? #60

Open
ramiroricardo opened this issue Oct 6, 2020 · 5 comments

Comments

@ramiroricardo

Hi Sion,

Is there a way to control the percentage of genomes that must have a gene for it to be considered core? From what I understand, it is set at 95%, but thresholds like 99% are also common in the literature.

Thanks

@SionBayliss
Owner

No problem! For which part of the pipeline would you like to adjust the threshold? PIRATE only explicitly separates genes into core/accessory at certain steps, such as when it generates the core alignment and when it plots summary figures and tables. These steps could be tweaked to use a more meaningful threshold for your analysis, and could most likely be run after your analysis has finished so that the analysis itself does not have to be repeated.

@ramiroricardo
Author

Hi Sion,

Thanks for your reply. I think it would be great to have such a threshold when the core alignment is generated. Ideally, though, the same threshold would then be applied to the summary plots/tables to keep everything consistent.

@SionBayliss
Owner

I will label this as an enhancement for the next release.

In the meantime, changing the outputs to support this is relatively simple. The gene alignments can be generated using the scripts in the PIRATE/scripts directory inside your PIRATE output directory:

alignment:

create_pangenome_alignment.pl --dosage 1.25 -t 99 -i PIRATE.gene_families.ordered.tsv -f ./feature_sequences/ -o core_alignment.fasta -g core_alignment.gff

Plots are a little more complicated. You will need to search and replace 95 with 99 inside the following script (open it in a text editor) and then run it using:

Rscript plot_summary.R ./

Hope that helps,
Sion
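
For anyone who would rather not edit the script by hand, the search-and-replace step described above could be sketched in Python as follows. This is only an illustration under assumptions: `replace_threshold` is a hypothetical helper, the sample line is made up and is not taken from plot_summary.R, and how the threshold is actually written in that script has not been checked here. The pattern only touches a `95` that is not part of a longer number, so `195` and `950` are left alone, but a fraction such as `0.95` would also be changed, so it is worth reviewing each replacement before running the script.

```python
import re


def replace_threshold(script_text, old=95, new=99):
    """Replace standalone occurrences of `old` with `new`.

    Hypothetical helper: a digit run such as 195 or 950 is not touched,
    because the lookarounds require no digit directly before or after.
    Returns (new_text, number_of_replacements).
    """
    pattern = re.compile(rf"(?<!\d){old}(?!\d)")
    return pattern.subn(str(new), script_text)


# Made-up example line (NOT copied from plot_summary.R).
sample = "core <- pct >= 95  # not 195 or 950; frac 0.95"
patched, n_changes = replace_threshold(sample)
print(patched)
print(n_changes)

# To apply this to a real file, one could read it, patch it, and write a
# copy so that the original script stays untouched, for example:
#   text = open("plot_summary.R").read()
#   new_text, n = replace_threshold(text)
#   open("plot_summary_99.R", "w").write(new_text)
```

Writing the result to a separate copy keeps the original script available for comparison, and the returned count gives a quick check that the number of edits matches what you expected.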

@ramiroricardo
Author

Thanks a lot, will test this soon!

@haruosuz

haruosuz commented Jan 9, 2021

I look forward to the next release.

I would set roary -cd 100 to generate core_gene_alignment.aln for core genome phylogeny.

https://sanger-pathogens.github.io/Roary/

-cd FLOAT percentage of isolates a gene must be in to be core [99]
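
To make the effect of the thresholds mentioned in this thread (95%, 99%, 100%) concrete, here is a minimal Python sketch of a percentage-based core definition. It is not PIRATE's or Roary's implementation: `core_genes`, the gene names, and the toy presence/absence data are all hypothetical, and it assumes a gene counts as core when it is present in at least the given percentage of genomes (tools may differ on details such as rounding or strict versus non-strict comparison).

```python
def core_genes(presence, n_genomes, threshold_pct=95.0):
    """Return gene families found in at least threshold_pct % of genomes.

    presence: dict mapping gene name -> set of genome names carrying it.
    Assumption: "core" means presence >= threshold (non-strict comparison).
    """
    return sorted(
        gene
        for gene, genomes in presence.items()
        # Compare as 100 * count >= pct * total to avoid a division.
        if 100 * len(genomes) >= threshold_pct * n_genomes
    )


# Toy data: 100 hypothetical genomes and four hypothetical gene families.
genomes = [f"genome{i}" for i in range(100)]
presence = {
    "geneA": set(genomes),       # present in 100/100
    "geneB": set(genomes[:99]),  # present in 99/100
    "geneC": set(genomes[:96]),  # present in 96/100
    "geneD": set(genomes[:50]),  # present in 50/100
}

for pct in (95, 99, 100):
    print(pct, core_genes(presence, len(genomes), pct))
```

In this toy data set, raising the threshold from 95 to 99 drops geneC from the core set, and requiring 100 leaves only geneA, which is why applying the same threshold to both the alignment and the summary plots matters for consistency.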
