Chewbbaca visualization output #181

Open
davidmaimoun opened this issue Jul 25, 2023 · 4 comments
Labels: Status: In Progress, Type: Question

Comments

@davidmaimoun

Hello!
I'm new to the field and I need to use chewBBACA. At the end of the analysis, I get a file, cgMLST.tsv, in a visualization folder. Do the values in this file represent the allelic distance of each species from the schema alleles?
When I run it with GrapeTree, I get branches with values.
Can you explain to me what these values are?

Thank you

rfm-targa self-assigned this Jul 27, 2023
rfm-targa added the Type: Question and Status: In Progress labels Jul 27, 2023
@rfm-targa
Contributor

Hello @davidmaimoun,

Sorry for the delay, and thank you for your interest in chewBBACA. Based on the name of the file you described, cgMLST.tsv, I assume that you performed allele calling with the AlleleCall module and then determined the core genome from the allele calling results with the ExtractCgMLST module. The cgMLST.tsv file contains the allelic profiles of your samples: each row is a strain, and each column is a locus/gene that is present in at least a proportion --t of the strains, where --t is the loci presence threshold you passed to the ExtractCgMLST module (or the default of [0.95, 0.99, 1] if you did not pass any value). The values are allele identifiers that tell you which allele was found at each locus in each strain; they are not distances from the schema alleles. You can find more information about the output files created by the AlleleCall and the ExtractCgMLST modules here and here. The cgMLST.tsv file has the same structure as the results_alleles.tsv file created by the AlleleCall module, except that it only includes the results for the loci in the core genome.
You can upload the files with the allelic profiles to GrapeTree or to PHYLOViZ to visualise a Minimum Spanning Tree (MST) and perform various dataset operations that allow you to explore and analyse the results (more information about uploading chewBBACA results to PHYLOViZ here). The values displayed on the MST branches correspond to the distance between the strains, that is, the number of allelic differences across all compared loci. The allelic distances are computed from the allelic profiles, as a distance matrix with the number of allelic differences for each pair of strains.
I hope this explanation helps. Feel free to let me know if there is anything else you would like to know.
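To make the idea of allelic distance concrete, here is a minimal Python sketch that counts allelic differences for each pair of strains in a small table of allelic profiles. The example table, the function names, and the choice to treat "0" as a missing call that is skipped in pairwise comparisons are all assumptions made for illustration; GrapeTree and PHYLOViZ may handle missing data differently, so the numbers on their branches need not match this sketch exactly.

```python
import csv
import io
from itertools import combinations

# Hypothetical miniature of an allelic profile table: the first column is
# the sample identifier, the remaining columns are loci, and each cell is
# an allele identifier. "0" is assumed here to mark a missing call.
EXAMPLE_TSV = (
    "FILE\tlocus1\tlocus2\tlocus3\tlocus4\n"
    "strainA\t1\t2\t3\t1\n"
    "strainB\t1\t2\t4\t1\n"
    "strainC\t2\t5\t4\t0\n"
)


def read_profiles(handle):
    """Return (loci, {sample: [allele identifiers]}) from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    profiles = {row[0]: row[1:] for row in reader if row}
    return header[1:], profiles


def allelic_distance(profile_a, profile_b, missing="0"):
    """Count loci where both strains have a call and the alleles differ."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != missing and b != missing and a != b
    )


def distance_matrix(profiles, missing="0"):
    """Build a symmetric nested dict of pairwise allelic distances."""
    names = list(profiles)
    matrix = {name: {name: 0} for name in names}
    for first, second in combinations(names, 2):
        d = allelic_distance(profiles[first], profiles[second], missing)
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


loci, profiles = read_profiles(io.StringIO(EXAMPLE_TSV))
matrix = distance_matrix(profiles)
for name, row in matrix.items():
    print(name, [row[other] for other in matrix])
```

To try this on a real file, replace `io.StringIO(EXAMPLE_TSV)` with an open file handle for cgMLST.tsv, keeping in mind the missing-data assumption above.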

Kind regards,

Rafael

@alexandreflageul

Hi @rfm-targa
I'm writing here, as the title of this issue also covers my question.
I would like to include chewBBACA in my analysis pipeline as a complement to another tool, cgMLSTFinder (from CGE).
With cgMLSTFinder, I used to get the complex type (CT) of the bacterial strain. Unfortunately, I cannot find in the chewBBACA documentation how to retrieve the complex type from a chewBBACA analysis. I ran chewBBACA.py PrepExternalSchema to adapt an Enterobase scheme, then I ran chewBBACA.py AlleleCall. The output of the latter module does not display the complex type.

What am I missing?
Regards,
Alexandre

@alexandreflageul

@jacarrico @aplf

@rfm-targa
Contributor

Hello @alexandreflageul,

Thank you for your interest in chewBBACA. chewBBACA does not assign CTs to bacterial strains. The main output of the AlleleCall module is the file containing the allelic profiles, results_alleles.tsv. The allelic profiles contained in this file can serve as the basis for subsequent analysis. You can import that file and sample metadata to PHYLOViZ to visualize an MST and explore the results through several dataset operations. If you want to cluster your samples to identify meaningful clustering levels and define CTs, I'd recommend ReporTree or HierCC. It might also be worth looking up the more recent concept of LIN codes.
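As a rough illustration of what clustering samples at a given level can mean, the sketch below groups strains whose pairwise allelic distance is at or below a chosen threshold, using generic single-linkage clustering with a union-find structure. This is not the implementation of ReporTree or HierCC; the strain names, distances, thresholds, and function name are hypothetical, and the dedicated tools should be used for real analyses.

```python
def single_linkage_clusters(names, distances, threshold):
    """Group names so that any pair at distance <= threshold shares a cluster.

    `distances` maps (name_a, name_b) tuples to allelic distances.
    Clusters are chained: A and C end up together if A~B and B~C link.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (first, second), distance in distances.items():
        if distance <= threshold:
            parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise allelic distances between four strains.
names = ["strainA", "strainB", "strainC", "strainD"]
distances = {
    ("strainA", "strainB"): 1,
    ("strainA", "strainC"): 8,
    ("strainA", "strainD"): 30,
    ("strainB", "strainC"): 4,
    ("strainB", "strainD"): 29,
    ("strainC", "strainD"): 31,
}

clusters_at_2 = single_linkage_clusters(names, distances, threshold=2)
clusters_at_5 = single_linkage_clusters(names, distances, threshold=5)
print(clusters_at_2)
print(clusters_at_5)
```

Running the same data at several thresholds shows how cluster membership changes with the chosen level, which is the kind of exploration the dedicated tools automate.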
Let us know if there's anything else.

Kind regards,

Rafael
