confused about the number of genome and gff files being provided to vg autoindex while mapping RNA data #4393

Open
QianghuiZhu opened this issue Sep 10, 2024 · 3 comments

@QianghuiZhu

Hi, vg is a great tool for pangenome analysis.

I have 12 genomes with their GFF files, and I have already built a graph-based pangenome from an SV VCF file using vg construct and vg index.

I also have some RNA-seq data that I want to align to the pangenome graph.
It seems that I should rebuild the graph with vg autoindex -w mpmap -v sv.vcf.gz rather than use the index above. The -r and --tx-gff options of vg autoindex can be given more than once, so should I supply just one genome as the reference, or all 12 genomes?

I look forward to your response.
Thanks!

@jeizenga
Contributor

vg autoindex is designed to take common interchange formats like FASTA and VCF and produce internal vg formats like the ones you get from vg index. So, yes, you would not use your already-constructed indexes if you want to use vg autoindex.

Most users starting from a VCF+FASTA will only have GFFs for the reference sequence, so I'm not sure what your 12 GFFs look like. VCF doesn't always neatly preserve contig coordinates, so I think it would be very difficult to get sensible results using haplotype-specific GFFs. Certainly, the pipeline is better tested and hardened using one GFF. The reason we allow multiple GFF inputs is more to accommodate users who have GFFs that are split up by chromosome.
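
As a rough illustration of the advice above, the following sketch assembles a vg autoindex command from one reference FASTA, one VCF, and one or more GFFs for that same reference (for example, one GFF per chromosome). The -w, -v, -r and --tx-gff options are the ones mentioned in this thread; the -p output-prefix option, the helper name build_autoindex_cmd, and all file names are assumptions for illustration, so check them against vg autoindex --help. The script only prints the command and does not run vg.

```shell
# Hypothetical helper: print a vg autoindex command for RNA-seq mapping.
# Arguments: reference FASTA, VCF, output prefix, then one or more GFFs
# that all annotate the same reference (e.g. split by chromosome).
# Placeholder file names must not contain spaces in this simple sketch.
build_autoindex_cmd() {
    ref_fasta=$1
    vcf=$2
    prefix=$3
    shift 3
    # -p (output prefix) is an assumption; confirm with `vg autoindex --help`.
    cmd="vg autoindex -w mpmap -p $prefix -r $ref_fasta -v $vcf"
    for gff in "$@"; do
        cmd="$cmd --tx-gff $gff"
    done
    printf '%s\n' "$cmd"
}

# One reference genome with its single annotation file.
build_autoindex_cmd ref.fa sv.vcf.gz rna_index ref.gff

# The same reference, with the annotation split by chromosome.
build_autoindex_cmd ref.fa sv.vcf.gz rna_index chr1.gff chr2.gff
```

Printing the command first makes it easy to review the inputs before committing to a long indexing run; once it looks right, the printed line can be pasted into the terminal.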

@QianghuiZhu
Author

Thanks for this.
We assembled and annotated 12 genomes, so we have multiple FASTA and GFF files.
I will use only one genome and its corresponding GFF file as input for vg autoindex.
Best!

@jeizenga
Contributor

jeizenga commented Sep 11, 2024

If you build a graph using the raw assemblies (e.g. using Minigraph-Cactus), you could also supply a GFA file containing the haplotypes and then also provide the individual haplotype annotations to vg autoindex using --hap-tx-gff.
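
A minimal sketch of what that alternative might look like is below. Only --hap-tx-gff and -w mpmap come from this thread; the -g option for the GFA input and the -p output prefix are assumptions that should be checked against vg autoindex --help, and the helper name and file names are hypothetical. Whether a reference-level --tx-gff is also required in this mode is not stated here, so the sketch leaves it out. The script only prints the command.

```shell
# Hypothetical helper: print a vg autoindex command that starts from a GFA
# containing the haplotypes plus per-haplotype annotation files.
# Arguments: GFA file, output prefix, then one or more haplotype GFFs.
# Placeholder file names must not contain spaces in this simple sketch.
build_hap_autoindex_cmd() {
    gfa=$1
    prefix=$2
    shift 2
    # -g (GFA input) and -p (output prefix) are assumptions; verify them
    # with `vg autoindex --help` before running the printed command.
    cmd="vg autoindex -w mpmap -p $prefix -g $gfa"
    for gff in "$@"; do
        cmd="$cmd --hap-tx-gff $gff"
    done
    printf '%s\n' "$cmd"
}

# Example: a Minigraph-Cactus GFA with two annotated haplotypes.
build_hap_autoindex_cmd pangenome.gfa rna_hap_index hap1.gff hap2.gff
```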
