confused about the number of genome and gff files being provided to vg autoindex while mapping RNA data #4393

QianghuiZhu · 2024-09-10T06:30:27Z

Hi, vg is a great tool for pangenomics.

I have 12 genomes with GFF annotation files, and I have already built a graph-based pangenome from an SV VCF file using vg construct and vg index.

I also have some RNA-seq data that I want to align to the graph pangenome.
As I understand it, I should rebuild the graph pangenome with vg autoindex -w mpmap -v sv.vcf.gz rather than use the index above. The -r and --tx-gff options of vg autoindex can be repeated, though: should I supply just one genome as the reference, or all 12 genomes?

I look forward to your response.
Thanks!

jeizenga · 2024-09-10T17:54:07Z

vg autoindex is designed to take common interchange formats like FASTA and VCF and produce internal vg formats like the ones you get from vg index. So, yes, you would not use your already-constructed indexes if you want to use vg autoindex.

Most users starting from a VCF+FASTA will only have GFFs for the reference sequence, so I'm not sure what your 12 GFFs look like. VCF doesn't always neatly preserve contig coordinates, so I think it would be very difficult to get sensible results using haplotype-specific GFFs. Certainly, the pipeline is better tested and hardened using one GFF. The reason we allow multiple GFF inputs is more to accommodate users who have GFFs that are split up by chromosome.
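
As a rough sketch of the single-reference setup described above, the command might be assembled like this. The file names (ref.fa, sv.vcf.gz, ref.gff) and the rna_index prefix are placeholders, not files from this thread, and the VCF contig names are assumed to match the FASTA. The script prints the command and only runs it if vg and the inputs are present.

```shell
# Placeholder inputs (assumptions, substitute your own files).
REF=ref.fa          # the one genome used as the linear reference
VCF=sv.vcf.gz       # SV calls with contig names matching REF
GFF=ref.gff         # annotation of that same reference genome
PREFIX=rna_index    # prefix for all output index files

# Assemble the command as positional parameters so it can be printed and run.
set -- vg autoindex --workflow mpmap --prefix "$PREFIX" \
    --ref-fasta "$REF" --vcf "$VCF" --tx-gff "$GFF"
echo "$@"

# Run only when vg is installed and the placeholder inputs actually exist.
if command -v vg >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$VCF" ] && [ -f "$GFF" ]; then
    "$@"
fi
```

If the reference annotation is split by chromosome, the same pattern applies with one --tx-gff per file.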

QianghuiZhu · 2024-09-11T02:25:47Z

Thanks for this.
We assembled and annotated 12 genomes, so we have multiple FASTA and GFF files.
I will only use one genome and its related gff file as input for vg autoindex.
Best!
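
For context, once vg autoindex has produced the mpmap indexes, the RNA-seq mapping step could look roughly like the sketch below. The rna_index prefix and the read file names are placeholders, and the .spliced.xg / .spliced.gcsa / .spliced.dist suffixes are an assumption based on the transcriptomic example in the vg README, so check the files that autoindex actually writes.

```shell
# Placeholder names (assumptions, adjust to your own outputs and reads).
PREFIX=rna_index    # prefix that was passed to vg autoindex
READS_1=rna_1.fq    # first mate of the paired-end RNA-seq reads
READS_2=rna_2.fq    # second mate

# Assemble the mapping command; -n rna selects the RNA-seq preset.
set -- vg mpmap -n rna \
    -x "$PREFIX.spliced.xg" -g "$PREFIX.spliced.gcsa" -d "$PREFIX.spliced.dist" \
    -f "$READS_1" -f "$READS_2"
echo "$@"

# Run only when vg is installed and the spliced graph index exists.
if command -v vg >/dev/null 2>&1 && [ -f "$PREFIX.spliced.xg" ]; then
    "$@" > mapped.gamp
fi
```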

jeizenga · 2024-09-11T19:31:15Z

If you build a graph using the raw assemblies (e.g. using Minigraph-Cactus), you could also supply a GFA file containing the haplotypes and then also provide the individual haplotype annotations to vg autoindex using --hap-tx-gff.
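
A minimal sketch of that GFA-based alternative, with hypothetical file names: pangenome.gfa stands in for a haplotype-containing GFA (for example from Minigraph-Cactus), and sample1.gff and sample2.gff stand in for the per-assembly annotations. It assumes the sequence names in each GFF correspond to the haplotype names in the GFA, which is worth verifying before running.

```shell
# Placeholder inputs (assumptions, substitute your own files).
GFA=pangenome.gfa        # graph built from the raw assemblies, with haplotypes
PREFIX=rna_hap_index     # prefix for all output index files

# Start the command, then add one --hap-tx-gff per haplotype annotation.
set -- vg autoindex --workflow mpmap --prefix "$PREFIX" --gfa "$GFA"
for gff in sample1.gff sample2.gff; do
    set -- "$@" --hap-tx-gff "$gff"
done
echo "$@"

# Run only when vg is installed and the placeholder GFA actually exists.
if command -v vg >/dev/null 2>&1 && [ -f "$GFA" ]; then
    "$@"
fi
```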
