docs: add note about bcftools version for merging to annotaTR #245

Merged 2 commits on Dec 12, 2024
trtools/annotaTR/README.rst (9 changes: 5 additions & 4 deletions)

@@ -103,11 +103,12 @@ where:
Additional relevant options:

* :code:`--match-refpanel-on <string>`: indicates how to match loci between the reference panel and the target VCF. Options: locid, rawalleles, trimmedalleles (Default:locid)
+
* **locid** matches on the ID in the VCF file. If your reference panel does not have informative IDs for TRs (e.g. all are set to "."), this option will not work and annotaTR will output an error
- * **rawalleles** means loci are matched on :code:`chrom:pos:ref:alt`
- * **trimmedalleles** means loci are matched on :code:`chrom:pos:ref:alt` but ref and alt alleles are trimmed to remove common prefixes/suffixes. The trimmedalleles option must be used if you merged samples in your target VCF file using :code:`bcftools merge`, since that tool will modify alleles to remove common sequence (see `this issue <https://github.com/samtools/bcftools/issues/726>`_)
+ * **rawalleles** means loci are matched on :code:`chrom:pos:ref:alt`. Note if you merged samples in your target VCF file using :code:`bcftools merge`, you should instead use the **trimmedalleles** option below, since bcftools will modify alleles to remove common sequence (see `this issue <https://github.com/samtools/bcftools/issues/726>`_)
+ * **trimmedalleles** means loci are matched on :code:`chrom:pos:ref:alt` but ref and alt alleles are trimmed to remove common prefixes/suffixes.
* :code:`--ignore-duplicates`: This flag outputs a warning if duplicate loci are detected in the reference. If this flag is not set and a duplicate locus is detected, the program quits.
- * :code:`--update-ref-alt`: Update the REF/ALT allele sequences from the reference panel. Fixes issue with alleles being chopped after bcftools merge. Use with caution as this assumes allele order is exactly the same between the refpanel and target VCF. Only works when matching on locus id.
+ * :code:`--update-ref-alt`: Update the REF/ALT allele sequences from the reference panel. Fixes issue with alleles being chopped after bcftools merge. Use with caution as this assumes allele order is exactly the same between the refpanel and target VCF. Only works when matching on locus id. **Note**: We have tested merging with bcftools v1.20. Previous versions of bcftools might switch allele order (see https://github.com/gymrek-lab/TRTools/issues/244).

If generating a VCF output file, this command will output a new file containing only STRs, with the following fields added back depending on the genotyper used to generate the reference panel:

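The bcftools version note on :code:`--update-ref-alt` in the hunk above could be checked before merging. The following is only a minimal sketch and is not part of annotaTR: the function names are hypothetical, it assumes the first line of :code:`bcftools --version` has the form :code:`bcftools 1.20`, and treating every version at or above 1.20 as acceptable is an assumption, since the README only states that v1.20 was tested.

```python
import re
import subprocess


def parse_bcftools_version(version_output):
    """Return (major, minor) parsed from the first line of `bcftools --version`.

    Assumes the first line looks like "bcftools 1.20".
    """
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.match(r"bcftools\s+(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"Unrecognized bcftools version line: {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def is_tested_version(version_output, minimum=(1, 20)):
    """True if the reported version is at least `minimum`.

    The (1, 20) threshold is an assumption based on the README note;
    versions newer than 1.20 are not confirmed as tested.
    """
    return parse_bcftools_version(version_output) >= minimum


def installed_bcftools_version_output():
    """Hypothetical helper: capture `bcftools --version` from the local install."""
    result = subprocess.run(
        ["bcftools", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A caller might pass :code:`installed_bcftools_version_output()` to :code:`is_tested_version` and print a warning before running :code:`bcftools merge` if it returns False. Versions are compared as integer tuples so that 1.9 sorts before 1.20.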
@@ -145,4 +146,4 @@ Below are :code:`annotaTR` examples using data files that can be found at https:

# Compute dosages based on Beagle AP field
# Require setting --match-refpanel-on since locus IDs are "." in this panel
- annotaTR --vcf beagle_imputed_withap.vcf.gz --vcftype hipstr --ref-panel beagle_refpanel.vcf.gz --match-refpanel-on trimmedalleles --dosages beagleap --out test_beagleap
+ annotaTR --vcf beagle_imputed_withap.vcf.gz --vcftype hipstr --ref-panel beagle_refpanel.vcf.gz --match-refpanel-on trimmedalleles --dosages beagleap --out test_beagleap
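
To illustrate what matching on **trimmedalleles** means, here is a minimal sketch of building a :code:`chrom:pos:ref:alt` key after removing the prefix and suffix shared by all alleles. It is not annotaTR's actual implementation: the names :code:`trim_alleles` and :code:`locus_key` are hypothetical, and leaving :code:`pos` unchanged after trimming is an assumption taken from the README's description.

```python
import os


def trim_alleles(alleles):
    """Remove the prefix and then the suffix shared by every allele."""
    # Longest prefix common to all alleles (compared character by character).
    prefix_len = len(os.path.commonprefix(alleles))
    trimmed = [allele[prefix_len:] for allele in alleles]
    # Longest common suffix of what remains: reverse, then take the common prefix.
    suffix_len = len(os.path.commonprefix([allele[::-1] for allele in trimmed]))
    if suffix_len:
        trimmed = [allele[:-suffix_len] for allele in trimmed]
    return trimmed


def locus_key(chrom, pos, ref, alts):
    """Build a chrom:pos:ref:alt key from trimmed alleles.

    Assumption: pos is kept as given and is not shifted after trimming.
    """
    trimmed = trim_alleles([ref] + list(alts))
    return f"{chrom}:{pos}:{trimmed[0]}:{','.join(trimmed[1:])}"


# The same locus before and after shared trailing sequence was removed
# yields the same key in this sketch.
raw_key = locus_key("chr1", 100, "ACACACGT", ["ACACACACGT"])
chopped_key = locus_key("chr1", 100, "ACACAC", ["ACACACAC"])
```

In this sketch both records reduce to the same key, which is why a raw comparison of :code:`ref` and :code:`alt` would fail where a trimmed comparison succeeds. Note that the result depends on the full set of alleles at a locus, so the two files need to list the same alleles for the keys to agree.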