-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: beagle AP field #224
base: master
Are you sure you want to change the base?
fix: beagle AP field #224
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be a good idea to add a check at the end of the another test that will ensure the AP1, AP2, and DS flags were properly added to the merged VCF.
To do that, you can probably just reread the file with TRHarmonizer and then assert that the contents of the fields file are the same as what you would expect like this
TRTools/trtools/mergeSTR/tests/test_mergeSTR.py
Lines 360 to 366 in 5eb028e
def test_hipstr_output(args, mrgvcfdir): | |
fname1 = os.path.join(mrgvcfdir, "test_file_hipstr1.vcf.gz") | |
fname2 = os.path.join(mrgvcfdir, "test_file_hipstr2.vcf.gz") | |
args.vcftype = "hipstr" | |
args.vcfs = fname1 + "," + fname2 | |
assert main(args) == 0 | |
assert_same_vcf(args.out + '.vcf', mrgvcfdir + "/hipstr_merged.vcf") |
trtools/testsupport/sample_vcfs/beagle/hipstr_imputed_merge1.vcf
Outdated
Show resolved
Hide resolved
trtools/testsupport/sample_vcfs/beagle/hipstr_imputed_merge2.vcf
Outdated
Show resolved
Hide resolved
|
Thanks for adding this PR! Upon reviewing, I realized things might be slightly more complicated than we originally thought and we need to take the following into account:
All of these are addressable and we can discuss further how to integrate the required changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments left in the PR Conversation
thanks for explaining all of that, @gymreklab ! so, for those cases where the ref/alt fields aren't identical between the two or more VCFs, maybe we can do something like this?
are there any other situations I'm not thinking of? I suppose there might also be cases where the alleles are the same but are represented slightly differently (ex: not left-aligned)? |
It could make sense to do a first pass where we just refuse to merge those fields if the ref/alt aren't identical and in the same order and output a warning for those loci. My impression was that in most cases ref/alt should be exactly the same for different VCFs imputed from the same reference panel. If that is the case we might be able to get away with avoiding the more complex cases. However if those complex cases happen for more than a handful of loci we might have to. For the case where the alleles are identical but reordered: this should be handle-able. But I don't think we currently merge any other fields like this where we would need to reorder so would need to add some logic for that, making sure order is correct. For the case where the alleles don't match: in theory we could assign 0. But that might result in weird batch effects. I doubt those probabilities are truly 0 - if those alleles had been considered by Beagle they would probably have non-zero values. Another option would be to put a missing value there which I would prefer. In theory none of those cases should affect our main application of computing dosages. But again since this case is complex and in theory shouldn't happen I am hesitant to handle it for now. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is almost there. In addition to comments on individual lines we should:
- add tests for CheckIdenticalAlleleOrder
- Add back the deleted test file hipstr_merged.vcf causing tests to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add this file back it looks like maybe it was deleted by accident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just need to update the test for CheckIdenticalAlleles to directly call that function
assert ("Conflicting alt alleles found at" in capsys.readouterr().err) | ||
|
||
#check if identical allele order | ||
def test_CheckIdenticalAlleleOrder(args,mrgvcfdir,capsys): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test should call the function CheckIdenticalAlleleOrder directly
This PR ensures that mergeSTR will output format fields from hipstr-beagle. Specifically, it adds support for the AP1, AP2, and DS fields.
Note that only hipstr input is supported for now. We did this because the imputation panels all seem to be produced by hipstr, anyway.
Checklist
fix:
. Otherwise, if it introduces a new feature, please prefix it withfeat:
. If it introduces a breaking change, please add an exclamation before the colon, likefeat!:
. If the scope of the PR changes because of a revision to it, please update the PR title, since the title will be used in our CHANGELOG.poetry lock --no-update
to ensure the lock file stays up to date and that our dependencies are locked to their minimum versions