Someone asked us: can the output of mergeSTR be used as input to mergeSTR? In today's episode, we set out to answer that question by writing a test!
For the test, I subsetted one of our test files, `CEU_test.vcf.gz`, into three smaller files with non-overlapping sample subsets: 10 samples, 19 samples, and 150 samples. The original file had 10 + 19 + 150 = 179 samples. Then I wrote a test that merges the 10-sample and 19-sample files, and then merges that output with the 150-sample file. Finally, the test checks that the final output is equivalent to the original file.
Unfortunately, the test failed. I haven't dug into why yet. The test reports that the headers aren't identical, but I think there are deeper issues too. For example, the alleles in the GT column aren't the same, either. I checked with
Anyway, I'm opening this PR in case we want to fix this. Or we can just leave it open indefinitely as a way of reporting the issue to everyone -- at least until someone has a chance to fix it.
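To help whoever picks this up, here is a rough sketch of how the header differences could be separated from the genotype differences when comparing the re-merged file against the original. It is not part of TRTools: `read_vcf`, `genotype_alleles`, and `compare_vcfs` are hypothetical helper names, and comparing genotypes by allele sequence rather than by allele index is my assumption about what "equivalent" should mean here, since a merge may legitimately reorder the ALT alleles.

```python
import gzip


def read_vcf(path):
    """Split a (possibly gzipped) VCF into header lines and record rows."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, records = [], []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("#"):
                header.append(line)
            elif line:
                records.append(line.split("\t"))
    return header, records


def genotype_alleles(record, samples):
    """Map each sample to its called allele sequences instead of indices."""
    alleles = [record[3]] + record[4].split(",")  # REF followed by ALTs
    gt_index = record[8].split(":").index("GT")
    calls = {}
    for sample, field in zip(samples, record[9:]):
        gt = field.split(":")[gt_index].replace("|", "/")
        calls[sample] = tuple(
            None if idx == "." else alleles[int(idx)] for idx in gt.split("/")
        )
    return calls


def compare_vcfs(path_a, path_b):
    """Report header-only differences separately from genotype mismatches."""
    header_a, records_a = read_vcf(path_a)
    header_b, records_b = read_vcf(path_b)
    # Assumes the last header line is the #CHROM line, as in a valid VCF.
    samples_a = header_a[-1].split("\t")[9:]
    samples_b = header_b[-1].split("\t")[9:]
    calls_b = {(r[0], r[1]): genotype_alleles(r, samples_b) for r in records_b}
    mismatches = []
    for record in records_a:
        key = (record[0], record[1])
        other = calls_b.get(key, {})
        for sample, call in genotype_alleles(record, samples_a).items():
            if other.get(sample) != call:
                mismatches.append((key[0], key[1], sample))
    return {
        "header_only_in_a": sorted(set(header_a[:-1]) - set(header_b[:-1])),
        "header_only_in_b": sorted(set(header_b[:-1]) - set(header_a[:-1])),
        "same_sample_set": sorted(samples_a) == sorted(samples_b),
        "genotype_mismatches": mismatches,
    }
```

If `genotype_mismatches` comes back empty while the header lists do not, the failure is probably limited to the metadata lines. If it is non-empty, the listed (chromosome, position, sample) entries are calls whose allele sequences actually changed across the two merges. The sketch keys records by chromosome and position only and compares alleles in the order they are written, so it would need adjusting for files with several records at one position.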