FilterIntervals will get rid of Y chromosome intervals if there are >50% of female samples #9043
Comments
Hi @NotAPoetButACriminal
We suggest setting --low-count-filter-percentage-of-samples to a value well above the percentage of female samples in the cohort (e.g. 90), so that Y intervals remain regardless of how many samples are female. You may wish to avoid running this filter on the Y chromosome by adjusting [...]. Alternatively, you may run this tool in three rounds: first on autosomes only, second on the X chromosome, and finally on the Y chromosome. Beware that the X chromosome is single-copy in males, so counts there may be affected and the filter may remove more of X than you expect. Once you have produced filtered intervals for autosomes, X and Y, you can combine them and proceed to the next stage.
We do not have special handling for X and Y in this step because it would require knowing the sex and chromosome counts of all samples beforehand, established by other orthogonal methods. Setting the percentage of samples to a higher value may produce more CNV calls in common CNV polymorphic regions, but most of these won't be false positives if your samples are all balanced for FOLD80 base penalty and AT/GC dropout rates as well as insert sizes.
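For the three-round approach, the interval list has to be split by contig group before filtering and stitched back together afterwards. A minimal sketch of that bookkeeping follows, assuming a Picard-style interval_list (header lines starting with `@`, then tab-separated records whose first column is the contig) and contigs named `chrX`/`chrY` or `X`/`Y`. The function and group names are hypothetical helpers, not part of GATK; each group would be written to its own file and passed to a separate FilterIntervals run.

```python
# Hypothetical helper names; not part of GATK.
OTHER, CHR_X, CHR_Y = "autosomes", "chrX", "chrY"


def contig_group(contig):
    """Classify a contig name as X, Y, or everything else (assumed autosomal)."""
    name = contig[3:] if contig.startswith("chr") else contig
    if name == "X":
        return CHR_X
    if name == "Y":
        return CHR_Y
    return OTHER


def split_interval_list(lines):
    """Split interval_list lines into three groups, copying the header into each."""
    header = [line for line in lines if line.startswith("@")]
    groups = {OTHER: list(header), CHR_X: list(header), CHR_Y: list(header)}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header already copied; skip blank lines
        contig = line.split("\t", 1)[0]
        groups[contig_group(contig)].append(line)
    return groups


def merge_interval_lists(filtered_groups):
    """Concatenate filtered groups in the given order, keeping only the first header."""
    merged = []
    header_done = False
    for lines in filtered_groups:
        for line in lines:
            if line.startswith("@"):
                if not header_done:
                    merged.append(line)
            else:
                merged.append(line)
        header_done = True
    return merged


example = [
    "@HD\tVN:1.6",
    "@SQ\tSN:chr1\tLN:1000",
    "chr1\t1\t100\t+\t.",
    "chrX\t1\t100\t+\t.",
    "chrY\t1\t100\t+\t.",
]
groups = split_interval_list(example)
merged = merge_interval_lists([groups[OTHER], groups[CHR_X], groups[CHR_Y]])
```

Note that merging in the order autosomes, X, Y only reproduces the original ordering if no other contigs (e.g. chrM) follow Y in the input; otherwise the merged list may need re-sorting against the sequence dictionary.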
This intended behavior is not very useful when the result changes with the sex ratio of the cohort. Running -XL chrY will always remove chrY, which is the problem to begin with. Some kind of reverse -XL would be needed as an always-include option, but even that would not be ideal, as chrY bins would then never be filtered. I understand that FilterIntervals itself is not aware of sex; however, DetermineGermlineContigPloidy also estimates sex, so perhaps in future versions FilterIntervals could be extended with a --contig-ploidy-calls flag similar to GermlineCNVCaller's. It could then be run downstream of DetermineGermlineContigPloidy and filter chrY intervals using only samples that have ploidy 1 on chrY, i.e. males. For now, a quick fix is increasing --low-count-filter-percentage-of-samples, but this may end up producing more CNV calls, as you said.
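To make the proposal concrete, here is a rough sketch of what a ploidy-aware low-count filter could look like. This is purely illustrative: FilterIntervals has no such option today, the function name is hypothetical, and the threshold values (count < 10 in > 50% of samples) are the ones quoted in this issue. The sketch generalizes "ploidy 1 on chrY" to "ploidy greater than 0 on the interval's contig", which is an assumption on my part.

```python
def fails_ploidy_aware_low_count_filter(
    counts, contig_ploidies, count_threshold=10, percentage_of_samples=50.0
):
    """Hypothetical filter: judge an interval only on samples that carry its contig.

    counts[i] is the read count of sample i in the interval;
    contig_ploidies[i] is that sample's estimated ploidy for the interval's contig.
    """
    eligible = [c for c, p in zip(counts, contig_ploidies) if p > 0]
    if not eligible:
        return True  # no sample carries this contig, so nothing supports keeping it
    low = sum(1 for c in eligible if c < count_threshold)
    return 100.0 * low / len(eligible) > percentage_of_samples


# A chrY bin in a cohort of 6 females (ploidy 0) and 4 males (ploidy 1).
ploidies = [0] * 6 + [1] * 4
well_covered = [0] * 6 + [40] * 4    # males have good coverage
poorly_covered = [0] * 6 + [2] * 4   # males are low as well

keep_good_bin = not fails_ploidy_aware_low_count_filter(well_covered, ploidies)
drop_bad_bin = fails_ploidy_aware_low_count_filter(poorly_covered, ploidies)
```

With this rule the well-covered chrY bin survives even though 60% of the cohort is female, while a bin that is low in the males too is still filtered out.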
Bug Report
FilterIntervals
gatk FilterIntervals \
    -L ${OUTPUT}/bins.interval_list \
    --annotated-intervals ${OUTPUT}/bins_annotated.interval_list \
    -imr OVERLAPPING_ONLY \
    $INPUTHDF5S \
    -O ${OUTPUT}/bins_filtered.interval_list
Description
I've been running the gCNV pipeline as per this article on WES samples and have noticed that in some of my runs all of the Y chromosome intervals are being removed. This then breaks sex estimation during ploidy determination, which in turn corrupts the CNV calls on the sex chromosomes.
Correct me if I'm wrong, but it seems that the low-count filter, i.e. "intervals with a count < 10 in > 50.0% of samples fail", will remove the Y chromosome from any batch of samples where more than half of them are female. Pushing the percentage up (e.g. 55%, 60%, etc.) until it exceeds the percentage of samples that are female avoids the problem, but it also changes the interval filtering parameters for all other contigs.
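A small simulation of the rule as quoted above shows the effect. This is only a sketch of the stated criterion, not GATK's actual code; the function name and the read counts (0 for females, 40 for males on a chrY bin) are made up for illustration.

```python
def fails_low_count_filter(counts, count_threshold=10, percentage_of_samples=50.0):
    """Interval fails if count < threshold in more than the given percentage of samples."""
    low = sum(1 for c in counts if c < count_threshold)
    return 100.0 * low / len(counts) > percentage_of_samples


def chr_y_counts(n_female, n_male):
    """Made-up counts for one chrY bin: females contribute 0 reads, males 40."""
    return [0] * n_female + [40] * n_male


print(fails_low_count_filter(chr_y_counts(4, 6)))  # 40% female: False, bin kept
print(fails_low_count_filter(chr_y_counts(5, 5)))  # 50% female: False, not strictly > 50%
print(fails_low_count_filter(chr_y_counts(6, 4)))  # 60% female: True, bin removed
print(fails_low_count_filter(chr_y_counts(6, 4), percentage_of_samples=90.0))  # False
```

The same chrY bin is kept or dropped purely as a function of how many females are in the batch, which is the inconsistency described here.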
It seems that there should be a special consideration for sex chromosomes, for example stating "--allosomal-contig Y" like when using DetermineGermlineContigPloidy, or an always keep intervals option, like the -XL flag just in reverse.