Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add v4 variant QC annotation HT creation and true positive VCF export #402

Merged
merged 12 commits into from
Aug 31, 2023

Conversation

jkgoodrich
Copy link
Contributor

No description provided.

…to jg/v4_random_forest_annotation_ht

# Conflicts:
#	gnomad_qc/v4/annotations/generate_variant_qc_annotations.py
Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than my small comments seems good! Might take another look at it tomorrow

…to jg/v4_random_forest_annotation_ht

# Conflicts:
#	gnomad_qc/v4/annotations/generate_variant_qc_annotations.py
Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor comments!

positive variants.
:return: Dictionary of 'raw' and 'adj' true positive variant Tables.
"""
if not transmitted_singletons and not sibling_singletons:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptual question - why does this have to be the case? Why can't the true positives not include singletons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding to clarify what we are making. Want to make two files (there is also an option for a 3rd, but we probably won't test it, so no reason to make it).

  • VCF of just transmitted singletons: These are determined by looking at the trios and identifying variants that are found in one parent and the child, and those two individuals are the only two in the callset with the variant.
  • VCF of transmitted singletons (defined above) and sibling singletons: sibling singletons are determined by looking at two individuals that are siblings and identifying variants that are found in both siblings, and these two individuals are the only individuals in the callset with that variant.
  • We can also export just sibling singletons, but it's not really worth it.

We use these as true positives in the variant QC training so that we can see what the features we use in the model look like on rare variants that we have high confidence in being real.

This if statement is just throwing an error because if you don't pick one of these you are just asking for an empty VCF

) -> PipelineResourceCollection:
"""
Get PipelineResourceCollection for all resources needed in the variant QC annotation pipeline.

:param test: Whether to gather all resources for testing.
:param overwrite: Whether to overwrite resources if they exist.
:param true_positive_type: Type of true positive variants to use for true positive
VCF path resource. Default is None.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VCF path resource. Default is None.
VCF path resource when exporting. Default is None.

A bit unclear abt code - does it not export the True Positive VCFs if this isn't passed? If so, may want to mention in params. If I'm just misreading it though, then nvm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_variant_qc_annotation_resources is only getting the resources and not actually performing the export

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, makes sense now

Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last note!

) -> PipelineResourceCollection:
"""
Get PipelineResourceCollection for all resources needed in the variant QC annotation pipeline.

:param test: Whether to gather all resources for testing.
:param overwrite: Whether to overwrite resources if they exist.
:param true_positive_type: Type of true positive variants to use for true positive
VCF path resource. Default is None.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, makes sense now

Comment on lines +760 to +767
tp_vcf_args.add_argument(
"--export-true-positive-vcfs",
help=(
"Exports true positive variants (--transmitted-singletons and/or"
" --sibling-singletons) to VCF files."
),
action="store_true",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a use case where either --transmitted-singletons and/or --sibling-singletons are passed but not --export-true-positives-vcfs ? Do they, or the PipelineStepResourceCollection named 'export_true_positive_vcfs', have any other function in the code or are read in elsewhere ?

If there isn't, you could just choose to export if either true positive types are passed - it would cut down on arguments but might be less user-friendly when passing argument.

Think the only code change would be changing line #668 to if true_positive_type: , but only if this is the case

Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything I said has been addressed, brought up my last question at our Variant QC meeting, all good! I think my side is still showing merge conflicts though, but i'm not sure if that's just talking abt my local version of the repo or not

…to jg/v4_random_forest_annotation_ht

# Conflicts:
#	gnomad_qc/v4/annotations/generate_variant_qc_annotations.py
@jkgoodrich jkgoodrich merged commit 0c52cf4 into main Aug 31, 2023
1 check passed
@jkgoodrich jkgoodrich deleted the jg/v4_random_forest_annotation_ht branch August 31, 2023 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants