update VRS annotation batch script for v4 exomes #382

Merged on Aug 24, 2023 (26 commits)

@KoalaQin KoalaQin commented Jul 22, 2023

A few changes were needed to get the script to run:

  1. add a function for the output path of VRS in v4 exomes;
  2. delete the other v3 input and output paths in the dictionary;
  3. put a retry in init_job_with_gcloud to avoid Hail Batch transient errors, as suggested by Dan King;
  4. add bgzip in the --run-vrs step so import_vcf runs faster;
  5. move import_vcf to the --annotate-original step because it is faster to run in Query-on-Spark (QoS);
  6. remove the checkpoint since we know where the final folder of the VRS annotations is;
  7. remove delete_temps because files got removed even if the jobs weren't all successful;
  8. change exclude duplicate so it applies to only one input file;
  9. use gnomad-tmp-4day as the working-bucket, so the intermediate files are automatically deleted;
  10. clarify in the docstring that the script has to be run once on QoB and once on QoS.
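
For readers unfamiliar with the retry idea in the list above, here is a minimal sketch of retrying a step on transient errors with exponential backoff. It is not the code in init_job_with_gcloud; the function name retry_transient, its parameters, and the choice of exception types are all hypothetical and only illustrate the general pattern.

```python
import time


def retry_transient(fn, attempts=3, base_delay=1.0,
                    transient=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call fn(), retrying on transient errors with exponential backoff.

    Hypothetical helper: which errors count as transient is an assumption.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors outside the transient tuple are not caught, so a genuine bug fails immediately instead of being retried.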

Update 2023-08-16:

  1. add a test argument to test on 200 partitions;
  2. put annotate-original back to its original design; with hailctl batch submit, we can now run both run-vrs and annotate-original in one go.
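
As a rough illustration of what a test argument like this does, the sketch below picks which partition indices to process. The function name partitions_to_process and its parameters are hypothetical; how the script actually restricts the Hail input to 200 partitions is not shown in this thread.

```python
def partitions_to_process(n_partitions, test=False, n_test=200):
    """Return the partition indices to process.

    Hypothetical helper: in test mode only the first n_test partitions
    are kept (or all of them if the input has fewer than n_test).
    """
    if test:
        return range(min(n_test, n_partitions))
    return range(n_partitions)
```

Capping at the input's own partition count keeps a test run from failing on small inputs.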

@KoalaQin KoalaQin requested a review from klaricch July 31, 2023 18:12
@KoalaQin KoalaQin commented

@klaricch I changed the code to add a test argument that tests on 200 partitions of an input, and put annotate-original back to what it was. With hailctl batch submit, we can now run both run-vrs and annotate-original in one go.

@KoalaQin KoalaQin commented

@klaricch annotations.py is now identical to main. I couldn't merge main into this branch because the branch has been around too long.

@KoalaQin KoalaQin commented

@klaricch I made changes to get rid of vrs-io-finals, but you can't test with hailctl batch submit yet because the Docker image needs to be updated with the changed annotations.py. I ran a test with the following command and it worked:

python3 /Users/heqin/PycharmProjects/gnomad_qc/gnomad_qc/v4/annotations/vrs_annotation_batch.py \
--billing-project gnomad-annot \
--working-bucket gnomad-tmp-4day \
--image us-central1-docker.pkg.dev/broad-mpg-gnomad/images/vrs084 \
--version test_v4.0_exomes \
--prefix v4 \
--header-path gs://gnomad/v4.0/annotations/exomes/vrs-header-fix.txt \
--run-vrs \
--annotate-original \
--overwrite \
--backend-mode batch \
--test
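
For reference, the flags in the command above could be declared with argparse roughly as follows. This is only a sketch, not the script's actual parser: the flag names come from the command, but the help strings, which flags are booleans, and the "spark" choice for --backend-mode are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags used in the test command (sketch)."""
    parser = argparse.ArgumentParser(description="VRS annotation batch script (sketch).")
    parser.add_argument("--billing-project", help="Billing project to charge.")
    parser.add_argument("--working-bucket", help="Bucket for intermediate files.")
    parser.add_argument("--image", help="Docker image to run the jobs in.")
    parser.add_argument("--version", help="Dataset version to annotate.")
    parser.add_argument("--prefix", help="Prefix for output paths.")
    parser.add_argument("--header-path", help="Path to the VCF header fix file.")
    parser.add_argument("--run-vrs", action="store_true")
    parser.add_argument("--annotate-original", action="store_true")
    parser.add_argument("--overwrite", action="store_true")
    # "batch" appears in the command; "spark" is an assumed second mode.
    parser.add_argument("--backend-mode", choices=["batch", "spark"], default="batch")
    parser.add_argument("--test", action="store_true")
    return parser
```

argparse maps dashed flags to underscored attributes, so --working-bucket is read back as args.working_bucket.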

@KoalaQin KoalaQin requested a review from klaricch August 23, 2023 15:18
@KoalaQin KoalaQin requested a review from klaricch August 24, 2023 14:18
@KoalaQin KoalaQin requested a review from klaricch August 24, 2023 20:00
@klaricch klaricch left a comment

lgtm

@KoalaQin KoalaQin merged commit aa1155a into main Aug 24, 2023
1 check passed