targeted seq #10
Comments
Hi there, We have not tested dv-trio with targeted sequencing data. I believe it should still work as long as the input file formats are followed, but this cannot be guaranteed. Please let us know what you find if you decide to try it on your targeted sequencing data. Thanks. |
Dear Eddie,
Many thanks for your email. Actually our samples were aligned to GRCh37 so
I guess the code will not work.
What is the estimated running time of a trio sequenced through WES?
Thank you very much
All the best,
Mona
|
Hi Mona, We have a computation time comparison, which includes exome data, in our paper:
Eddie K K Ip, Clinton Hadinata, Joshua W K Ho, Eleni Giannoulatou
dv-trio: a family-based variant calling pipeline using DeepVariant
Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3549–3551, https://doi.org/10.1093/bioinformatics/btaa116
See Supplementary Table S1: Computation time and memory usage comparison of alternative trio calling approaches. Thanks, Eddie |
Hi Eddie,
I have read it now. Many thanks.
All the best,
Mona
|
Hi Eddie,
Apologies for all these emails, but I started running the dv-trio code on
the GIAB trio (using the WES BAM files linked in the README) and it is
still running. According to your paper, it should take less than 4 hrs.
It completed the DeepVariant step for the first 2 BAM files fairly quickly
but has been running on the 3rd BAM file since 12pm yesterday. I am using
32 CPUs and have enough storage. Any idea what could be the problem?
I cannot wait to apply your code to my samples.
Many thanks
All the best,
Mona
|
Hi Mona, I cannot think of a reason why one sample would still be in the DeepVariant step. If you kill the process without killing the instance, you may be able to look at the error file and work out where within the DeepVariant step the 3rd BAM is stuck. That's the only thing I can think of at the moment. |
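One way to act on this suggestion is to scan the run log for the last DeepVariant stage reported for each sample. The following is a minimal sketch, not part of dv-trio: the function name `last_stage_per_sample` is hypothetical, and the message patterns are assumptions modelled on the stage messages quoted later in this thread (for example "Running DeepVariant MAKE EXAMPLES for HG002...").

```python
import re
from typing import Dict, Iterable

# Patterns assumed from the run log quoted in this thread; adjust them if
# your dv-trio version words its progress messages differently.
STAGE_RE = re.compile(r"Running DeepVariant (?P<stage>[A-Z ]+?) for (?P<sample>\w+)")
DONE_RE = re.compile(r"DeepVariant run completed for (?P<sample>\w+)")


def last_stage_per_sample(lines: Iterable[str]) -> Dict[str, str]:
    """Return the last DeepVariant stage each sample was reported to reach."""
    progress: Dict[str, str] = {}
    for line in lines:
        done = DONE_RE.search(line)
        if done:
            # A completion message overrides any earlier stage message.
            progress[done.group("sample")] = "COMPLETED"
            continue
        stage = STAGE_RE.search(line)
        if stage:
            progress[stage.group("sample")] = stage.group("stage")
    return progress
```

Passing an open log file to the function (for instance `last_stage_per_sample(open("run.log"))`, where `run.log` is a placeholder name) returns a mapping such as sample to `MAKE EXAMPLES`; a sample that never reaches `COMPLETED` is the one to investigate.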
Many thanks for your reply. I have tried running the code again but I got
these errors:
Done downloading model
Mon Nov 27 17:04:22 EET 2023 - DOING HG002 now from /mnt/my_work/Mona/giab_trio/HG002.GRCh38.60x.1.bam...
Mon Nov 27 17:04:22 EET 2023 - Running DeepVariant MAKE EXAMPLES for HG002...
Mon Nov 27 17:04:26 EET 2023 - Running DeepVariant CALL VARIANTS for HG002...
Mon Nov 27 17:04:30 EET 2023 - Running DeepVariant POSTPROCESS VARIANTS for HG002...
Mon Nov 27 17:04:33 EET 2023 - DeepVariant run completed for HG002
Mon Nov 27 17:04:33 EET 2023 - Trio co_calling started.
input file : /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/trio.txt
child : /mnt/my_work/Mona/giab_trio/giab_trio_final/deepvariant/HG002/output/HG002.output.g.vcf.gz
father : /mnt/my_work/Mona/giab_trio/giab_trio_final/deepvariant/HG003/output/HG003.output.g.vcf.gz
mother : /mnt/my_work/Mona/giab_trio/giab_trio_final/deepvariant/HG004/output/HG004.output.g.vcf.gz
gzip: /mnt/my_work/Mona/giab_trio/giab_trio_final/deepvariant/HG002/output/HG002.output.g.vcf.gz: No such file or directory
gzip: /mnt/my_work/Mona/giab_trio/giab_trio_final/deepvariant/HG003/output/HG003.output.g.vcf.gz: No such file or directory
gzip: /mnt/my_work/Mona/giab_trio/giab_trio_final/deepvariant/HG004/output/HG004.output.g.vcf.gz: No such file or directory
Using GATK jar /mnt/my_work/tools/dv-trio/gatk/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12g -Djava.io.tmpdir=/mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp -jar /mnt/my_work/tools/dv-trio/gatk/gatk-package-4.1.3.0-local.jar CombineGVCFs -R /mnt/my_work/reference/hg38.fa --variant /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/child.output.cvd.g.vcf.gz --variant /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/father.output.cvd.g.vcf.gz --variant /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/mother.output.cvd.g.vcf.gz -O /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/merge_gvcf.g.vcf.gz
17:04:36.940 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/my_work/tools/dv-trio/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 27, 2023 5:04:38 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:04:38.667 INFO CombineGVCFs - ------------------------------------------------------------
17:04:38.668 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.1.3.0
17:04:38.668 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
17:04:38.668 INFO CombineGVCFs - Executing as ***@***.***@agenomics.ahc-research.net on Linux v5.4.0-150-generic amd64
17:04:38.668 INFO CombineGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v17.0.6+9-LTS-190
17:04:38.668 INFO CombineGVCFs - Start Date/Time: November 27, 2023 at 5:04:36 PM EET
17:04:38.668 INFO CombineGVCFs - ------------------------------------------------------------
17:04:38.668 INFO CombineGVCFs - ------------------------------------------------------------
17:04:38.669 INFO CombineGVCFs - HTSJDK Version: 2.20.1
17:04:38.669 INFO CombineGVCFs - Picard Version: 2.20.5
17:04:38.669 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:04:38.669 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:04:38.669 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:04:38.669 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:04:38.669 INFO CombineGVCFs - Deflater: IntelDeflater
17:04:38.669 INFO CombineGVCFs - Inflater: IntelInflater
17:04:38.669 INFO CombineGVCFs - GCS max retries/reopens: 20
17:04:38.669 INFO CombineGVCFs - Requester pays: disabled
17:04:38.669 INFO CombineGVCFs - Initializing engine
17:04:38.823 INFO CombineGVCFs - Shutting down engine
[November 27, 2023 at 5:04:38 PM EET] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=201326592
***********************************************************************
A USER ERROR has occurred: Cannot read file:///mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/child.output.cvd.g.vcf.gz because no suitable codecs found
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
tbx_index_build failed: /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/merge_gvcf.g.vcf.gz
Using GATK jar /mnt/my_work/tools/dv-trio/gatk/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12g -Djava.io.tmpdir=/mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp -jar /mnt/my_work/tools/dv-trio/gatk/gatk-package-4.1.3.0-local.jar GenotypeGVCFs -R /mnt/my_work/reference/hg38.fa -V /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/merge_gvcf.g.vcf.gz -D /mnt/my_work/downloads/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -O /mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/trio_co_called.vcf.gz
17:04:40.666 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/my_work/tools/dv-trio/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 27, 2023 5:04:42 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:04:42.294 INFO GenotypeGVCFs - ------------------------------------------------------------
17:04:42.295 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.3.0
17:04:42.295 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
17:04:42.295 INFO GenotypeGVCFs - Executing as ***@***.***@agenomics.ahc-research.net on Linux v5.4.0-150-generic amd64
17:04:42.295 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v17.0.6+9-LTS-190
17:04:42.295 INFO GenotypeGVCFs - Start Date/Time: November 27, 2023 at 5:04:40 PM EET
17:04:42.295 INFO GenotypeGVCFs - ------------------------------------------------------------
17:04:42.295 INFO GenotypeGVCFs - ------------------------------------------------------------
17:04:42.296 INFO GenotypeGVCFs - HTSJDK Version: 2.20.1
17:04:42.296 INFO GenotypeGVCFs - Picard Version: 2.20.5
17:04:42.296 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:04:42.296 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:04:42.296 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:04:42.296 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:04:42.296 INFO GenotypeGVCFs - Deflater: IntelDeflater
17:04:42.296 INFO GenotypeGVCFs - Inflater: IntelInflater
17:04:42.296 INFO GenotypeGVCFs - GCS max retries/reopens: 20
17:04:42.297 INFO GenotypeGVCFs - Requester pays: disabled
17:04:42.297 INFO GenotypeGVCFs - Initializing engine
17:04:42.478 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/my_work/downloads/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
17:04:42.817 INFO GenotypeGVCFs - Shutting down engine
[November 27, 2023 at 5:04:42 PM EET] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=243269632
***********************************************************************
A USER ERROR has occurred: Couldn't read file file:///mnt/my_work/Mona/giab_trio/giab_trio_final/co_calling/temp/merge_gvcf.g.vcf.gz. Error was: It doesn't exist.
Please note that I downloaded the BAM files for the GIAB trio (HG002,
HG003, HG004) using the link you provided on GitHub, and I created a txt
file with the correct paths to these BAM files (following the template
you provided).
Any idea what went wrong?
Thank you very much
All the best,
Mona
|
Hi Mona, It looks like no output files were produced by the DeepVariant runs, as shown by the "No such file or directory" errors when the pipeline looks for the output.g.vcf.gz files. I'm not in the office at the moment, so I cannot really help with checking. Is this run log from when the process was stuck on one of the BAMs? I assume you may already have closed the instance, in which case there is no way to check whether any output files were created for the 2 BAMs that you mentioned ran successfully. If you have not closed the instance, could you check, or run the trio again? I might be able to help more when I come back into the office, but that will be some time from now. Eddie |
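The check Eddie describes can be done before the co-calling step by confirming that each sample's gVCF exists and is readable. This is a minimal sketch rather than dv-trio code: the function name `check_gvcf_outputs` is hypothetical, and the directory layout (`<run_dir>/deepvariant/<sample>/output/<sample>.output.g.vcf.gz`) is an assumption inferred from the paths in the log quoted above.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable


def check_gvcf_outputs(run_dir: str, samples: Iterable[str]) -> Dict[str, str]:
    """Report, per sample, whether the expected DeepVariant gVCF looks usable."""
    status: Dict[str, str] = {}
    for sample in samples:
        # Layout assumed from the log in this thread, not from dv-trio docs.
        gvcf = (Path(run_dir) / "deepvariant" / sample / "output"
                / f"{sample}.output.g.vcf.gz")
        if not gvcf.is_file():
            status[sample] = "missing"
        elif gvcf.stat().st_size == 0:
            status[sample] = "empty"
        else:
            try:
                # Read only the first line to confirm gzip and VCF framing.
                with gzip.open(gvcf, "rt", errors="replace") as handle:
                    first = handle.readline()
            except (OSError, EOFError):
                status[sample] = "not readable as gzip"
            else:
                ok = first.startswith("##fileformat=VCF")
                status[sample] = "ok" if ok else "unexpected header"
    return status
```

For the run in this thread, a call such as `check_gvcf_outputs("/mnt/my_work/Mona/giab_trio/giab_trio_final", ["HG002", "HG003", "HG004"])` would be expected to report all three samples as `missing`, which matches the gzip errors and suggests the problem lies in the DeepVariant step rather than in GATK.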
Thank you very much for developing this useful code. Does it work on targeted sequencing data too?