From 623f4c14cd1719dd6f8c0d49625d53d857b2d32d Mon Sep 17 00:00:00 2001 From: Rajat Roy Date: Fri, 19 Jul 2024 10:49:58 -0700 Subject: [PATCH] Deploy website - based on ff5adbce826dead4f9c2ba549b0ca82f41646f5f --- .../canonical-transcripts/index.html | 10 ++--- .../gene-fusions/index.html | 10 ++--- .../transcript-consequence-impacts/index.html | 10 ++--- .../core-functionality/variant-ids/index.html | 10 ++--- .../1000Genomes-snv-json/index.html | 10 ++--- .../1000Genomes-sv-json/index.html | 10 ++--- 3.22/data-sources/1000Genomes/index.html | 10 ++--- .../amino-acid-conservation-json/index.html | 10 ++--- .../amino-acid-conservation/index.html | 10 ++--- 3.22/data-sources/cancer-hotspots/index.html | 10 ++--- .../clingen-dosage-json/index.html | 10 ++--- .../clingen-gene-validity-json/index.html | 10 ++--- 3.22/data-sources/clingen-json/index.html | 10 ++--- 3.22/data-sources/clingen/index.html | 10 ++--- 3.22/data-sources/clinvar-json/index.html | 10 ++--- 3.22/data-sources/clinvar/index.html | 10 ++--- .../cosmic-cancer-gene-census/index.html | 10 ++--- .../cosmic-gene-fusion-json/index.html | 10 ++--- 3.22/data-sources/cosmic-json/index.html | 10 ++--- 3.22/data-sources/cosmic/index.html | 10 ++--- 3.22/data-sources/dann-json/index.html | 10 ++--- 3.22/data-sources/dann/index.html | 10 ++--- 3.22/data-sources/dbsnp-json/index.html | 10 ++--- 3.22/data-sources/dbsnp/index.html | 10 ++--- 3.22/data-sources/decipher-json/index.html | 10 ++--- 3.22/data-sources/decipher/index.html | 10 ++--- .../fusioncatcher-json/index.html | 10 ++--- 3.22/data-sources/fusioncatcher/index.html | 10 ++--- 3.22/data-sources/gerp-json/index.html | 10 ++--- 3.22/data-sources/gerp/index.html | 10 ++--- 3.22/data-sources/gme-json/index.html | 10 ++--- 3.22/data-sources/gme/index.html | 10 ++--- 3.22/data-sources/gnomad-lof-json/index.html | 10 ++--- .../gnomad-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- .../index.html | 10 ++--- 
3.22/data-sources/gnomad/index.html | 10 ++--- .../data-sources/mito-heteroplasmy/index.html | 10 ++--- .../mitomap-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- 3.22/data-sources/mitomap/index.html | 10 ++--- 3.22/data-sources/omim-json/index.html | 10 ++--- 3.22/data-sources/omim/index.html | 10 ++--- 3.22/data-sources/phylop-json/index.html | 10 ++--- 3.22/data-sources/phylop/index.html | 10 ++--- 3.22/data-sources/primate-ai-json/index.html | 10 ++--- 3.22/data-sources/primate-ai/index.html | 10 ++--- 3.22/data-sources/revel-json/index.html | 10 ++--- 3.22/data-sources/revel/index.html | 10 ++--- 3.22/data-sources/splice-ai-json/index.html | 10 ++--- 3.22/data-sources/splice-ai/index.html | 10 ++--- 3.22/data-sources/topmed-json/index.html | 10 ++--- 3.22/data-sources/topmed/index.html | 10 ++--- .../custom-annotations/index.html | 10 ++--- .../index.html | 10 ++--- 3.22/index.html | 10 ++--- 3.22/introduction/dependencies/index.html | 10 ++--- 3.22/introduction/getting-started/index.html | 10 ++--- 3.22/introduction/parsing-json/index.html | 10 ++--- 3.22/utilities/jasix/index.html | 10 ++--- 3.22/utilities/sautils/index.html | 10 ++--- .../canonical-transcripts/index.html | 10 ++--- .../gene-fusions/index.html | 10 ++--- .../transcript-consequence-impacts/index.html | 10 ++--- .../core-functionality/variant-ids/index.html | 10 ++--- .../1000Genomes-snv-json/index.html | 10 ++--- .../1000Genomes-sv-json/index.html | 10 ++--- 3.23/data-sources/1000Genomes/index.html | 10 ++--- .../amino-acid-conservation-json/index.html | 10 ++--- .../amino-acid-conservation/index.html | 10 ++--- 3.23/data-sources/cancer-hotspots/index.html | 10 ++--- .../clingen-dosage-json/index.html | 10 ++--- .../clingen-gene-validity-json/index.html | 10 ++--- 3.23/data-sources/clingen-json/index.html | 10 ++--- 3.23/data-sources/clingen/index.html | 10 ++--- 3.23/data-sources/clinvar-json/index.html | 10 ++--- 3.23/data-sources/clinvar/index.html | 10 ++--- 
.../cosmic-cancer-gene-census/index.html | 10 ++--- .../cosmic-gene-fusion-json/index.html | 10 ++--- 3.23/data-sources/cosmic-json/index.html | 10 ++--- 3.23/data-sources/cosmic/index.html | 10 ++--- 3.23/data-sources/dann-json/index.html | 10 ++--- 3.23/data-sources/dann/index.html | 10 ++--- 3.23/data-sources/dbsnp-json/index.html | 10 ++--- 3.23/data-sources/dbsnp/index.html | 10 ++--- 3.23/data-sources/decipher-json/index.html | 10 ++--- 3.23/data-sources/decipher/index.html | 10 ++--- .../fusioncatcher-json/index.html | 10 ++--- 3.23/data-sources/fusioncatcher/index.html | 10 ++--- 3.23/data-sources/gerp-json/index.html | 10 ++--- 3.23/data-sources/gerp/index.html | 10 ++--- 3.23/data-sources/gme-json/index.html | 10 ++--- 3.23/data-sources/gme/index.html | 10 ++--- 3.23/data-sources/gnomad-lof-json/index.html | 10 ++--- .../gnomad-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- .../index.html | 10 ++--- 3.23/data-sources/gnomad/index.html | 10 ++--- .../data-sources/mito-heteroplasmy/index.html | 10 ++--- .../mitomap-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- 3.23/data-sources/mitomap/index.html | 10 ++--- 3.23/data-sources/omim-json/index.html | 10 ++--- 3.23/data-sources/omim/index.html | 10 ++--- 3.23/data-sources/phylop-json/index.html | 10 ++--- 3.23/data-sources/phylop/index.html | 10 ++--- .../phylopprimate-json/index.html | 10 ++--- 3.23/data-sources/primate-ai-json/index.html | 10 ++--- 3.23/data-sources/primate-ai/index.html | 10 ++--- 3.23/data-sources/revel-json/index.html | 10 ++--- 3.23/data-sources/revel/index.html | 10 ++--- 3.23/data-sources/splice-ai-json/index.html | 10 ++--- 3.23/data-sources/splice-ai/index.html | 10 ++--- 3.23/data-sources/topmed-json/index.html | 10 ++--- 3.23/data-sources/topmed/index.html | 10 ++--- .../custom-annotations/index.html | 10 ++--- .../index.html | 10 ++--- 3.23/index.html | 10 ++--- 3.23/introduction/dependencies/index.html | 10 ++--- 
3.23/introduction/getting-started/index.html | 10 ++--- 3.23/introduction/licensedContent/index.html | 10 ++--- 3.23/introduction/parsing-json/index.html | 10 ++--- 3.23/utilities/jasix/index.html | 10 ++--- 3.23/utilities/sautils/index.html | 10 ++--- .../canonical-transcripts/index.html | 18 +++++++++ .../gene-fusions/index.html | 19 +++++++++ .../iscn-notation/index.html | 24 +++++++++++ .../junction-preserving/index.html | 18 +++++++++ .../transcript-consequence-impacts/index.html | 19 +++++++++ .../core-functionality/variant-ids/index.html | 18 +++++++++ .../1000Genomes-snv-json/index.html | 18 +++++++++ .../1000Genomes-sv-json/index.html | 18 +++++++++ 3.24/data-sources/1000Genomes/index.html | 20 ++++++++++ .../amino-acid-conservation-json/index.html | 18 +++++++++ .../amino-acid-conservation/index.html | 19 +++++++++ 3.24/data-sources/cancer-hotspots/index.html | 19 +++++++++ .../clingen-dosage-json/index.html | 18 +++++++++ .../clingen-gene-validity-json/index.html | 18 +++++++++ 3.24/data-sources/clingen-json/index.html | 18 +++++++++ 3.24/data-sources/clingen/index.html | 18 +++++++++ 3.24/data-sources/clinvar-json/index.html | 18 +++++++++ .../clinvar-preview-json/index.html | 18 +++++++++ 3.24/data-sources/clinvar-preview/index.html | 23 +++++++++++ 3.24/data-sources/clinvar/index.html | 21 ++++++++++ .../cosmic-cancer-gene-census/index.html | 18 +++++++++ .../cosmic-gene-fusion-json/index.html | 18 +++++++++ 3.24/data-sources/cosmic-json/index.html | 18 +++++++++ 3.24/data-sources/cosmic/index.html | 28 +++++++++++++ 3.24/data-sources/dann-json/index.html | 18 +++++++++ 3.24/data-sources/dann/index.html | 21 ++++++++++ 3.24/data-sources/dbsnp-json/index.html | 18 +++++++++ 3.24/data-sources/dbsnp/index.html | 18 +++++++++ 3.24/data-sources/decipher-json/index.html | 18 +++++++++ 3.24/data-sources/decipher/index.html | 19 +++++++++ .../fusioncatcher-json/index.html | 18 +++++++++ 3.24/data-sources/fusioncatcher/index.html | 18 +++++++++ 
3.24/data-sources/gerp-json/index.html | 18 +++++++++ 3.24/data-sources/gerp/index.html | 20 ++++++++++ 3.24/data-sources/gme-json/index.html | 18 +++++++++ 3.24/data-sources/gme/index.html | 18 +++++++++ 3.24/data-sources/gnomad-lof-json/index.html | 18 +++++++++ .../gnomad-small-variants-json/index.html | 18 +++++++++ .../index.html | 19 +++++++++ .../index.html | 18 +++++++++ 3.24/data-sources/gnomad/index.html | 33 +++++++++++++++ .../gnomad4.0-lof-json/index.html | 18 +++++++++ .../gnomad4.0-small-variants-json/index.html | 18 +++++++++ .../index.html | 18 +++++++++ .../data-sources/mito-heteroplasmy/index.html | 18 +++++++++ .../mitomap-small-variants-json/index.html | 18 +++++++++ .../index.html | 18 +++++++++ 3.24/data-sources/mitomap/index.html | 18 +++++++++ 3.24/data-sources/omim-json/index.html | 18 +++++++++ 3.24/data-sources/omim/index.html | 23 +++++++++++ 3.24/data-sources/phylop-json/index.html | 18 +++++++++ 3.24/data-sources/phylop/index.html | 21 ++++++++++ .../phylopprimate-json/index.html | 18 +++++++++ 3.24/data-sources/primate-ai-json/index.html | 18 +++++++++ 3.24/data-sources/primate-ai/index.html | 24 +++++++++++ 3.24/data-sources/revel-json/index.html | 18 +++++++++ 3.24/data-sources/revel/index.html | 18 +++++++++ 3.24/data-sources/splice-ai-json/index.html | 18 +++++++++ 3.24/data-sources/splice-ai/index.html | 18 +++++++++ 3.24/data-sources/topmed-json/index.html | 18 +++++++++ 3.24/data-sources/topmed/index.html | 18 +++++++++ .../custom-annotations/index.html | 40 +++++++++++++++++++ .../index.html | 18 +++++++++ .../index.html | 18 +++++++++ .../Annotator-vs-data-update/index.html | 18 +++++++++ 3.24/index.html | 21 ++++++++++ 3.24/introduction/dependencies/index.html | 18 +++++++++ 3.24/introduction/getting-started/index.html | 18 +++++++++ 3.24/introduction/licensedContent/index.html | 25 ++++++++++++ 3.24/introduction/parsing-json/index.html | 18 +++++++++ 3.24/utilities/jasix/index.html | 18 +++++++++ 
3.24/utilities/sautils/index.html | 22 ++++++++++ 404.html | 10 ++--- assets/js/017faa10.8264ca5b.js | 1 + assets/js/06f315d1.0371a111.js | 1 + assets/js/0fef68c1.a05cae8e.js | 1 + assets/js/114fee77.7efecee6.js | 1 + assets/js/21c89dc6.34478727.js | 1 + assets/js/23eb1a83.5f3133f6.js | 1 + assets/js/25bf377f.1ae12a20.js | 1 + assets/js/2b7f32d3.c2773f1f.js | 1 + assets/js/31f960e2.fcd268df.js | 1 + assets/js/363a8231.a2b5a59a.js | 1 + assets/js/36d9d5eb.3e884217.js | 1 + assets/js/37f5014d.e73f20ff.js | 1 + assets/js/3992ad3e.0618eec0.js | 1 + assets/js/42af2c4d.5eabc270.js | 1 + assets/js/4523c0b8.b6e19eee.js | 1 + assets/js/463e69e4.1615fe13.js | 1 - assets/js/463e69e4.61efeb17.js | 1 + assets/js/4b1283a1.3ef53153.js | 1 + assets/js/4b2748e1.b6526a03.js | 1 + assets/js/4b69b274.89bfa8ee.js | 1 - assets/js/4b69b274.dfa70220.js | 1 + assets/js/4f2cf309.32706599.js | 1 + assets/js/51620725.673ad6b6.js | 1 + assets/js/51833f15.28e8a08b.js | 1 + assets/js/51bfff8d.5473d8f7.js | 1 + assets/js/51f80da3.b8258f22.js | 1 + assets/js/53062ee1.3f65d126.js | 1 + assets/js/53b9567b.f283f1b9.js | 1 + assets/js/5c18c143.7f61c089.js | 1 + assets/js/64bd7e9e.3c740e86.js | 1 + assets/js/65232248.8ac53226.js | 1 + assets/js/65781eeb.2f6b3880.js | 1 + assets/js/669469dc.451ddf08.js | 1 + assets/js/67233f3f.7ade2c7c.js | 1 + assets/js/68ae1648.dd12f9da.js | 1 + assets/js/6bb3eb16.e2fa8230.js | 1 + assets/js/6da9a512.2a69647b.js | 1 + assets/js/70f7faf9.a369deca.js | 1 + assets/js/730b3355.53e6d777.js | 1 + assets/js/74830f3d.f9a8ae91.js | 1 + assets/js/75a3a2eb.00f2bb5f.js | 1 - assets/js/75a3a2eb.258736c2.js | 1 + assets/js/76a0dc22.1402df31.js | 1 + assets/js/78bb3c84.7fa47463.js | 1 + assets/js/79308b2f.fd9af8d3.js | 1 + assets/js/83fc027c.ab43b5a9.js | 1 + assets/js/86fcde84.0069d21b.js | 1 + assets/js/880ef044.14db7f9c.js | 1 + assets/js/88990ce9.b85dd7dd.js | 1 + assets/js/8eb3126b.8d6691e6.js | 1 + assets/js/91971be7.24ae75ad.js | 1 + assets/js/935f2afb.e82442de.js | 1 + 
assets/js/935f2afb.e94232a1.js | 1 - assets/js/94d6913f.a774a0ab.js | 1 + assets/js/987b70e8.239750d6.js | 1 + assets/js/a10271fe.ffa3312f.js | 1 + assets/js/a5e136a1.4b8c7497.js | 1 + assets/js/a5e136a1.c7e5c6d7.js | 1 - assets/js/a6af1fd8.ecc303b1.js | 1 + assets/js/a7b23c85.58554146.js | 1 + assets/js/af18058a.17789ebc.js | 1 + assets/js/af997954.478221ef.js | 1 + assets/js/b23ebcdf.c05b032d.js | 1 + assets/js/b4506888.2b5dcf08.js | 1 + assets/js/b5aea075.46242714.js | 1 + assets/js/b7dbb0d7.1351c5ab.js | 1 + assets/js/bd285e40.c8b8cb68.js | 1 + assets/js/bee64c31.fb97cfb1.js | 1 + assets/js/bf58d54d.0cf2023b.js | 1 + assets/js/c2a95928.3454d69f.js | 1 + assets/js/c4215fd2.0524e83e.js | 1 + assets/js/cc461efb.5c4d5431.js | 1 + assets/js/d065eee8.f11bbe97.js | 1 + assets/js/d284d299.f12d4533.js | 1 + assets/js/d8b0b6a4.3737c73e.js | 1 + assets/js/d9f334cf.ccb601aa.js | 1 + assets/js/da6337b1.6846131b.js | 1 + assets/js/e8574da6.a78525f9.js | 1 + assets/js/e95cadfe.aa190e36.js | 1 - assets/js/e95cadfe.d1638d74.js | 1 + assets/js/ee4db9b8.d1757fec.js | 1 + assets/js/ef4059aa.0328cea2.js | 1 + assets/js/ef4059aa.87eebb08.js | 1 - assets/js/ef5201ba.8389de3c.js | 1 + assets/js/f0d534fb.ee63e5e0.js | 1 + assets/js/f6bbc271.22ed2407.js | 1 + assets/js/f735f5cc.af7951fb.js | 1 + assets/js/ff654592.322d662d.js | 1 + assets/js/main.23b43cc6.js | 2 - assets/js/main.8997be17.js | 2 + ...CENSE.txt => main.8997be17.js.LICENSE.txt} | 0 assets/js/runtime~main.0e0be6e9.js | 1 + assets/js/runtime~main.fe5beb3e.js | 1 - blog/archive/index.html | 10 ++--- .../canonical-transcripts/index.html | 10 ++--- core-functionality/gene-fusions/index.html | 12 +++--- core-functionality/iscn-notation/index.html | 24 +++++++++++ .../junction-preserving/index.html | 10 ++--- .../transcript-consequence-impacts/index.html | 12 +++--- core-functionality/variant-ids/index.html | 10 ++--- data-sources/1000Genomes-snv-json/index.html | 10 ++--- data-sources/1000Genomes-sv-json/index.html | 10 ++--- 
data-sources/1000Genomes/index.html | 10 ++--- .../amino-acid-conservation-json/index.html | 10 ++--- .../amino-acid-conservation/index.html | 10 ++--- data-sources/cancer-hotspots/index.html | 10 ++--- data-sources/clingen-dosage-json/index.html | 10 ++--- .../clingen-gene-validity-json/index.html | 10 ++--- data-sources/clingen-json/index.html | 10 ++--- data-sources/clingen/index.html | 10 ++--- data-sources/clinvar-json/index.html | 10 ++--- data-sources/clinvar-preview-json/index.html | 10 ++--- data-sources/clinvar-preview/index.html | 10 ++--- data-sources/clinvar/index.html | 10 ++--- .../cosmic-cancer-gene-census/index.html | 10 ++--- .../cosmic-gene-fusion-json/index.html | 10 ++--- data-sources/cosmic-json/index.html | 10 ++--- data-sources/cosmic/index.html | 10 ++--- data-sources/dann-json/index.html | 10 ++--- data-sources/dann/index.html | 10 ++--- data-sources/dbsnp-json/index.html | 10 ++--- data-sources/dbsnp/index.html | 10 ++--- data-sources/decipher-json/index.html | 10 ++--- data-sources/decipher/index.html | 10 ++--- data-sources/fusioncatcher-json/index.html | 10 ++--- data-sources/fusioncatcher/index.html | 10 ++--- data-sources/gerp-json/index.html | 10 ++--- data-sources/gerp/index.html | 10 ++--- data-sources/gme-json/index.html | 10 ++--- data-sources/gme/index.html | 10 ++--- data-sources/gnomad-lof-json/index.html | 10 ++--- .../gnomad-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- .../index.html | 10 ++--- data-sources/gnomad/index.html | 10 ++--- data-sources/gnomad4.0-lof-json/index.html | 10 ++--- .../gnomad4.0-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- data-sources/mito-heteroplasmy/index.html | 10 ++--- .../mitomap-small-variants-json/index.html | 10 ++--- .../index.html | 10 ++--- data-sources/mitomap/index.html | 10 ++--- data-sources/omim-json/index.html | 10 ++--- data-sources/omim/index.html | 10 ++--- data-sources/phylop-json/index.html | 10 ++--- data-sources/phylop/index.html 
| 10 ++--- data-sources/phylopprimate-json/index.html | 10 ++--- data-sources/primate-ai-json/index.html | 10 ++--- data-sources/primate-ai/index.html | 10 ++--- data-sources/revel-json/index.html | 10 ++--- data-sources/revel/index.html | 10 ++--- data-sources/splice-ai-json/index.html | 10 ++--- data-sources/splice-ai/index.html | 10 ++--- data-sources/topmed-json/index.html | 10 ++--- data-sources/topmed/index.html | 10 ++--- file-formats/custom-annotations/index.html | 10 ++--- .../index.html | 10 ++--- .../index.html | 10 ++--- .../Annotator-vs-data-update/index.html | 10 ++--- index.html | 12 +++--- introduction/dependencies/index.html | 10 ++--- introduction/getting-started/index.html | 10 ++--- introduction/licensedContent/index.html | 10 ++--- introduction/parsing-json/index.html | 10 ++--- search/index.html | 10 ++--- sitemap.xml | 2 +- utilities/jasix/index.html | 10 ++--- utilities/sautils/index.html | 10 ++--- versions/index.html | 10 ++--- 366 files changed, 2505 insertions(+), 1009 deletions(-) create mode 100644 3.24/core-functionality/canonical-transcripts/index.html create mode 100644 3.24/core-functionality/gene-fusions/index.html create mode 100644 3.24/core-functionality/iscn-notation/index.html create mode 100644 3.24/core-functionality/junction-preserving/index.html create mode 100644 3.24/core-functionality/transcript-consequence-impacts/index.html create mode 100644 3.24/core-functionality/variant-ids/index.html create mode 100644 3.24/data-sources/1000Genomes-snv-json/index.html create mode 100644 3.24/data-sources/1000Genomes-sv-json/index.html create mode 100644 3.24/data-sources/1000Genomes/index.html create mode 100644 3.24/data-sources/amino-acid-conservation-json/index.html create mode 100644 3.24/data-sources/amino-acid-conservation/index.html create mode 100644 3.24/data-sources/cancer-hotspots/index.html create mode 100644 3.24/data-sources/clingen-dosage-json/index.html create mode 100644 
3.24/data-sources/clingen-gene-validity-json/index.html create mode 100644 3.24/data-sources/clingen-json/index.html create mode 100644 3.24/data-sources/clingen/index.html create mode 100644 3.24/data-sources/clinvar-json/index.html create mode 100644 3.24/data-sources/clinvar-preview-json/index.html create mode 100644 3.24/data-sources/clinvar-preview/index.html create mode 100644 3.24/data-sources/clinvar/index.html create mode 100644 3.24/data-sources/cosmic-cancer-gene-census/index.html create mode 100644 3.24/data-sources/cosmic-gene-fusion-json/index.html create mode 100644 3.24/data-sources/cosmic-json/index.html create mode 100644 3.24/data-sources/cosmic/index.html create mode 100644 3.24/data-sources/dann-json/index.html create mode 100644 3.24/data-sources/dann/index.html create mode 100644 3.24/data-sources/dbsnp-json/index.html create mode 100644 3.24/data-sources/dbsnp/index.html create mode 100644 3.24/data-sources/decipher-json/index.html create mode 100644 3.24/data-sources/decipher/index.html create mode 100644 3.24/data-sources/fusioncatcher-json/index.html create mode 100644 3.24/data-sources/fusioncatcher/index.html create mode 100644 3.24/data-sources/gerp-json/index.html create mode 100644 3.24/data-sources/gerp/index.html create mode 100644 3.24/data-sources/gme-json/index.html create mode 100644 3.24/data-sources/gme/index.html create mode 100644 3.24/data-sources/gnomad-lof-json/index.html create mode 100644 3.24/data-sources/gnomad-small-variants-json/index.html create mode 100644 3.24/data-sources/gnomad-structural-variants-data_description/index.html create mode 100644 3.24/data-sources/gnomad-structural-variants-json/index.html create mode 100644 3.24/data-sources/gnomad/index.html create mode 100644 3.24/data-sources/gnomad4.0-lof-json/index.html create mode 100644 3.24/data-sources/gnomad4.0-small-variants-json/index.html create mode 100644 3.24/data-sources/gnomad40-structural-variants-json/index.html create mode 100644 
3.24/data-sources/mito-heteroplasmy/index.html create mode 100644 3.24/data-sources/mitomap-small-variants-json/index.html create mode 100644 3.24/data-sources/mitomap-structural-variants-json/index.html create mode 100644 3.24/data-sources/mitomap/index.html create mode 100644 3.24/data-sources/omim-json/index.html create mode 100644 3.24/data-sources/omim/index.html create mode 100644 3.24/data-sources/phylop-json/index.html create mode 100644 3.24/data-sources/phylop/index.html create mode 100644 3.24/data-sources/phylopprimate-json/index.html create mode 100644 3.24/data-sources/primate-ai-json/index.html create mode 100644 3.24/data-sources/primate-ai/index.html create mode 100644 3.24/data-sources/revel-json/index.html create mode 100644 3.24/data-sources/revel/index.html create mode 100644 3.24/data-sources/splice-ai-json/index.html create mode 100644 3.24/data-sources/splice-ai/index.html create mode 100644 3.24/data-sources/topmed-json/index.html create mode 100644 3.24/data-sources/topmed/index.html create mode 100644 3.24/file-formats/custom-annotations/index.html create mode 100644 3.24/file-formats/illumina-annotator-json-file-format/index.html create mode 100644 3.24/file-formats/illumina-annotator-vcf-file-format/index.html create mode 100644 3.24/frequently-asked-questions/Annotator-vs-data-update/index.html create mode 100644 3.24/index.html create mode 100644 3.24/introduction/dependencies/index.html create mode 100644 3.24/introduction/getting-started/index.html create mode 100644 3.24/introduction/licensedContent/index.html create mode 100644 3.24/introduction/parsing-json/index.html create mode 100644 3.24/utilities/jasix/index.html create mode 100644 3.24/utilities/sautils/index.html create mode 100644 assets/js/017faa10.8264ca5b.js create mode 100644 assets/js/06f315d1.0371a111.js create mode 100644 assets/js/0fef68c1.a05cae8e.js create mode 100644 assets/js/114fee77.7efecee6.js create mode 100644 assets/js/21c89dc6.34478727.js create mode 
100644 assets/js/23eb1a83.5f3133f6.js create mode 100644 assets/js/25bf377f.1ae12a20.js create mode 100644 assets/js/2b7f32d3.c2773f1f.js create mode 100644 assets/js/31f960e2.fcd268df.js create mode 100644 assets/js/363a8231.a2b5a59a.js create mode 100644 assets/js/36d9d5eb.3e884217.js create mode 100644 assets/js/37f5014d.e73f20ff.js create mode 100644 assets/js/3992ad3e.0618eec0.js create mode 100644 assets/js/42af2c4d.5eabc270.js create mode 100644 assets/js/4523c0b8.b6e19eee.js delete mode 100644 assets/js/463e69e4.1615fe13.js create mode 100644 assets/js/463e69e4.61efeb17.js create mode 100644 assets/js/4b1283a1.3ef53153.js create mode 100644 assets/js/4b2748e1.b6526a03.js delete mode 100644 assets/js/4b69b274.89bfa8ee.js create mode 100644 assets/js/4b69b274.dfa70220.js create mode 100644 assets/js/4f2cf309.32706599.js create mode 100644 assets/js/51620725.673ad6b6.js create mode 100644 assets/js/51833f15.28e8a08b.js create mode 100644 assets/js/51bfff8d.5473d8f7.js create mode 100644 assets/js/51f80da3.b8258f22.js create mode 100644 assets/js/53062ee1.3f65d126.js create mode 100644 assets/js/53b9567b.f283f1b9.js create mode 100644 assets/js/5c18c143.7f61c089.js create mode 100644 assets/js/64bd7e9e.3c740e86.js create mode 100644 assets/js/65232248.8ac53226.js create mode 100644 assets/js/65781eeb.2f6b3880.js create mode 100644 assets/js/669469dc.451ddf08.js create mode 100644 assets/js/67233f3f.7ade2c7c.js create mode 100644 assets/js/68ae1648.dd12f9da.js create mode 100644 assets/js/6bb3eb16.e2fa8230.js create mode 100644 assets/js/6da9a512.2a69647b.js create mode 100644 assets/js/70f7faf9.a369deca.js create mode 100644 assets/js/730b3355.53e6d777.js create mode 100644 assets/js/74830f3d.f9a8ae91.js delete mode 100644 assets/js/75a3a2eb.00f2bb5f.js create mode 100644 assets/js/75a3a2eb.258736c2.js create mode 100644 assets/js/76a0dc22.1402df31.js create mode 100644 assets/js/78bb3c84.7fa47463.js create mode 100644 assets/js/79308b2f.fd9af8d3.js create mode 
100644 assets/js/83fc027c.ab43b5a9.js create mode 100644 assets/js/86fcde84.0069d21b.js create mode 100644 assets/js/880ef044.14db7f9c.js create mode 100644 assets/js/88990ce9.b85dd7dd.js create mode 100644 assets/js/8eb3126b.8d6691e6.js create mode 100644 assets/js/91971be7.24ae75ad.js create mode 100644 assets/js/935f2afb.e82442de.js delete mode 100644 assets/js/935f2afb.e94232a1.js create mode 100644 assets/js/94d6913f.a774a0ab.js create mode 100644 assets/js/987b70e8.239750d6.js create mode 100644 assets/js/a10271fe.ffa3312f.js create mode 100644 assets/js/a5e136a1.4b8c7497.js delete mode 100644 assets/js/a5e136a1.c7e5c6d7.js create mode 100644 assets/js/a6af1fd8.ecc303b1.js create mode 100644 assets/js/a7b23c85.58554146.js create mode 100644 assets/js/af18058a.17789ebc.js create mode 100644 assets/js/af997954.478221ef.js create mode 100644 assets/js/b23ebcdf.c05b032d.js create mode 100644 assets/js/b4506888.2b5dcf08.js create mode 100644 assets/js/b5aea075.46242714.js create mode 100644 assets/js/b7dbb0d7.1351c5ab.js create mode 100644 assets/js/bd285e40.c8b8cb68.js create mode 100644 assets/js/bee64c31.fb97cfb1.js create mode 100644 assets/js/bf58d54d.0cf2023b.js create mode 100644 assets/js/c2a95928.3454d69f.js create mode 100644 assets/js/c4215fd2.0524e83e.js create mode 100644 assets/js/cc461efb.5c4d5431.js create mode 100644 assets/js/d065eee8.f11bbe97.js create mode 100644 assets/js/d284d299.f12d4533.js create mode 100644 assets/js/d8b0b6a4.3737c73e.js create mode 100644 assets/js/d9f334cf.ccb601aa.js create mode 100644 assets/js/da6337b1.6846131b.js create mode 100644 assets/js/e8574da6.a78525f9.js delete mode 100644 assets/js/e95cadfe.aa190e36.js create mode 100644 assets/js/e95cadfe.d1638d74.js create mode 100644 assets/js/ee4db9b8.d1757fec.js create mode 100644 assets/js/ef4059aa.0328cea2.js delete mode 100644 assets/js/ef4059aa.87eebb08.js create mode 100644 assets/js/ef5201ba.8389de3c.js create mode 100644 assets/js/f0d534fb.ee63e5e0.js create mode 
100644 assets/js/f6bbc271.22ed2407.js create mode 100644 assets/js/f735f5cc.af7951fb.js create mode 100644 assets/js/ff654592.322d662d.js delete mode 100644 assets/js/main.23b43cc6.js create mode 100644 assets/js/main.8997be17.js rename assets/js/{main.23b43cc6.js.LICENSE.txt => main.8997be17.js.LICENSE.txt} (100%) create mode 100644 assets/js/runtime~main.0e0be6e9.js delete mode 100644 assets/js/runtime~main.fe5beb3e.js create mode 100644 core-functionality/iscn-notation/index.html diff --git a/3.22/core-functionality/canonical-transcripts/index.html b/3.22/core-functionality/canonical-transcripts/index.html index 7ec4fcb4..f3f45089 100644 --- a/3.22/core-functionality/canonical-transcripts/index.html +++ b/3.22/core-functionality/canonical-transcripts/index.html @@ -6,13 +6,13 @@ Canonical Transcripts | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there is an average of 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need to identify a representative example of a gene, even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the team at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table, which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.
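
As a rough illustration of the rule quoted above, here is a minimal Python sketch. The tuple layout `(name, cds_length, cdna_length)` and the function name are assumptions made for this example; they are not taken from UCSC's schema or code. Ties are not addressed by the quoted rule, so this sketch simply keeps the first transcript it encounters.

```python
def ucsc_style_canonical(transcripts):
    """Pick a canonical transcript using the quoted UCSC rule.

    transcripts: iterable of (name, cds_length, cdna_length) tuples.
    This layout is hypothetical; cds_length == 0 marks an untranslated transcript.
    """
    transcripts = list(transcripts)
    if not transcripts:
        return None

    # If the gene has translated transcripts, take the longest CDS.
    translated = [t for t in transcripts if t[1] > 0]
    if translated:
        return max(translated, key=lambda t: t[1])[0]

    # Otherwise fall back to the longest cDNA.
    return max(transcripts, key=lambda t: t[2])[0]
```

Comparing the output of a function like this against the knownCanonical table is one way to enumerate the exceptions mentioned above.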

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.
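
The hierarchy above can be sketched as a tiered search. This is an illustrative Python sketch only: the `EnsemblTranscript` record and its field names are hypothetical, and the behavior when every translation contains a stop codon is not specified by the glossary text, so the fallback chosen here is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EnsemblTranscript:
    # Hypothetical record shape used only for this sketch.
    stable_id: str
    length: int                      # transcript length
    translation_length: int = 0      # 0 means the transcript has no translation
    has_internal_stop: bool = False  # translation contains a stop codon
    is_ccds: bool = False            # translation matches a CCDS entry
    is_merged: bool = False          # Ensembl/Havana merged model


def ensembl_style_canonical(
    transcripts: Iterable[EnsemblTranscript],
) -> Optional[EnsemblTranscript]:
    transcripts = list(transcripts)

    # Translations with no stop codons are the only candidates for tiers 1-3.
    clean = [
        t for t in transcripts
        if t.translation_length > 0 and not t.has_internal_stop
    ]
    tiers = [
        [t for t in clean if t.is_ccds],    # 1. longest CCDS translation
        [t for t in clean if t.is_merged],  # 2. longest merged translation
        clean,                              # 3. longest translation of any kind
    ]
    for tier in tiers:
        if tier:
            return max(tier, key=lambda t: t.translation_length)

    # 4. No usable translation: longest non-protein-coding transcript.
    # Assumption: transcripts whose translations contain stop codons are skipped.
    noncoding = [t for t in transcripts if t.translation_length == 0]
    if noncoding:
        return max(noncoding, key=lambda t: t.length)
    return None
```

Note that the tiers are checked strictly in order, so a short CCDS translation wins over a longer merged one.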

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
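
The steps above can be sketched as a filter followed by a multi-key sort. This is a minimal Python sketch, not the Illumina Connected Annotations implementation: the `Transcript` record, the `is_refseq` flag, and the reading of "accession number" as the first run of digits in the accession are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record shape used only for this sketch.
    accession: str           # e.g. "NM_000001.1"
    cds_length: int          # 0 for non-coding transcripts
    transcript_length: int
    is_lrg: bool = False     # has a Locus Reference Genomic entry


def accession_number(accession: str) -> int:
    """Assumed interpretation: the first run of digits in the accession."""
    match = re.search(r"\d+", accession)
    return int(match.group()) if match else 0


def pick_canonical(
    transcripts: Iterable[Transcript], is_refseq: bool = True
) -> Optional[Transcript]:
    candidates = list(transcripts)

    # Step 1: for RefSeq, only NM and NR transcripts are candidates.
    if is_refseq:
        candidates = [
            t for t in candidates if t.accession.startswith(("NM_", "NR_"))
        ]
    if not candidates:
        return None

    # Step 2: LRG first, then descending CDS length, descending transcript
    # length, and ascending accession number. False sorts before True, so
    # "not is_lrg" puts LRG entries at the front.
    candidates.sort(
        key=lambda t: (
            not t.is_lrg,
            -t.cds_length,
            -t.transcript_length,
            accession_number(t.accession),
            t.accession,  # final tie-break, added here for determinism
        )
    )

    # Step 3: grab the first entry.
    return candidates[0]
```

Encoding the whole ordering in one sort key keeps the precedence of the four criteria visible in a single place, which makes it easy to apply the same rule to every transcript data source.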
- - +
Skip to main content
Version: 3.22

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
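
The ordering above can be sketched as a single sort key. This is a minimal Python illustration rather than the Illumina Connected Annotations implementation: the `Transcript` class, its field names, and the `pick_canonical` helper are hypothetical, and the sketch assumes that "ascending accession number" refers to the numeric part of the accession.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are illustrative only.
    accession: str          # e.g. "NM_001754.4"
    cds_length: int         # 0 for non-coding transcripts
    transcript_length: int
    is_lrg: bool = False    # True if the transcript is an LRG entry


def accession_number(accession: str) -> int:
    """Numeric part of an accession without its version, e.g. 'NM_001754.4' -> 1754."""
    digits = "".join(ch for ch in accession.split(".")[0] if ch.isdigit())
    return int(digits) if digits else 0


def pick_canonical(transcripts, source="RefSeq"):
    """Return the canonical transcript for one gene, or None if there is no candidate."""
    candidates = list(transcripts)
    if source == "RefSeq":
        # Step 1: only NM and NR transcripts are candidates.
        candidates = [t for t in candidates if t.accession.startswith(("NM_", "NR_"))]
    if not candidates:
        return None
    # Step 2: LRG first, then longest CDS, longest transcript, smallest accession number.
    candidates.sort(
        key=lambda t: (
            not t.is_lrg,
            -t.cds_length,
            -t.transcript_length,
            accession_number(t.accession),
        )
    )
    # Step 3: grab the first entry.
    return candidates[0]
```

Because the key is a tuple, each later criterion only acts as a tie-breaker for the ones before it, which mirrors the nested ordering in the list.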
+ + \ No newline at end of file diff --git a/3.22/core-functionality/gene-fusions/index.html b/3.22/core-functionality/gene-fusions/index.html index 151bfe55..4f582201 100644 --- a/3.22/core-functionality/gene-fusions/index.html +++ b/3.22/core-functionality/gene-fusions/index.html @@ -6,14 +6,14 @@ Gene Fusion Detection | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 & FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 & FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. +

Version: 3.22

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 & FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 & FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. When enable-bidirectional-fusions is not set, only these two unidirectional fusions are reported. When it is set, all four cases can be identified.

Interpreting translocation breakends

At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the VCF 4.2 specification.

| REF | ALT | Meaning |
| --- | --- | --- |
| s | t[p[ | piece extending to the right of p is joined after t |
| s | t]p] | reverse comp piece extending left of p is joined after t |
| s | ]p]t | piece extending to the left of p is joined before t |
| s | [p[t | reverse comp piece extending right of p is joined before t |
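
The four ALT forms can be told apart by which bracket is used and by whether the bases of t come before or after the mate position. The following Python sketch illustrates that reading of the VCF breakend notation; the `Breakend` tuple and the `parse_breakend` function are hypothetical names introduced here and are not part of Illumina Connected Annotations.

```python
import re
from typing import NamedTuple


class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    joined_after: bool          # True: mate piece is joined after t; False: before t
    extends_right: bool         # True: mate piece extends to the right of p
    reverse_complemented: bool  # True: mate piece is reverse complemented


# t[p[  t]p]  ]p]t  [p[t  (t = inserted/reference bases, p = mate position)
_BND = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<b1>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[^\[\]]*)$"
)


def parse_breakend(alt: str) -> Breakend:
    """Decode a VCF breakend ALT allele into its join direction and orientation."""
    m = _BND.match(alt)
    # Both brackets must match, and t must sit on exactly one side of the mate position.
    if m is None or m["b1"] != m["b2"] or bool(m["lead"]) == bool(m["trail"]):
        raise ValueError(f"not a breakend ALT allele: {alt!r}")
    joined_after = bool(m["lead"])
    extends_right = m["b1"] == "["
    # t]p] and [p[t are the two reverse-complemented forms.
    reverse_complemented = joined_after != extends_right
    return Breakend(m["chrom"], int(m["pos"]), joined_after, extends_right, reverse_complemented)
```

For example, `parse_breakend("A]chr21:36420571]")` reports a reverse-complemented piece that extends left of chr21:36420571 and is joined after `A`.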

Variant Types

Specifically, we can identify gene fusions from the following structural variant types:

  • deletions (<DEL>)
  • tandem_duplications (<DUP:TANDEM>)
  • inversions (<INV>)
  • translocation breakpoints (AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[)

Criteria

The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:

  1. After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if enable-bidirectional-fusions is not enabled. They can have the same or different orientations if enable-bidirectional-fusions is set.
  2. Both transcripts must be from the same transcript source (i.e. we won't mix and match between RefSeq and Ensembl transcripts)
  3. Both transcripts must belong to different genes
  4. Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)
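
The four criteria can be read as a sequence of filters on a candidate transcript pair. The sketch below is one way to express them in Python under simplifying assumptions: the `FusionPartner` class and its fields are hypothetical, and the orientation check (criterion 1) is passed in as a precomputed boolean rather than derived from the rearrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionPartner:
    # Hypothetical record; not an Illumina Connected Annotations type.
    transcript_id: str
    gene_id: str
    source: str        # "RefSeq" or "Ensembl"
    chromosome: str
    coding_start: int  # genomic coordinates of the coding region, inclusive
    coding_end: int


def coding_regions_overlap(a: FusionPartner, b: FusionPartner) -> bool:
    """True if the two coding regions already overlap in the reference genome."""
    return (
        a.chromosome == b.chromosome
        and a.coding_start <= b.coding_end
        and b.coding_start <= a.coding_end
    )


def passes_fusion_criteria(a, b, same_orientation, enable_bidirectional=False):
    """Apply criteria 1-4 to a candidate pair of transcripts."""
    if not same_orientation and not enable_bidirectional:
        return False  # 1. orientations must match unless bidirectional fusions are enabled
    if a.source != b.source:
        return False  # 2. no mixing of RefSeq and Ensembl transcripts
    if a.gene_id == b.gene_id:
        return False  # 3. the transcripts must belong to different genes
    if coding_regions_overlap(a, b):
        return False  # 4. naturally overlapping coding regions are not fusions
    return True
```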

ETV6/RUNX1 Example

ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment.

VCF

Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr12 12026270 . C [chr21:36420865[C . PASS SVTYPE=BND
chr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND
chr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND
chr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND

When you put these calls together, the resulting genomic rearrangement looks something like this:

JSON Output

The annotation for the first variant in the VCF looks like this:

{
"chromosome": "chr12",
"position": 12026270,
"refAllele": "C",
"altAlleles": [
"[chr21:36420865[C"
],
"filters": [
"PASS"
],
"cytogeneticBand": "12p13.2",
"clingen": [
{
"chromosome": "12",
"begin": 173786,
"end": 34835837,
"variantType": "copy_number_gain",
"id": "nsv995956",
"clinicalInterpretation": "pathogenic",
"phenotypes": [
"Decreased calvarial ossification",
"Delayed gross motor development",
"Feeding difficulties",
"Frontal bossing",
"Morphological abnormality of the central nervous system",
"Patchy alopecia"
],
"phenotypeIds": [
"HP:0002007",
"HP:0002011",
"HP:0002194",
"HP:0002232",
"HP:0005474",
"HP:0011968",
"MedGen:C0232466",
"MedGen:C1862862",
"MedGen:CN001816",
"MedGen:CN001820",
"MedGen:CN001989",
"MedGen:CN004852"
],
"observedGains": 1,
"validated": true
}
],
"variants": [
{
"vid": "12-12026270-C-[chr21:36420865[C",
"chromosome": "chr12",
"begin": 12026270,
"end": 12026270,
"isStructuralVariant": true,
"refAllele": "C",
"altAllele": "[chr21:36420865[C",
"variantType": "translocation_breakend",
"cosmicGeneFusions": [
{
"id": "COSF2245",
"numSamples": 249,
"geneSymbols": [
"ETV6",
"RUNX1"
],
"hgvsr": "ENST00000396373.4(ETV6):r.1_1283::ENST00000300305.3(RUNX1):r.504_6222",
"histologies": [
{
"name": "acute lymphoblastic B cell leukaemia",
"numSamples": 169
},
{
"name": "acute lymphoblastic leukaemia",
"numSamples": 80
}
],
"sites": [
{
"name": "haematopoietic and lymphoid tissue",
"numSamples": 249
}
],
"pubMedIds": [
7761424,
7780150,
8609706,
8751464,
8982044,
9067587,
9207408,
9226156,
9628428,
10463610,
10774753,
11091202,
12621238,
12661004,
12750722,
15104290,
15642392,
24557455,
26925663
]
}
],
"fusionCatcher": [
{
"genes": {
"first": {
"hgnc": "ETV6",
"isOncogene": true
},
"second": {
"hgnc": "RUNX1",
"isOncogene": true
}
},
"somaticSources": [
"DepMap CCLE",
"Cancer Genome Project",
"ChimerKB 4.0",
"ChimerPub 4.0",
"ChimerSeq 4.0",
"Known",
"Mitelman DB",
"OncoKB",
"TICdb"
]
}
],
"transcripts": [
{
"transcript": "ENST00000396373.4",
"source": "Ensembl",
"bioType": "protein_coding",
"introns": "5/7",
"geneId": "ENSG00000139083",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"geneFusions": [
{
"transcript": "ENST00000437180.1",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000300305.3",
"bioType": "protein_coding",
"intron": 1,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000482318.1",
"bioType": "nonsense_mediated_decay",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000482318.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000486278.2",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000486278.2(RUNX1):r.?_-15+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000455571.1",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000455571.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000475045.2",
"bioType": "protein_coding",
"intron": 11,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000475045.2(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000416754.1",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000416754.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],
"isCanonical": true,
"proteinId": "ENSP00000379658.3"
},
{
"transcript": "NM_001987.4",
"source": "RefSeq",
"bioType": "protein_coding",
"introns": "5/7",
"geneId": "2120",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],
"isCanonical": true,
"proteinId": "NP_001978.1"
}
]
}
]
}

| Field | Type | Notes |
| --- | --- | --- |
| transcript | string | transcript ID |
| bioType | string | descriptions of the biotypes from Ensembl |
| exon | int | exon that contained fusion breakpoint |
| intron | int | intron that contained fusion breakpoint |
| geneId | string | gene ID. e.g. ENSG00000116062 |
| hgnc | string | gene symbol. e.g. MSH6 |
| hgvsr | string | HGVS RNA nomenclature |

Gene Fusion Data Sources

To provide more context to our gene fusions, we provide the following gene fusion data sources:

Consequences

When a gene fusion is identified, we add the following Sequence Ontology consequence:

              "consequence": [
"transcript_variant",
"gene_fusion"
],
  • If both transcripts have the same orientation, we label the fusion as unidirectional_gene_fusion. If they have different orientations, we label it as bidirectional_gene_fusion.
  • If both unidirectional and bidirectional fusions are detected, we label it as gene_fusion.
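
The labelling rule in the two bullets above reduces to a small decision over the directionalities seen for a transcript. The following Python sketch is illustrative only; the `fusion_consequence` function and its boolean input are hypothetical.

```python
def fusion_consequence(same_orientation_flags):
    """Pick the consequence term from one flag per detected fusion partner.

    Each flag is True when both transcripts of that fusion share the same
    orientation (unidirectional) and False otherwise (bidirectional).
    Returns None when no fusion was detected.
    """
    kinds = set(same_orientation_flags)
    if not kinds:
        return None
    if kinds == {True}:
        return "unidirectional_gene_fusion"
    if kinds == {False}:
        return "bidirectional_gene_fusion"
    # Both unidirectional and bidirectional fusions were detected.
    return "gene_fusion"
```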

Gene Fusions Section

The geneFusions section is contained within the object of the originating transcript. It contains all the pairwise gene fusions that obey the criteria outlined above. In the case of ENST00000396373.4, there are 7 other Ensembl transcripts that would produce a gene fusion. For NM_001987.4, there was only one transcript (NM_001754.4) that produces a gene fusion.

For each originating transcript, we report the following for each partner transcript:

  • transcript ID
  • gene ID
  • HGNC gene symbol
  • transcript bio type (e.g. protein_coding)
  • intron or exon number containing the breakpoint
  • HGVS RNA notation
  • gene fusion directionality
tip

Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see HGVS SVD-WG007).

          "geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],

The HGVS RNA notation above indicates that the gene fusion starts with NM_001754.4 (RUNX1) until CDS position 58 and continues with NM_001987.4 (ETV6). 1009+3367 indicates that the fusion occurred 3367 bp within intron 2.
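
For downstream processing, the hgvsr string can be split into its two partners at the `::` separator. The Python sketch below handles only the shape of the strings shown in this section; it is not a general HGVS parser, and the `FusionSide` tuple and `parse_fusion_hgvsr` function are hypothetical names.

```python
import re
from typing import NamedTuple


class FusionSide(NamedTuple):
    transcript: str
    gene: str
    start: str  # kept as text because positions can be '?' or intronic offsets like '58+274'
    end: str


# One side looks like: NM_001754.4(RUNX1):r.?_58+274
_SIDE = re.compile(
    r"^(?P<transcript>[^()]+)\((?P<gene>[^()]+)\):r\.(?P<start>[^_]+)_(?P<end>.+)$"
)


def parse_fusion_hgvsr(hgvsr: str):
    """Split a fusion hgvsr string into its (first, second) partner descriptions."""
    sides = hgvsr.split("::")
    if len(sides) != 2:
        raise ValueError(f"expected two partners separated by '::': {hgvsr!r}")
    parsed = []
    for side in sides:
        m = _SIDE.match(side)
        if m is None:
            raise ValueError(f"unrecognized fusion partner: {side!r}")
        parsed.append(FusionSide(m["transcript"], m["gene"], m["start"], m["end"]))
    return tuple(parsed)
```

Applied to the example above, the first partner is NM_001754.4 (RUNX1) ending at `58+274` and the second is NM_001987.4 (ETV6) starting at `1009+3367`.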

- - + + \ No newline at end of file diff --git a/3.22/core-functionality/transcript-consequence-impacts/index.html b/3.22/core-functionality/transcript-consequence-impacts/index.html index dc8ae22b..47c23d9e 100644 --- a/3.22/core-functionality/transcript-consequence-impacts/index.html +++ b/3.22/core-functionality/transcript-consequence-impacts/index.html @@ -6,14 +6,14 @@ Transcript Consequence Impact | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings as obtained from SnpEff.

| Impact | Definition |
| --- | --- |
| high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. |
| moderate | A non-disruptive variant that might change protein effectiveness. |
| low | Assumed to be mostly harmless or unlikely to change protein behavior. |
| modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. |

Sources

Not all consequences are rated by SnpEff; therefore, Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

| Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment |
| --- | --- | --- | --- | --- |
| bidirectional_gene_fusion | high | | high | SnpEff |
| coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS |
| copy_number_change | | | modifier | |
| copy_number_decrease | | | modifier | |
| copy_number_increase | | | modifier | |
| downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
| feature_elongation | modifier | high | high | VEP |
| feature_truncation | | high | high | VEP |
| five_prime_duplicated_transcript | | | modifier | |
| five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| frameshift_variant | high | high | high | SnpEff + VEP |
| gene_fusion | high | | high | SnpEff |
| incomplete_terminal_codon_variant | | low | low | VEP |
| inframe_deletion | moderate | moderate | moderate | SnpEff + VEP |
| inframe_insertion | moderate | moderate | moderate | SnpEff + VEP |
| intron_variant | modifier | modifier | modifier | SnpEff + VEP |
| mature_miRNA_variant | | modifier | modifier | VEP |
| missense_variant | moderate | moderate | moderate | SnpEff + VEP |
| NMD_transcript_variant | | modifier | modifier | VEP |
| non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP |
| non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP |
| protein_altering_variant | | moderate | moderate | VEP |
| regulatory_region_ablation | | modifier | modifier | VEP |
| regulatory_region_amplification | | modifier | modifier | VEP |
| regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP |
| short_tandem_repeat_change | | | modifier | |
| short_tandem_repeat_contraction | | | modifier | |
| short_tandem_repeat_expansion | | | modifier | |
| splice_acceptor_variant | high | high | high | SnpEff + VEP |
| splice_donor_variant | high | high | high | SnpEff + VEP |
| splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff |
| start_lost | high | high | high | SnpEff + VEP |
| start_retained_variant | low | low | low | SnpEff + VEP |
| stop_gained | high | high | high | SnpEff + VEP |
| stop_lost | high | high | high | SnpEff + VEP |
| stop_retained_variant | low | low | low | SnpEff + VEP |
| synonymous_variant | low | low | low | SnpEff + VEP |
| three_prime_duplicated_transcript | | | modifier | |
| three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| transcript_ablation | high | high | high | SnpEff + VEP |
| transcript_amplification | | high | high | VEP |
| transcript_variant | modifier | | modifier | SnpEff |
| unidirectional_gene_fusion | high | | high | SnpEff |
| upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. +

Version: 3.22

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings as obtained from SnpEff.

| Impact | Definition |
| --- | --- |
| high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. |
| moderate | A non-disruptive variant that might change protein effectiveness. |
| low | Assumed to be mostly harmless or unlikely to change protein behavior. |
| modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. |

Sources

Not all consequences are rated by SnpEff; therefore, Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

| Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment |
| --- | --- | --- | --- | --- |
| bidirectional_gene_fusion | high | | high | SnpEff |
| coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS |
| copy_number_change | | | modifier | |
| copy_number_decrease | | | modifier | |
| copy_number_increase | | | modifier | |
| downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
| feature_elongation | modifier | high | high | VEP |
| feature_truncation | | high | high | VEP |
| five_prime_duplicated_transcript | | | modifier | |
| five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| frameshift_variant | high | high | high | SnpEff + VEP |
| gene_fusion | high | | high | SnpEff |
| incomplete_terminal_codon_variant | | low | low | VEP |
| inframe_deletion | moderate | moderate | moderate | SnpEff + VEP |
| inframe_insertion | moderate | moderate | moderate | SnpEff + VEP |
| intron_variant | modifier | modifier | modifier | SnpEff + VEP |
| mature_miRNA_variant | | modifier | modifier | VEP |
| missense_variant | moderate | moderate | moderate | SnpEff + VEP |
| NMD_transcript_variant | | modifier | modifier | VEP |
| non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP |
| non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP |
| protein_altering_variant | | moderate | moderate | VEP |
| regulatory_region_ablation | | modifier | modifier | VEP |
| regulatory_region_amplification | | modifier | modifier | VEP |
| regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP |
| short_tandem_repeat_change | | | modifier | |
| short_tandem_repeat_contraction | | | modifier | |
| short_tandem_repeat_expansion | | | modifier | |
| splice_acceptor_variant | high | high | high | SnpEff + VEP |
| splice_donor_variant | high | high | high | SnpEff + VEP |
| splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff |
| start_lost | high | high | high | SnpEff + VEP |
| start_retained_variant | low | low | low | SnpEff + VEP |
| stop_gained | high | high | high | SnpEff + VEP |
| stop_lost | high | high | high | SnpEff + VEP |
| stop_retained_variant | low | low | low | SnpEff + VEP |
| synonymous_variant | low | low | low | SnpEff + VEP |
| three_prime_duplicated_transcript | | | modifier | |
| three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| transcript_ablation | high | high | high | SnpEff + VEP |
| transcript_amplification | | high | high | VEP |
| transcript_variant | modifier | | modifier | SnpEff |
| unidirectional_gene_fusion | high | | high | SnpEff |
| upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.
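
The two notes above describe a simple reduction: look up each consequence, fall back to modifier when nothing is rated, and keep the most severe value. The Python sketch below illustrates this with a small excerpt of the combined ratings; the `IMPACT` dictionary covers only a few consequences, and the `transcript_impact` function is a hypothetical helper, not the tool's own code.

```python
# Severity order from least to most severe.
SEVERITY = ["modifier", "low", "moderate", "high"]

# Excerpt of the combined impact ratings listed in this section.
IMPACT = {
    "frameshift_variant": "high",
    "unidirectional_gene_fusion": "high",
    "missense_variant": "moderate",
    "splice_region_variant": "low",
    "synonymous_variant": "low",
    "intron_variant": "modifier",
    "non_coding_transcript_variant": "modifier",
    "transcript_variant": "modifier",
}


def transcript_impact(consequences):
    """Return the most severe impact among a transcript's consequences."""
    # Note 2: consequences without a rating are treated as 'modifier'.
    ratings = [IMPACT.get(c, "modifier") for c in consequences]
    # Note 1: the most severe rating wins.
    return max(ratings, key=SEVERITY.index, default="modifier")
```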

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. However, Illumina Connected Annotations does not annotate this consequence, so no impact is provided for it.

Example Transcript

The key impact for each transcript gives the impact rating for the consequence.

{
"variants": [
{
"vid": "1-1623412-T-C",
"chromosome": "1",
"begin": 1623412,
"end": 1623412,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.1623412T>C",
"transcripts": [
{
"transcript": "ENST00000479659.5",
"source": "Ensembl",
"bioType": "lncRNA",
"introns": "2/18",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"intron_variant",
"non_coding_transcript_variant"
],
"impact": "modifier",
"hgvsc": "ENST00000479659.5:n.288-19T>C"
},
{
"transcript": "ENST00000489635.5",
"source": "VEP",
"bioType": "mRNA",
"codons": "aTg/aCg",
"aminoAcids": "M/T",
"cdnaPos": "269",
"cdsPos": "134",
"exons": "3/20",
"proteinPos": "45",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"missense_variant"
],
"impact": "moderate",
"hgvsc": "ENST00000489635.5:c.134T>C",
"hgvsp": "ENSP00000426007.1:p.(Met45Thr)",
"proteinId": "ENSP00000426007.1"
}
]
}
]
}
- - + + \ No newline at end of file diff --git a/3.22/core-functionality/variant-ids/index.html b/3.22/core-functionality/variant-ids/index.html index df9fe378..0d42b927 100644 --- a/3.22/core-functionality/variant-ids/index.html +++ b/3.22/core-functionality/variant-ids/index.html @@ -6,13 +6,13 @@ Variant IDs | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used; neither the reference nor the alternate allele can be empty
  • some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

chromosome-position-reference allele-alternate allele

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

chromosome-position-reference allele-alternate allele

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

chromosome-position-end position-reference allele-alternate allele-SVTYPE

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
- - +
Version: 3.22

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used; neither the reference nor the alternate allele can be empty
  • some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

chromosome-position-reference allele-alternate allele

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA
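
The small-variant VIDs above can be reproduced from the VCF fields with a few string operations. The following Python sketch is an illustration under stated assumptions: the `small_variant_vids` function is hypothetical, and the trimming of shared trailing bases is inferred from the `GTA` to `G,GTACTATATATTATA` example rather than taken from a documented rule.

```python
def small_variant_vids(chrom: str, pos: int, ref: str, alts: str):
    """Build one VID per alternate allele of a small-variant VCF record."""
    # Ensembl-style chromosome names: drop a leading 'chr'.
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    vids = []
    for alt in alts.split(","):
        r, a = ref, alt
        if a == ".":
            # Reference variant: replace the period with the reference base.
            a = r
        # Assumption: drop trailing bases shared by both alleles, but always
        # keep at least one (padding) base so neither allele becomes empty.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        vids.append(f"{chrom}-{pos}-{r}-{a}")
    return vids
```

With the third VCF example, `small_variant_vids("chr1", 66572, "GTA", "G,GTACTATATATTATA")` gives the two VIDs listed for position 66572.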

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

chromosome-position-reference allele-alternate allele

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

chromosome-position-end position-reference allele-alternate allele-SVTYPE

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
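
The structural variant VIDs above follow directly from the VCF columns plus the END and SVTYPE values in the INFO field. The Python sketch below is a hypothetical helper, not the tool's implementation. It makes two assumptions: the `STR` suffix is used when SVTYPE is absent and the ALT allele starts with `<STR`, as in the last example, and the true reference base for an `N` is supplied by the caller through the `reference_base` argument because looking it up requires the reference genome.

```python
def structural_variant_vid(chrom, pos, ref, alt, info, reference_base=None):
    """Build a VID for a non-breakend structural variant VCF record."""
    # Parse 'KEY=VALUE' pairs from the INFO column; flags without '=' are ignored.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    end = fields["END"]
    sv_type = fields.get("SVTYPE")
    if sv_type is None and alt.startswith("<STR"):
        # Assumption based on the <STR2> example, which carries no SVTYPE.
        sv_type = "STR"
    if sv_type is None:
        raise ValueError("cannot determine SVTYPE for this record")
    if ref == "N" and reference_base is not None:
        # Replace a placeholder N with the true reference base when it is known.
        ref = reference_base
    # Ensembl-style chromosome names: drop a leading 'chr'.
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    return f"{chrom}-{pos}-{end}-{ref}-{alt}-{sv_type}"
```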
+ + \ No newline at end of file diff --git a/3.22/data-sources/1000Genomes-snv-json/index.html b/3.22/data-sources/1000Genomes-snv-json/index.html index 07f0f2e4..e2d04305 100644 --- a/3.22/data-sources/1000Genomes-snv-json/index.html +++ b/3.22/data-sources/1000Genomes-snv-json/index.html @@ -6,13 +6,13 @@ 1000Genomes-snv-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}

| Field | Type | Notes |
| --- | --- | --- |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African super population. Integer. |
| afrAn | int | allele number for the African super population. Non-zero integer. |
| amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Ad Mixed American super population. Integer. |
| amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
| easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian super population. Integer. |
| easAn | int | allele number for the East Asian super population. Non-zero integer. |
| eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
| eurAc | int | allele count for the European super population. Integer. |
| eurAn | int | allele number for the European super population. Non-zero integer. |
| sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian super population. Integer. |
| sasAn | int | allele number for the South Asian super population. Non-zero integer. |
- - +
Version: 3.22

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}

| Field | Type | Notes |
| --- | --- | --- |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African super population. Integer. |
| afrAn | int | allele number for the African super population. Non-zero integer. |
| amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Ad Mixed American super population. Integer. |
| amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
| easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian super population. Integer. |
| easAn | int | allele number for the East Asian super population. Non-zero integer. |
| eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
| eurAc | int | allele count for the European super population. Integer. |
| eurAn | int | allele number for the European super population. Non-zero integer. |
| sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian super population. Integer. |
| sasAn | int | allele number for the South Asian super population. Non-zero integer. |
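
The three field families are related by allele frequency = allele count / allele number, which can be used as a sanity check when parsing this object. The Python sketch below applies that check to the example values shown above; the `check_frequencies` helper is hypothetical, and the rounding to six decimal places is an assumption based on the precision of the example.

```python
import math

# Values copied from the oneKg example in this section.
ONE_KG = {
    "allAf": 0.200879, "afrAf": 0.210287, "amrAf": 0.139769,
    "easAf": 0.275794, "eurAf": 0.181909, "sasAf": 0.173824,
    "allAn": 5008, "afrAn": 1322, "amrAn": 694,
    "easAn": 1008, "eurAn": 1006, "sasAn": 978,
    "allAc": 1006, "afrAc": 278, "amrAc": 97,
    "easAc": 278, "eurAc": 183, "sasAc": 170,
}


def check_frequencies(one_kg, populations=("all", "afr", "amr", "eas", "eur", "sas"), digits=6):
    """Return the populations whose Af does not equal Ac / An after rounding."""
    mismatches = []
    for pop in populations:
        ac, an, af = one_kg[f"{pop}Ac"], one_kg[f"{pop}An"], one_kg[f"{pop}Af"]
        # An is documented as non-zero; treat anything else as a mismatch.
        if an <= 0 or not math.isclose(round(ac / an, digits), af, abs_tol=10 ** -digits):
            mismatches.append(pop)
    return mismatches
```

For instance, `allAc / allAn` is 1006 / 5008, which rounds to the reported `allAf` of 0.200879.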
+ + \ No newline at end of file diff --git a/3.22/data-sources/1000Genomes-sv-json/index.html b/3.22/data-sources/1000Genomes-sv-json/index.html index a5811ff6..b9eb2982 100644 --- a/3.22/data-sources/1000Genomes-sv-json/index.html +++ b/3.22/data-sources/1000Genomes-sv-json/index.html @@ -6,13 +6,13 @@ 1000Genomes-sv-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],

| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| id | string | |
| allAn | integer | allele number for all populations. Non-zero integer. |
| allAc | integer | allele count for all populations. Integer. |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| reciprocalOverlap | floating point | range: 0 - 1. |
- - +
Version: 3.22

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
| Field | Type | Notes |
|---|---|---|
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| id | string | |
| allAn | integer | allele number for all populations. Non-zero integer. |
| allAc | integer | allele count for all populations. Integer. |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| reciprocalOverlap | floating point | Range: 0 - 1. |
+ + \ No newline at end of file diff --git a/3.22/data-sources/1000Genomes/index.html b/3.22/data-sources/1000Genomes/index.html index d754c8ef..05213df7 100644 --- a/3.22/data-sources/1000Genomes/index.html +++ b/3.22/data-sources/1000Genomes/index.html @@ -6,15 +6,15 @@ 1000 Genomes | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF), but we recompute them from allele counts and allele numbers in order to get six-digit precision. The per-population allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not provided in the INFO field of the original files; the genotypes need to be parsed to compute that information. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe-separated AA field (the indel-specific REF, ALT, and IndelType fields are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

| Chromosome | # of alleles | # of conflicting alleles | Percentage |
|---|---|---|---|
| chrX | 834800 | 2733 | 0.33% |
| Total | 21413098 | 2743 | 0.013% |

Currently, we remove the allele frequency of the conflicting allele (i.e., the insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 +

Version: 3.22

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF), but we recompute them from allele counts and allele numbers in order to get six-digit precision. The per-population allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not provided in the INFO field of the original files; the genotypes need to be parsed to compute that information. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe-separated AA field (the indel-specific REF, ALT, and IndelType fields are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC
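
As a rough illustration of the parsing described above, the following sketch splits an INFO string, takes the first pipe-separated AA value, and recomputes allele frequencies as AC / AN rounded to six decimals. The function names are hypothetical and are not taken from the Illumina Connected Annotations code base; the INFO string is a shortened copy of the example entry.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a key -> value dict (flags map to True)."""
    fields = {}
    for item in info.strip(";").split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def ancestral_allele(fields: dict):
    """Return the first pipe-separated AA value, or None if absent or empty."""
    aa = fields.get("AA")
    if not isinstance(aa, str):
        return None
    first = aa.split("|")[0]
    return first or None


def recompute_frequencies(fields: dict, prefix: str = "") -> list:
    """Recompute per-allele frequencies as AC / AN, rounded to six decimals.

    prefix selects a super population, e.g. "EAS_" (assumed naming, as in the example).
    """
    an = int(fields[prefix + "AN"])
    counts = [int(c) for c in fields[prefix + "AC"].split(",")]
    return [round(c / an, 6) for c in counts]


# Shortened INFO field from the rs62636497 example above.
info = ("AC=1739,3210;AN=5008;AA=g|||;VT=SNP;MULTI_ALLELIC;"
        "EAS_AN=1008;EAS_AC=485,523;AMR_AN=694;AMR_AC=191,500")
fields = parse_info(info)
all_af = recompute_frequencies(fields)          # one value per ALT allele
eas_af = recompute_frequencies(fields, "EAS_")
ancestral = ancestral_allele(fields)
```

The recomputed values for the whole cohort agree with the AF values in the example entry, while the per-population values carry six decimals instead of the four found in EAS_AF.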

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

| Chromosome | # of alleles | # of conflicting alleles | Percentage |
|---|---|---|---|
| chrX | 834800 | 2733 | 0.33% |
| Total | 21413098 | 2743 | 0.013% |

Currently, we remove the allele frequency of the conflicting allele (i.e., the insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.
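
The current behaviour can be sketched as follows. This is a minimal illustration, assuming that a conflict means the same chromosome, position, REF and ALT appearing on more than one VCF line with different allele counts; the record layout and function names are hypothetical.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return allele keys (chrom, pos, ref, alt) reported with more than one allele count.

    records: iterable of (chrom, pos, ref, alts, acs) tuples, one per VCF line.
    """
    seen = defaultdict(set)
    for chrom, pos, ref, alts, acs in records:
        for alt, ac in zip(alts, acs):
            seen[(chrom, pos, ref, alt)].add(ac)
    return {key for key, counts in seen.items() if len(counts) > 1}


def keep_unambiguous(records):
    """Map each non-conflicting allele key to its allele count."""
    conflicts = find_conflicts(records)
    kept = {}
    for chrom, pos, ref, alts, acs in records:
        for alt, ac in zip(alts, acs):
            key = (chrom, pos, ref, alt)
            if key not in conflicts:
                kept[key] = ac
    return kept


# The two rs35377696 lines from the example above.
records = [
    ("1", 20505705, "C", ["CTCTG", "CTG", "CTGTG"], [46, 1513, 152]),
    ("1", 20505705, "C", ["CTG"], [4]),
]
conflicts = find_conflicts(records)
kept = keep_unambiguous(records)
```

Only the C>CTG allele is dropped here; the other two alleles on the multi-allelic line keep their counts, which mirrors the behaviour described above rather than the stricter alternative of discarding the whole line.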

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 GRCh38

JSON Output

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
| Field | Type | Notes |
|---|---|---|
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African super population. Integer. |
| afrAn | int | allele number for the African super population. Non-zero integer. |
| amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Ad Mixed American super population. Integer. |
| amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
| easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian super population. Integer. |
| easAn | int | allele number for the East Asian super population. Non-zero integer. |
| eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
| eurAc | int | allele count for the European super population. Integer. |
| eurAn | int | allele number for the European super population. Non-zero integer. |
| sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian super population. Integer. |
| sasAn | int | allele number for the South Asian super population. Non-zero integer. |
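
When consuming this output, a simple sanity check is that each frequency equals the corresponding allele count divided by the allele number, rounded to six decimals. The sketch below applies that check to the example object; the helper name is hypothetical, and the fragment is wrapped in braces so that it parses as a standalone JSON document.

```python
import json

POPULATIONS = ("all", "afr", "amr", "eas", "eur", "sas")


def inconsistent_populations(one_kg: dict) -> list:
    """Return the population prefixes whose Af differs from round(Ac / An, 6)."""
    mismatches = []
    for pop in POPULATIONS:
        af = one_kg[pop + "Af"]
        ac = one_kg[pop + "Ac"]
        an = one_kg[pop + "An"]
        if round(ac / an, 6) != af:
            mismatches.append(pop)
    return mismatches


# The example "oneKg" object from above, wrapped in an enclosing object.
text = """{"oneKg":{
  "allAf":0.200879,"afrAf":0.210287,"amrAf":0.139769,
  "easAf":0.275794,"eurAf":0.181909,"sasAf":0.173824,
  "allAn":5008,"afrAn":1322,"amrAn":694,"easAn":1008,"eurAn":1006,"sasAn":978,
  "allAc":1006,"afrAc":278,"amrAc":97,"easAc":278,"eurAc":183,"sasAc":170}}"""
one_kg = json.loads(text)["oneKg"]
mismatches = inconsistent_populations(one_kg)
```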

Structural Variants

VCF File Parsing

The VCF files contain entries like the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
22 16050654 esv3647175;esv3647176;esv3647177;esv3647178 A <CN0>,<CN2>,<CN3>,<CN4> 100 PASS AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV GT 3|0 0|0 0|0 0|0 0|0 0|0 0|4

Please note that CNVs are allele-specific. For example, HG00096 (genotype 3|0) carries <CN3> on one haplotype and the reference allele on the other, so it is effectively copy number 4, which would be a net gain on chr22.
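
The arithmetic behind that statement can be sketched as below. The sketch assumes that each reference haplotype on an autosome contributes one copy and that every ALT allele has the form <CNn>; the function names are hypothetical.

```python
import re


def allele_copy_numbers(alts, ref_copies=1):
    """Copies carried by each allele index; index 0 is the reference allele."""
    copies = [ref_copies]
    for alt in alts:
        match = re.fullmatch(r"<CN(\d+)>", alt)
        if match is None:
            raise ValueError(f"not a copy-number allele: {alt}")
        copies.append(int(match.group(1)))
    return copies


def total_copy_number(genotype, alts, ref_copies=1):
    """Sum the copies carried by each haplotype of a phased or unphased genotype."""
    copies = allele_copy_numbers(alts, ref_copies)
    indices = re.split(r"[|/]", genotype)
    return sum(copies[int(i)] for i in indices)


# ALT alleles and genotypes from the chr22 example above.
alts = ["<CN0>", "<CN2>", "<CN3>", "<CN4>"]
hg00096 = total_copy_number("3|0", alts)  # <CN3> plus one reference copy
hg00097 = total_copy_number("0|0", alts)  # two reference copies
hg00103 = total_copy_number("0|4", alts)  # one reference copy plus <CN4>
```

Missing genotypes (".") are not handled and would raise a ValueError in this sketch.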

1000 Genomes contains 5 types of structural variants:

  • CNV
  • DEL
  • DUP
  • INS
  • INV

Since the 1000 Genomes data is provided in VCF format, we assume that the coordinates follow VCF conventions, i.e., there is a padding base for symbolic alleles, so every interval can be interpreted as [BEGIN+1, END]. For all variant types except insertions, END is far larger than BEGIN. The distribution of BEGIN and END for insertions is summarized below.

Insertion issues

  • END = BEGIN for 6/165
  • END = BEGIN+2 for 93/165
  • END = BEGIN+3 for 11/165
  • END = BEGIN+4 for 11/165
  • END – BEGIN ranges from 5 to 1156 for the others.

Converting VCF svTypes to SO sequence alterations

The svType will be captured in our JSON file under the sequenceAlteration key. Here's the translation we'll use according to svType in 1000 Genomes.

| svType | Alternative alleles contain <CN*> | sequenceAlteration |
|---|---|---|
| ALU | FALSE | mobile_element_insertion |
| DUP | TRUE | copy_number_gain |
| CNV | TRUE | copy_number_gain (observed_gains > 0 and observed_losses = 0) |
| | | copy_number_loss (observed_gains = 0 and observed_losses > 0) |
| | | copy_number_variation (otherwise) |
| DEL | TRUE | copy_number_loss |
| LINE1 | FALSE | mobile_element_insertion |
| SVA | FALSE | mobile_element_insertion |
| INV | FALSE | inversion |
| INS | FALSE | insertion |
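
The translation table can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and observed_gains and observed_losses are taken as given inputs because the text does not say how they are counted.

```python
FIXED_ALTERATIONS = {
    "ALU": "mobile_element_insertion",
    "LINE1": "mobile_element_insertion",
    "SVA": "mobile_element_insertion",
    "DUP": "copy_number_gain",
    "DEL": "copy_number_loss",
    "INV": "inversion",
    "INS": "insertion",
}


def sequence_alteration(sv_type, observed_gains=0, observed_losses=0):
    """Translate a 1000 Genomes svType into a sequenceAlteration term."""
    if sv_type == "CNV":
        # CNVs depend on which directions of change were observed.
        if observed_gains > 0 and observed_losses == 0:
            return "copy_number_gain"
        if observed_gains == 0 and observed_losses > 0:
            return "copy_number_loss"
        return "copy_number_variation"
    return FIXED_ALTERATIONS[sv_type]  # KeyError for svTypes not in the table
```

For example, a CNV with both gains and losses observed maps to copy_number_variation, while LINE1 always maps to mobile_element_insertion.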

Exceptions

We discard structural variants without an END field, for example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
21 9495848 esv3646347 A <INS:ME:LINE1> 100 PASS AC=1543;AF=0.308107;AN=5008;CS=L1_umary;MEINFO=LINE1,5669,6005,+;NS=2504;SVLEN=336;SVTYPE=LINE1;TSD=null;DP=20015;EAS_AF=0.3125;AMR_AF=0.2911;AFR_AF=0.3026;EUR_AF=0.2922;SAS_AF=0.3395;VT=SV GT 0|0 1|1 1|0 0|1 1|0 1|0 0|0

CNVs in chrY

  • No other types of structural variants exist in chrY
  • Since the copy number is provided in the genotype field, we parse it directly from the "CN" field.
  • For most CNVs in chrY, the reference copy number is 1, but the reference copy number for CNVs in segmental duplication sites is 2 (<CN2> in the 2nd example). All segmental duplication calls have identifiers starting with GS_SD_M2.
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00101 HG00103 HG00105 HG00107 HG00108
Y 2888555 CNV_Y_2888555_3014661 T <CN2> 100 PASS AC=1;AF=0.000817661;AN=1223;END=3014661;NS=1233;SVTYPE=CNV;AMR_AF=0.0000;AFR_AF=0.0000;EUR_AF=0.0000;SAS_AF=0.0019;EAS_AF=0.0000;VT=SV GT:CN:CNL:CNP:CNQ:GP:GQ:PL 0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585 0:1:-296.36,0,-16.6:-300.46,0,-19.7:99:0,-19.7:99:0,166 0:1:-1000,0,-39.44:-1000,0,-42.54:99:0,-42.54:99:0,394
Y 6128381 GS_SD_M2_Y_6128381_6230094_Y_9650284_9752225 C <CN1>,<CN3> 100 PASS AC=4,2;AF=0.00327065,0.00163532;AN=1223;END=6230094;NS=1233;SVTYPE=CNV;AMR_AF=0.0029,0.0029;AFR_AF=0.0016,0.0016;EUR_AF=0.0000,0.0000;SAS_AF=0.0038,0.0000;EAS_AF=0.0000,0.0000;VT=SV;EX_TARGET GT:CN:CNL:CNP:CNQ:GP:GQ 0:2:-1000,-138.78,0,-38.53:-1000,-141.27,0,-41.33:99:0,-141.27,-41.33:99 0:2:-1000,-53.32,0,-17.85:-1000,-55.81,0,-20.64:99:0,-55.81,-20.64:99 0:2:-1000,-71.83,0,-32.5:-1000,-74.32,0,-35.29:99:0,-74.32,-35.29:99 0:2:-1000,-60.96,0,-20.29:-1000,-63.45,0,-23.08:99:0,-63.45,-23.08:99 0:2:-1000,-77.6,0,-31.45:-1000,-80.09,0,-34.24:99:0,-80.09,-34.24:99
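
Reading the copy number from a chrY sample column can be sketched as follows. The helper name is hypothetical; the FORMAT strings and sample values are copied from the two example entries above.

```python
def sample_copy_number(format_field: str, sample_field: str) -> int:
    """Return the CN value of one sample, using FORMAT to locate the field."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return int(dict(zip(keys, values))["CN"])


# First sample of the CNV_Y_2888555_3014661 entry (reference copy number 1).
cn_regular = sample_copy_number(
    "GT:CN:CNL:CNP:CNQ:GP:GQ:PL",
    "0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585",
)

# First sample of the GS_SD_M2 segmental duplication entry (reference copy number 2).
cn_segdup = sample_copy_number(
    "GT:CN:CNL:CNP:CNQ:GP:GQ",
    "0:2:-1000,-138.78,0,-38.53:-1000,-141.27,0,-41.33:99:0,-141.27,-41.33:99",
)
```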

JSON Output

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
| Field | Type | Notes |
|---|---|---|
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| id | string | |
| allAn | integer | allele number for all populations. Non-zero integer. |
| allAc | integer | allele count for all populations. Integer. |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| reciprocalOverlap | floating point | Range: 0 - 1. |
- - + + \ No newline at end of file diff --git a/3.22/data-sources/amino-acid-conservation-json/index.html b/3.22/data-sources/amino-acid-conservation-json/index.html index 10c2c869..16762cf5 100644 --- a/3.22/data-sources/amino-acid-conservation-json/index.html +++ b/3.22/data-sources/amino-acid-conservation-json/index.html @@ -6,13 +6,13 @@ amino-acid-conservation-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
| Field | Type | Notes |
|---|---|---|
| aminoAcidConservation | object | |
| scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00 |
- - +
Version: 3.22

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
| Field | Type | Notes |
|---|---|---|
| aminoAcidConservation | object | |
| scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00 |
+ + \ No newline at end of file diff --git a/3.22/data-sources/amino-acid-conservation/index.html b/3.22/data-sources/amino-acid-conservation/index.html index 976c8838..0f0d94c8 100644 --- a/3.22/data-sources/amino-acid-conservation/index.html +++ b/3.22/data-sources/amino-acid-conservation/index.html @@ -6,14 +6,14 @@ Amino Acid Conservation | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human exome. The score indicates how frequently the human amino acid (AA) at a given position is observed in the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. +

Version: 3.22

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human exome. The score indicates how frequently the human amino acid (AA) at a given position is observed in the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. For position 6, we would say that we have 43% conservation (3/7), since three of the seven sequences (human, chimp, and orangutan) carry the human residue.
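
The position 6 calculation can be reproduced with a short sketch. It follows the arithmetic of the worked example, counting the human row in both numerator and denominator and treating gap rows as non-matching; the function name is hypothetical and only the first ten residues of each row are used.

```python
def conservation_scores(human: str, others: list) -> list:
    """Fraction of aligned rows (human included) sharing the human residue, per position."""
    rows = [human] + list(others)
    scores = []
    for i, residue in enumerate(human):
        matches = sum(1 for row in rows if row[i] == residue)
        scores.append(round(matches / len(rows), 2))
    return scores


# First ten residues of the ENST00000641515.2 alignment shown above.
human = "MKKVTAEAIS"
others = [
    "MKKVTAEAIS",  # Chimp
    "----------",  # Gorilla (no data)
    "MKKVTAEAIS",  # Orangutan
    "----------",  # Gibbon (no data)
    "MKKVTEAAIS",  # Rhesus
    "MKKVTEAAIS",  # Macaque
]
scores = conservation_scores(human, others)
position_6 = scores[5]  # positions are 1-based in the text, 0-based here
```

With only these seven rows, positions where all five species with data agree score 5/7, and position 6 scores 3/7.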

Assigning scores to Illumina Connected Annotations transcripts

The source FASTA file comes with the Ensembl/UCSC transcript ids of the transcripts used for the alignments. The Illumina Connected Annotations cache has RefSeq and Ensembl transcripts, and our first attempt was to map the given Ensembl/UCSC ids to their equivalent RefSeq/Ensembl ids. This attempt was unsuccessful since the UCSC Table Browser provided mappings without version numbers. So we proceeded as follows:

  • Take proteins which have a unique mapping (and hence one set of conservation scores). For ones that mapped to both ChrX and ChrY, we accepted the one from ChrX.
  • An Illumina Connected Annotations transcript having an exact peptide sequence match with a uniquely aligned protein is assigned the corresponding conservation scores.

Unfortunately this left us with a very small number of transcripts having conservation scores.
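
A minimal sketch of the matching procedure described above is given below. It assumes that "unique" means a peptide sequence associated with exactly one set of scores, it omits the ChrX/ChrY tie-break, and all names and sample values are hypothetical.

```python
def assign_scores(aligned, transcripts):
    """Assign conservation scores to transcripts by exact peptide match.

    aligned: iterable of (peptide, scores) pairs derived from the alignments.
    transcripts: dict mapping transcript id -> peptide sequence from the cache.
    """
    by_peptide = {}
    ambiguous = set()
    for peptide, scores in aligned:
        if peptide in by_peptide and by_peptide[peptide] != scores:
            ambiguous.add(peptide)  # more than one score set: not unique
        by_peptide.setdefault(peptide, scores)
    return {
        tid: by_peptide[peptide]
        for tid, peptide in transcripts.items()
        if peptide in by_peptide and peptide not in ambiguous
    }


# Made-up data for illustration only.
aligned = [
    ("MKK", [0.71, 0.71, 0.71]),
    ("MAA", [1.0, 1.0, 1.0]),
    ("MAA", [0.5, 0.5, 0.5]),   # same peptide, different scores
]
transcripts = {"T1": "MKK", "T2": "MAA", "T3": "MQQ"}
assigned = assign_scores(aligned, transcripts)
```

Because only exact sequence matches are accepted, any transcript whose peptide differs from every uniquely aligned protein by even one residue receives no scores, which is consistent with the low counts reported below.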

GRCh37

  • Source FASTA contained 41957 protein alignments.
  • 38165 proteins had unique scores.
  • 88 aligned proteins existed in Illumina Connected Annotations cache.
  • 118 transcripts had conservation scores.

GRCh38

  • Source FASTA contained 110024 protein alignments.
  • 88961 proteins had unique scores.
  • 11688 aligned proteins existed in Illumina Connected Annotations cache.
  • 12098 transcripts had conservation scores.

Download URL

GRCh37: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownGene.exonAA.fa.gz

GRCh38: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/alignments/knownGene.exonAA.fa.gz

JSON Output

Conservation scores are reported in the transcript section. One score is reported for each alt allele.

"aminoAcidConservation": {
"scores": [0.34]
}
| Field | Type | Notes |
|---|---|---|
| aminoAcidConservation | object | |
| scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00 |
- - + + \ No newline at end of file diff --git a/3.22/data-sources/cancer-hotspots/index.html b/3.22/data-sources/cancer-hotspots/index.html index 661c131a..47d331a1 100644 --- a/3.22/data-sources/cancer-hotspots/index.html +++ b/3.22/data-sources/cancer-hotspots/index.html @@ -6,14 +6,14 @@ Cancer Hotspots | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs of the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid   Codon_Change     Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancanIs_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect     ct       Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue
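As a minimal sketch of this column selection, the snippet below keeps only the six columns listed above from a tab-delimited export of one tab. It assumes the tab has been saved as tab-separated text, and the names `COLUMNS_OF_INTEREST`, `select_columns` and `SAMPLE` are hypothetical. The sample row is an abbreviated version of the NRAS example, with most columns omitted. It is an illustration only, not the parser used by Illumina Connected Annotations.

```python
import csv
import io

# The six columns named in the list above.
COLUMNS_OF_INTEREST = [
    "Hugo_Symbol",
    "Amino_Acid_Position",
    "Mutation_Count",
    "Reference_Amino_Acid",
    "Variant_Amino_Acid",
    "qvalue",
]


def select_columns(tsv_text):
    """Return one dict per data row, restricted to the columns of interest."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [{name: row[name] for name in COLUMNS_OF_INTEREST} for row in reader]


# Abbreviated sample (assumption: tab-delimited export, most columns dropped).
SAMPLE = (
    "Hugo_Symbol\tAmino_Acid_Position\tlog10_pvalue\tMutation_Count\t"
    "Reference_Amino_Acid\tVariant_Amino_Acid\tqvalue\n"
    "NRAS\t61\t-1237.69143477067\t422\tQ:422\tR:204\t0\n"
)

records = select_columns(SAMPLE)
```

Looking columns up by header name rather than by index keeps the sketch usable for both tabs, whose column order differs.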

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we obtain the position and the reference and alternate amino acids from the source file and generate the substitution notation. For indels, we take the protein change notation from the Variant_Amino_Acid column (for example, K546del).
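The notation step can be sketched as follows, using values from the example rows. This is an illustration under stated assumptions, not the tool's implementation. The function names are hypothetical, the one-letter form `Q61R` is an assumed output format, and the text after the colon in each field is simply discarded. In the example indel rows the deletion notation (e.g. `K546del`) appears in the Variant_Amino_Acid field, so the sketch reads it from there.

```python
def snv_protein_change(reference_amino_acid, position, variant_amino_acid):
    """Build a substitution such as 'Q61R' from SNV-tab fields.

    Assumes each amino acid field looks like '<residue>:<number>',
    as in 'Q:422' and 'R:204'; only the part before the colon is used.
    """
    ref = reference_amino_acid.split(":", 1)[0]
    alt = variant_amino_acid.split(":", 1)[0]
    return f"{ref}{position}{alt}"


def indel_protein_change(variant_amino_acid):
    """Return the notation before the colon, e.g. 'K546del' from 'K546del:5'."""
    return variant_amino_acid.split(":", 1)[0]


snv = snv_protein_change("Q:422", "61", "R:204")
indel = indel_protein_change("K546del:5")
```

With the NRAS and SMARCA4 example rows, `snv` is `Q61R` and `indel` is `K546del`.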

Version: 3.22

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs of the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid      Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancan   Is_repeat       seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect        ct      Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we obtain the position and the reference and alternate amino acids from the source file and generate substitution notation. For indels, we take the protein change notation (e.g. K546del) from the Variant_Amino_Acid column. We then match each entry using these notations.
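As a rough illustration of the SNV case, the sketch below builds a substitution notation from the three relevant columns. The function name is hypothetical, and it assumes each field has the "amino acid:count" shape seen in the sample SNV rows (e.g. Q:422 and L:46); it is not the annotator's actual implementation.

```python
def substitution_notation(position, reference_field, variant_field):
    """Build a notation such as 'Q61L' from hotspot SNV columns.

    Assumes reference_field looks like 'Q:422' and variant_field like 'L:46'
    (amino acid, colon, sample count), as in the sample SNV rows.
    """
    ref_aa = reference_field.split(":")[0]
    alt_aa, count = variant_field.split(":")
    return f"{ref_aa}{position}{alt_aa}", int(count)


notation, alt_samples = substitution_notation("61", "Q:422", "L:46")
print(notation, alt_samples)
```

The returned notation can then serve as a lookup key when matching against the amino acid change of an annotated transcript.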

caution

We currently skip all variants labeled as splice in the source.

JSON Output

The data source will be captured under the cancerHotspots key in the transcript section.

{
"transcript":"NM_002524.5",
"source":"RefSeq",
"bioType":"mRNA",
"aminoAcids":"Q/K",
"proteinPos":"61",
"geneId":"4893",
"hgnc":"NRAS",
"hgvsc":"NM_002524.5:c.181C>A",
"hgvsp":"NP_002515.1:p.(Gln61Lys)",
"isCanonical":true,
"proteinId":"NP_002515.1",
"cancerHotspots":{
"residue":"Q61",
"numSamples":422,
"numAltAminoAcidSamples":142,
"qValue":0
}
}
Field                   Type    Notes
residue                 string
numSamples              int     how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples  int     how many samples are associated with a variant with the same position and alternate amino acid
qValue                  double
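For readers consuming this output, the following sketch reads the cancerHotspots object from a transcript entry and computes the share of samples at the residue that carry the same alternate amino acid. The helper name is hypothetical and the input is a trimmed copy of the example above.

```python
import json

# Trimmed copy of the example transcript entry shown above.
transcript_json = """
{
  "transcript": "NM_002524.5",
  "hgnc": "NRAS",
  "cancerHotspots": {
    "residue": "Q61",
    "numSamples": 422,
    "numAltAminoAcidSamples": 142,
    "qValue": 0
  }
}
"""


def alt_fraction(transcript):
    """Return numAltAminoAcidSamples / numSamples, or None if unavailable."""
    hotspot = transcript.get("cancerHotspots")
    if hotspot is None or hotspot["numSamples"] == 0:
        return None
    return hotspot["numAltAminoAcidSamples"] / hotspot["numSamples"]


transcript = json.loads(transcript_json)
fraction = alt_fraction(transcript)
print(fraction)
```

Transcripts without a hotspot annotation simply lack the key, so the helper returns None for them.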
- - + + \ No newline at end of file diff --git a/3.22/data-sources/clingen-dosage-json/index.html b/3.22/data-sources/clingen-dosage-json/index.html index 33124e14..8a3dea23 100644 --- a/3.22/data-sources/clingen-dosage-json/index.html +++ b/3.22/data-sources/clingen-dosage-json/index.html @@ -6,13 +6,13 @@ clingen-dosage-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field                        Type            Notes
clingenDosageSensitivityMap  object array
chromosome                   string          Ensembl-style chromosome names
begin                        integer         1-based position
end                          integer         1-based position
haploinsufficiency           string          see possible values below
triplosensitivity            string          (same as haploinsufficiency)
reciprocalOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
- - +
Version: 3.22

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field                        Type            Notes
clingenDosageSensitivityMap  object array
chromosome                   string          Ensembl-style chromosome names
begin                        integer         1-based position
end                          integer         1-based position
haploinsufficiency           string          see possible values below
triplosensitivity            string          (same as haploinsufficiency)
reciprocalOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
+ + \ No newline at end of file diff --git a/3.22/data-sources/clingen-gene-validity-json/index.html b/3.22/data-sources/clingen-gene-validity-json/index.html index 95e4dc84..fa309a1a 100644 --- a/3.22/data-sources/clingen-gene-validity-json/index.html +++ b/3.22/data-sources/clingen-gene-validity-json/index.html @@ -6,13 +6,13 @@ clingen-gene-validity-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                Type          Notes
clingenGeneValidity  object array
diseaseId            string        Monarch Disease Ontology ID (MONDO)
disease              string        disease label
classification       string        see below for possible values
classificationDate   string        yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
- - +
Version: 3.22

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                Type          Notes
clingenGeneValidity  object array
diseaseId            string        Monarch Disease Ontology ID (MONDO)
disease              string        disease label
classification       string        see below for possible values
classificationDate   string        yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
+ + \ No newline at end of file diff --git a/3.22/data-sources/clingen-json/index.html b/3.22/data-sources/clingen-json/index.html index ff1067d2..27784eb8 100644 --- a/3.22/data-sources/clingen-json/index.html +++ b/3.22/data-sources/clingen-json/index.html @@ -6,13 +6,13 @@ clingen-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                   Type            Notes
clingen                 object array
chromosome              string          Ensembl-style chromosome names
begin                   integer         1-based position
end                     integer         1-based position
variantType             string          Any of the sequence alterations defined here.
id                      string          Identifier from the data source. Alternatively a VID
clinicalInterpretation  string          see possible values below
observedGains           integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses          integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated               boolean
phenotypes              string array    Description of the phenotype.
phenotypeIds            string array    Description of the phenotype IDs.
reciprocalOverlap       floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
- - +
Version: 3.22

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                   Type            Notes
clingen                 object array
chromosome              string          Ensembl-style chromosome names
begin                   integer         1-based position
end                     integer         1-based position
variantType             string          Any of the sequence alterations defined here.
id                      string          Identifier from the data source. Alternatively a VID
clinicalInterpretation  string          see possible values below
observedGains           integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses          integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated               boolean
phenotypes              string array    Description of the phenotype.
phenotypeIds            string array    Description of the phenotype IDs.
reciprocalOverlap       floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
+ + \ No newline at end of file diff --git a/3.22/data-sources/clingen/index.html b/3.22/data-sources/clingen/index.html index 9da8a5fb..f9531b64 100644 --- a/3.22/data-sources/clingen/index.html +++ b/3.22/data-sources/clingen/index.html @@ -6,13 +6,13 @@ ClinGen | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), we adjust them to [BEGIN+1, END].
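The adjustment itself is a single increment of the start coordinate. A minimal sketch, with a hypothetical function name, using the coordinates of the first sample row below:

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED interval (0-based start, exclusive end)
    to a 1-based, inclusive [begin, end] pair."""
    return chrom_start + 1, chrom_end


# chromStart and chromEnd of nsv530705 in the sample rows.
begin, end = bed_to_one_based(564405, 8597804)
print(begin, end)
```

The end coordinate is unchanged because an exclusive 0-based end and an inclusive 1-based end refer to the same base.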

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen TSV file and extract the following:

  • chrom
  • chromStart (note this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma-separated lists. attrTags contains the field keys and attrVals contains the corresponding field values. We parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)
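One way to pair the two lists is sketched below. The function name and the sample strings are hypothetical, and the sketch assumes that values contain no embedded commas and that a key may repeat when a field has several values; the real file may encode multi-valued fields differently.

```python
def parse_attributes(attr_tags, attr_vals):
    """Pair comma-separated keys with comma-separated values.

    Repeated keys are collected into lists so that multi-valued fields
    (phenotype, phenotype_id) come out as string arrays.
    """
    tags = attr_tags.split(",")
    vals = attr_vals.split(",")
    if len(tags) != len(vals):
        raise ValueError("attrTags and attrVals differ in length")
    attributes = {}
    for tag, val in zip(tags, vals):
        attributes.setdefault(tag, []).append(val)
    return attributes


# Hypothetical input, for illustration only.
attrs = parse_attributes(
    "parent,clinical_int,validated,phenotype,phenotype_id",
    "nsv530705,pathogenic,False,Global developmental delay,HP:0001263",
)
print(attrs["parent"][0], attrs["clinical_int"][0])
```

Raising on a length mismatch keeps a malformed row from silently shifting values onto the wrong keys.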

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped
    • Observed losses and observed gains are calculated for each group
    • Clinical significance and validation status are collapsed using the priority strategy described below
  • Variants with the same parent ID can have different coordinates (mapped to hg38)
    • nsv491508: chr14:105583663-106881350 and chr14:105605043-106766076 (one example)
    • We keep both variants
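The grouping can be sketched as follows. The tuple layout, function name and variant types in the example are assumptions made for illustration; only the nsv491508 coordinates come from the text above.

```python
from collections import defaultdict


def group_observations(entries):
    """Count gains and losses per (parent ID, chromosome, begin, end).

    entries: iterable of (parent_id, chrom, begin, end, variant_type).
    """
    counts = defaultdict(lambda: {"observedGains": 0, "observedLosses": 0})
    for parent_id, chrom, begin, end, variant_type in entries:
        group = counts[(parent_id, chrom, begin, end)]
        if variant_type == "copy_number_gain":
            group["observedGains"] += 1
        elif variant_type == "copy_number_loss":
            group["observedLosses"] += 1
    return dict(counts)


# Hypothetical observations; the variant types are made up.
entries = [
    ("nsv491508", "chr14", 105583663, 106881350, "copy_number_loss"),
    ("nsv491508", "chr14", 105583663, 106881350, "copy_number_loss"),
    ("nsv491508", "chr14", 105605043, 106766076, "copy_number_gain"),
]
groups = group_observations(entries)
print(len(groups))
```

Because the coordinates are part of the key, the two nsv491508 intervals stay separate, which matches the decision to keep both variants.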

Conflict Resolution

Clinical significance priority

When variants belonging to the same parent ID have different clinical significance values, we choose the most pathogenic of the available values. For example, if 3 samples were deemed pathogenic and 2 samples were likely pathogenic, we list the variant as pathogenic.

Priority (high to low)

  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance
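A minimal sketch of this priority rule, assuming the labels are compared case-insensitively; the constant and function names are hypothetical. Labels outside the priority list (for example "curated pathogenic") are not handled here and raise a KeyError.

```python
# Priority from high to low, as listed above.
PRIORITY = [
    "pathogenic",
    "likely pathogenic",
    "benign",
    "likely benign",
    "uncertain significance",
]
RANK = {label: index for index, label in enumerate(PRIORITY)}


def resolve_significance(values):
    """Return the highest-priority clinical significance among values."""
    return min(values, key=lambda value: RANK[value.lower()])


resolved = resolve_significance(
    ["pathogenic"] * 3 + ["likely pathogenic"] * 2
)
print(resolved)
```

The number of samples carrying each label does not matter here; a single higher-priority label outranks any number of lower-priority ones.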

Validation Priority

When variants belonging to the same parent ID have different validation statuses, we set the validation status to true if any of the variants were validated.
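In code, this rule is a logical OR over the group. The sketch below assumes the raw flag is stored as the text "True" or "False", as the sample rows suggest; the function name is hypothetical.

```python
def resolve_validated(raw_values):
    """Return True if any variant in the group was validated.

    Assumption: the source stores the flag as the text 'True' or 'False'.
    """
    return any(value.strip().lower() == "true" for value in raw_values)


validated = resolve_validated(["False", "True", "False"])
print(validated)
```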

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                   Type            Notes
clingen                 object array
chromosome              string          Ensembl-style chromosome names
begin                   integer         1-based position
end                     integer         1-based position
variantType             string          Any of the sequence alterations defined here.
id                      string          Identifier from the data source. Alternatively a VID
clinicalInterpretation  string          see possible values below
observedGains           integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses          integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated               boolean
phenotypes              string array    Description of the phenotype.
phenotypeIds            string array    Description of the phenotype IDs.
reciprocalOverlap       floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
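This page does not spell out how reciprocalOverlap is computed. The sketch below uses one common definition, stated here as an assumption: the number of overlapping bases divided by the length of the longer of the two intervals, with 1-based inclusive coordinates. The function name and the example coordinates are hypothetical.

```python
def reciprocal_overlap(var_begin, var_end, ann_begin, ann_end):
    """Overlap length divided by the longer interval's length.

    Coordinates are 1-based and inclusive. Returns None when the
    intervals do not overlap. This definition is an assumption.
    """
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return None
    var_length = var_end - var_begin + 1
    ann_length = ann_end - ann_begin + 1
    return round(overlap / max(var_length, ann_length), 5)


# A 100-base variant sharing 50 bases with a 200-base annotation.
value = reciprocal_overlap(100, 199, 150, 349)
print(value)
```

Rounding to 5 decimal places mirrors the precision stated in the table.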

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

Rating | Possible Clinical Interpretation
0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype
1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype
2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
30 | Gene associated with autosomal recessive phenotype
40 | Dosage sensitivity unlikely

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we select the entry with the latest classification date.

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

Building the supplementary files

The gene disease validity .nga for Illumina Connected Annotations can be built using the SAUtils command's DiseaseValidity subcommand. The only required data file is Clingen-Gene-Disease-Summary-2021-12-01.tsv (url provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2
- - +
Version: 3.22

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), we adjust them to the 1-based closed interval [BEGIN+1, END].

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482
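
The coordinate adjustment described above can be sketched as follows. This is a minimal illustration, not the SAUtils implementation; the `Interval` type and the `bed_to_one_based` function are hypothetical names introduced here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    begin: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def bed_to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> Interval:
    """Convert a BED-style interval (0-based start, exclusive end)
    to a 1-based closed interval [BEGIN+1, END]."""
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("empty, inverted, or negative BED interval")
    return Interval(chrom, chrom_start + 1, chrom_end)


# First row of the sample above: chromStart=564405, chromEnd=8597804
print(bed_to_one_based("1", 564405, 8597804))
```

Only the start coordinate moves. The BED end is exclusive and 0-based, so it already equals the inclusive 1-based end.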

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen tsv file and extract the following:

  • chrom
  • chromStart (note that this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma separated lists. attrTags contains the field keys and attrVals contains the field values. We will parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped
    • Observed losses and observed gains are calculated for each group
    • Clinical significance and validation status are collapsed using the priority strategy described below
  • Variants with the same parent ID can have different coordinates (mapped to hg38)
    • For example, nsv491508 maps to both chr14:105583663-106881350 and chr14:105605043-106766076
    • In such cases, we keep both variants
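
The grouping rule can be sketched as below. This is an illustrative sketch only: the record layout (dictionaries with `parent`, `chrom`, `begin`, `end`, and `variant_type` keys) and the function name `count_observations` are assumptions, and the variant types assigned to nsv491508 in the example are made up for demonstration.

```python
from collections import defaultdict


def count_observations(records):
    """Group records by (parent ID, chromosome, begin, end) and count
    the copy number losses and gains observed in each group."""
    groups = defaultdict(lambda: {"observedLosses": 0, "observedGains": 0})
    for rec in records:
        key = (rec["parent"], rec["chrom"], rec["begin"], rec["end"])
        if rec["variant_type"] == "copy_number_loss":
            groups[key]["observedLosses"] += 1
        elif rec["variant_type"] == "copy_number_gain":
            groups[key]["observedGains"] += 1
    return dict(groups)


# Hypothetical records: same parent ID, two distinct coordinate pairs.
records = [
    {"parent": "nsv491508", "chrom": "14", "begin": 105583663, "end": 106881350,
     "variant_type": "copy_number_loss"},
    {"parent": "nsv491508", "chrom": "14", "begin": 105583663, "end": 106881350,
     "variant_type": "copy_number_loss"},
    {"parent": "nsv491508", "chrom": "14", "begin": 105605043, "end": 106766076,
     "variant_type": "copy_number_gain"},
]
for key, counts in count_observations(records).items():
    print(key, counts)
```

Because the coordinates are part of the grouping key, the two coordinate pairs for the same parent ID stay as separate entries, matching the rule that both variants are kept.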

Conflict Resolution

Clinical significance priority

When variants with different clinical significance belong to the same parent ID, we choose the most pathogenic clinical significance from the available values. For example, if 3 samples were deemed pathogenic and 2 samples were likely pathogenic, we would list the variant as pathogenic.

Priority (high to low)

  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance

Validation Priority

When a mixture of variants belongs to the same parent ID, we set the validation status to true if any of the variants were validated.
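
Both conflict-resolution rules fit in a few lines. The sketch below assumes the priority order listed in this section; the names `PRIORITY` and `collapse` are hypothetical, and values outside the listed priorities (for example "curated pathogenic") are deliberately not handled and raise a `KeyError`.

```python
# Priority order from this section, highest first.
PRIORITY = [
    "pathogenic",
    "likely pathogenic",
    "benign",
    "likely benign",
    "uncertain significance",
]
RANK = {name: index for index, name in enumerate(PRIORITY)}


def collapse(significances, validated_flags):
    """Return (highest-priority significance, True if any entry was validated)."""
    normalized = [s.strip().lower() for s in significances]
    top = min(normalized, key=RANK.__getitem__)  # lowest index = highest priority
    return top, any(validated_flags)


# 3 pathogenic and 2 likely pathogenic samples, one of them validated
significances = ["Pathogenic"] * 3 + ["Likely pathogenic"] * 2
print(collapse(significances, [False, False, True, False, False]))
```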

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

Rating | Possible Clinical Interpretation
0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype
1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype
2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
30 | Gene associated with autosomal recessive phenotype
40 | Dosage sensitivity unlikely
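
As an illustration of how the numeric scores in the TSV relate to the strings reported in the JSON output, the rating table can be expressed as a lookup. This is a sketch under the assumption that the JSON strings are the lower-cased rating descriptions (as the JSON example in this section suggests); `DOSAGE_RATINGS` and `describe_score` are hypothetical names.

```python
from typing import Optional

# Rating table from this section, lower-cased as in the JSON output.
DOSAGE_RATINGS = {
    0: "no evidence to suggest that dosage sensitivity is associated with clinical phenotype",
    1: "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
    2: "emerging evidence suggesting dosage sensitivity is associated with clinical phenotype",
    3: "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
    30: "gene associated with autosomal recessive phenotype",
    40: "dosage sensitivity unlikely",
}


def describe_score(score: str) -> Optional[str]:
    """Map a score column value from the TSV to its description.
    Returns None for empty, non-numeric, or unknown scores."""
    try:
        return DOSAGE_RATINGS.get(int(score))
    except ValueError:
        return None


print(describe_score("3"))
print(describe_score("40"))
```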

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (Not reported for Insertions).
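
To make the two overlap fields concrete, here is a minimal sketch of how such values could be computed for 1-based closed intervals. It assumes the common definitions: reciprocal overlap is the overlap length divided by the longer of the two intervals (equivalent to the smaller of the two overlap fractions), and annotation overlap is the overlap length divided by the annotation length. The exact formulas used by Illumina Connected Annotations are not stated here, so treat this as an approximation. The function names and the example variant are hypothetical.

```python
def _length(interval):
    begin, end = interval
    return end - begin + 1  # 1-based, inclusive coordinates


def overlap_length(a, b):
    """Number of bases shared by two closed intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def reciprocal_overlap(variant, annotation):
    """Overlap relative to the longer interval, rounded to 5 decimal places."""
    longest = max(_length(variant), _length(annotation))
    return round(overlap_length(variant, annotation) / longest, 5)


def annotation_overlap(variant, annotation):
    """Fraction of the annotation covered by the variant, rounded to 5 decimal places."""
    return round(overlap_length(variant, annotation) / _length(annotation), 5)


# Hypothetical variant against the two chr15 regions from the JSON example
variant = (31727418, 40000000)
for region in [(30900686, 32153204), (31727418, 32153204)]:
    print(region, reciprocal_overlap(variant, region), annotation_overlap(variant, region))
```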

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)
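
If the .version file is generated rather than written by hand, its four keys can be rendered as in the sketch below. Deriving VERSION from the release date mirrors the sample above but is an assumption, not a documented requirement; `render_version_file` is a hypothetical helper and is not part of SAUtils.

```python
from datetime import date


def render_version_file(name: str, release: date, description: str) -> str:
    """Render the NAME/VERSION/DATE/DESCRIPTION keys of a .version file."""
    lines = [
        f"NAME={name}",
        f"VERSION={release:%Y%m%d}",      # assumption: version is the compact date
        f"DATE={release.isoformat()}",    # yyyy-MM-dd
        f"DESCRIPTION={description}",
    ]
    return "\n".join(lines) + "\n"


print(render_version_file(
    "ClinGen Dosage Sensitivity Map",
    date(2021, 12, 1),
    "Dosage sensitivity map from ClinGen (dbVar)",
), end="")
```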

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z
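
The CSV-to-TSV conversion can be sketched with the Python standard library. The sketch assumes the layout shown in the sample above: a short preamble, `+++` separator rows, and a header row beginning with `GENE SYMBOL`. Whether the downstream tool expects the header row to be kept is not stated here, so keeping it is an assumption. `clingen_csv_to_tsv` is a hypothetical name.

```python
import csv
import io


def clingen_csv_to_tsv(csv_text: str) -> str:
    """Drop the preamble and '+++' separator rows, then emit the
    header and data rows as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    in_table = False
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].startswith("+"):
            continue  # blank line or separator row
        if row[0] == "GENE SYMBOL":
            in_table = True  # header row marks the start of the table
        if in_table:
            writer.writerow(row)
    return out.getvalue()


sample = """CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
+++++++++++,++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z
"""
print(clingen_csv_to_tsv(sample), end="")
```

Using the `csv` module rather than a plain string replace keeps quoted fields that contain commas intact.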

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we select the entry with the latest classification date.
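
The two rules can be combined into one comparison, sketched below. The ordering in `ASSUMED_RANK` is an assumption made for illustration; the documentation does not spell out the full severity ordering, and classifications missing from the mapping fall back to rank 0. The function name `resolve_conflicts` is hypothetical.

```python
# Hypothetical severity ranking (higher wins); not taken from the source.
ASSUMED_RANK = {"limited": 1, "moderate": 2, "strong": 3, "definitive": 4}


def resolve_conflicts(rows, rank=ASSUMED_RANK):
    """rows: (gene, disease_id, classification, iso_timestamp) tuples.
    Keep one entry per (gene, disease): the higher-ranked classification,
    and for equal classifications the latest date."""
    best = {}
    for gene, disease_id, classification, timestamp in rows:
        label = classification.lower()
        day = timestamp[:10]  # yyyy-MM-dd; ISO dates sort lexicographically
        candidate = (rank.get(label, 0), day)
        key = (gene, disease_id)
        if key not in best or candidate > best[key][0]:
            best[key] = (candidate, label, day)
    return {key: (label, day) for key, (_, label, day) in best.items()}


rows = [
    ("EDNRB", "MONDO_0010192", "Moderate", "2018-05-08T04:00:00.000Z"),
    ("EDNRB", "MONDO_0010192", "Limited", "2018-05-08T04:00:00.000Z"),
    ("MUTYH", "MONDO_0016419", "No Reported Evidence", "2017-05-24T00:00:00"),
    ("MUTYH", "MONDO_0016419", "No Reported Evidence", "2017-05-25T00:00:00"),
]
for key, value in resolve_conflicts(rows).items():
    print(key, value)
```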

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

Building the supplementary files

The gene disease validity .nga for Illumina Connected Annotations can be built using the SAUtils command's DiseaseValidity subcommand. The only required data file is Clingen-Gene-Disease-Summary-2021-12-01.tsv (url provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2
+ +
\ No newline at end of file
diff --git a/3.22/data-sources/clinvar-json/index.html b/3.22/data-sources/clinvar-json/index.html
index 494cab89..cdc18448 100644
--- a/3.22/data-sources/clinvar-json/index.html
+++ b/3.22/data-sources/clinvar-json/index.html
@@ -6,13 +6,13 @@
clinvar-json | IlluminaConnectedAnnotations
- - + +
-
Version: 3.22

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field            | Type         | Notes
id               | string       | ClinVar ID
variationId      | string       | ClinVar VCV ID
variantType      | string       | variant type
reviewStatus     | string       | see possible values below
alleleOrigins    | string array | see possible values below
refAllele        | string       |
altAllele        | string       |
phenotypes       | string array |
medGenIds        | string array | MedGen IDs
omimIds          | string array | OMIM IDs
orphanetIds      | string array | Orphanet IDs
significance     | string array | see possible values below
lastUpdatedDate  | string       | yyyy-MM-dd
pubMedIds        | string array | PubMed IDs
isAlleleSpecific | bool         | true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
- - +
Version: 3.22

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field            | Type         | Notes
id               | string       | ClinVar ID
variationId      | string       | ClinVar VCV ID
variantType      | string       | variant type
reviewStatus     | string       | see possible values below
alleleOrigins    | string array | see possible values below
refAllele        | string       |
altAllele        | string       |
phenotypes       | string array |
medGenIds        | string array | MedGen IDs
omimIds          | string array | OMIM IDs
orphanetIds      | string array | Orphanet IDs
significance     | string array | see possible values below
lastUpdatedDate  | string       | yyyy-MM-dd
pubMedIds        | string array | PubMed IDs
isAlleleSpecific | bool         | true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
+ + \ No newline at end of file diff --git a/3.22/data-sources/clinvar/index.html b/3.22/data-sources/clinvar/index.html index e6d00210..ff3421df 100644 --- a/3.22/data-sources/clinvar/index.html +++ b/3.22/data-sources/clinvar/index.html @@ -6,14 +6,14 @@ ClinVar | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

ClinVar

Overview

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

RCV File

Example

Here's a full RCV entry.

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

ID

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinVarAccession Acc="RCV000000001" Version="2">
</ClinVarSet>

The Acc and Version fields are merged to form the ID (RCV000000001.2)

LastUpdatedDate

<ClinVarSet>
<ReferenceClinVarAssertion DateCreated="2012-08-13" DateLastUpdated="2016-02-17" ID="57604" >
</ClinVarSet>

Significance

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

ReviewStatus

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

Phenotypes

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="62">
<Trait Type="Disease">
<Name>
<ElementValue Type="Preferred">Joubert syndrome 9</ElementValue>
</Name>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

We only use the field with Type="Preferred". Multiple phenotypes may be reported.

Location, Variant Type and Variant Id

<ReferenceClinVarAssertion>
<GenotypeSet Type="CompoundHeterozygote" ID="424709">
<MeasureSet Type="Variant" ID="81">
<Measure Type="single nucleotide variant" ID="15120">
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38"
AssemblyStatus="current" Chr="10" Accession="NC_000010.11" start="89222510"
stop="89222510" display_start="89222510" display_stop="89222510" variantLength="1"
positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
AssemblyStatus="previous" Chr="10" Accession="NC_000010.10" start="90982267"
stop="90982267" display_start="90982267" display_stop="90982267" variantLength="1"
positionVCF="90982267" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
</Measure>
</MeasureSet>
</GenotypeSet>
</ReferenceClinVarAssertion>
  • The variant position is extracted from the fields for their respective assemblies.
  • Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant.
  • For older records, since the "start" and "stop" fields are not always available, we use the "display_start" and "display_stop" fields.
  • If a required allele is not available, we extract it from the reference sequence.
  • Only variants having a dbSNP ID are extracted.
  • Note that a ClinVar accession may have multiple variants associated with it (possibly at different locations).
  • VariantId is extracted from the MeasureSet attributes.
  • VariantType is extracted from the Measure attributes.
    unsupported variant types

    We currently don't support the following variant types:

    • Microsatellite
    • protein only
    • fusion
    • Complex
    • Variation
    • Translocation

MedGen, OMIM, Orphanet IDs

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="175">
<Trait ID="3036" Type="Disease">
<XRef ID="C0086651" DB="MedGen"/>
<XRef ID="309297" DB="Orphanet"/>
<XRef ID="582" DB="Orphanet"/>
<XRef Type="MIM" ID="253000" DB="OMIM"/>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

AlleleOrigins

<ClinVarAssertion>
<Origin>germline</Origin>
</ClinVarAssertion>

We extract all allele origins, and only from submission (SCV) entries.

PubMedIds

<ClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<Citation Type="general">
<ID Source="PubMed">12114475</ID>
</Citation>
</ClinicalSignificance>
<AttributeSet>
<Attribute Type="AssertionMethod">LMM Criteria</Attribute>
<Citation>
<ID Source="PubMed">24033266</ID>
</Citation>
</AttributeSet>
<ObservedIn>
<ObservedData ID="9727445">
<Citation Type="general">
<ID Source="PubMed">9113933</ID>
</Citation>
</ObservedData>
</ObservedIn>
<Citation Type="general">
<ID Source="PubMed">23757202</ID>
</Citation>
</ClinVarAssertion>

We extract all PubMed IDs, and only from submission (SCV) entries.

Parsing Significance

Extracting significance(s) may involve parsing multiple fields. Take the following snippets into consideration.

<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2016-10-13">
<ReviewStatus>criteria provided, multiple submitters, no conflicts</ReviewStatus>
<Description>Pathogenic/Likely pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2012-06-07">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Conflicting interpretations of pathogenicity</Description>
<Explanation DataSource="ClinVar" Type="public">Pathogenic(1);Uncertain significance(1)</Explanation>
</ClinicalSignificance>

Given these cases, we represent significance as an array of strings, parsed from the Description or Explanation field.

Varying Delimiters

The delimiters in each field may vary. Currently, the delimiters for Description are , and /. The delimiters for Explanation are ; and /.

VCV File

Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ClinVarVariationRelease xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public/clinvar_variation/variation_archive_1.4.xsd" ReleaseDate="2019-12-31">
<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">
<RecordStatus>current</RecordStatus>
<Species>Homo sapiens</Species>
<IncludedRecord>
<SimpleAllele AlleleID="425239" VariationID="431749">
<GeneList>
<Gene Symbol="KCNAB2" FullName="potassium voltage-gated channel subfamily A regulatory beta subunit 2" GeneID="8514" HGNC_ID="HGNC:6229" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5992639" stop="6101186" display_start="5992639" display_stop="6101186" Strand="+"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6052357" stop="6161252" display_start="6052357" display_stop="6161252" Strand="+"/>
</Location>
<OMIM>601142</OMIM>
</Gene>
<Gene Symbol="NPHP4" FullName="nephrocystin 4" GeneID="261734" HGNC_ID="HGNC:19104" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5862810" stop="5992425" display_start="5862810" display_stop="5992425" Strand="-"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="5922869" stop="6052532" display_start="5922869" display_stop="6052532" Strand="-"/>
</Location>
<OMIM>607215</OMIM>
</Gene>
</GeneList>
<Name>GRCh37/hg19 1p36.31(chr1:6051187-6158763)</Name>
<VariantType>copy number gain</VariantType>
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" forDisplay="true" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6051187" stop="6158763" display_start="6051187" display_stop="6158763"/> </Location>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<XRefList>
<XRef Type="Interpreted" ID="431733" DB="ClinVar"/>
</XRefList>
</SimpleAllele>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<SubmittedInterpretationList>
<SCV Title="SUB1895145" Accession="SCV000296057" Version="1"/>
</SubmittedInterpretationList>
<InterpretedVariationList>
<InterpretedVariation VariationID="431733" Accession="VCV000431733" Version="1"/>
</InterpretedVariationList>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

id

<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">

The Accession and Version fields are merged to form the ID (VCV000431749.1)

significance

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<SimpleAllele>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
</SimpleAllele>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

May have multiple significances listed.

reviewStatus

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Known Issues

  • The XML file contains ~1k more entries (out of 162K) than the VCF file
  • The XML file does not have a field indicating that a record is associated with the reference base - something that was present in VCF
  • The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H", +
    Version: 3.22

    ClinVar

    Overview

    ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

    Publication

    Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

    RCV File

    Example

    Here's a full RCV entry.

    Parsing

    In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

    ID

    <ClinVarSet>
    <ReferenceClinVarAssertion>
    <ClinVarAccession Acc="RCV000000001" Version="2">
    </ClinVarSet>

    The Acc and Version fields are merged to form the ID (RCV000000001.2)
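
The merge described above can be sketched in a few lines of Python. This is only an illustration, not the SAUtils implementation: the helper name `rcv_id` is hypothetical, and the snippet is a trimmed, well-formed version of the XML shown above (the original fragment omits closing tags).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the RCV snippet shown above.
RCV_XML = """
<ClinVarSet>
  <ReferenceClinVarAssertion>
    <ClinVarAccession Acc="RCV000000001" Version="2"/>
  </ReferenceClinVarAssertion>
</ClinVarSet>
"""


def rcv_id(clinvar_set):
    """Join the Acc and Version attributes with a dot (hypothetical helper)."""
    accession = clinvar_set.find("ReferenceClinVarAssertion/ClinVarAccession")
    return f"{accession.get('Acc')}.{accession.get('Version')}"


if __name__ == "__main__":
    print(rcv_id(ET.fromstring(RCV_XML)))
```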

    LastUpdatedDate

    <ClinVarSet>
    <ReferenceClinVarAssertion DateCreated="2012-08-13" DateLastUpdated="2016-02-17" ID="57604" >
    </ClinVarSet>

    Significance

    <ClinVarSet>
    <ReferenceClinVarAssertion>
    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Pathogenic</Description>
    </ClinicalSignificance>
    </ClinVarSet>

    ReviewStatus

    <ClinVarSet>
    <ReferenceClinVarAssertion>
    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Pathogenic</Description>
    </ClinicalSignificance>
    </ClinVarSet>

    Phenotypes

    <ReferenceClinVarAssertion>
    <TraitSet Type="Disease" ID="62">
    <Trait Type="Disease">
    <Name>
    <ElementValue Type="Preferred">Joubert syndrome 9</ElementValue>
    </Name>
    </Trait>
    </TraitSet>
    </ReferenceClinVarAssertion>

    We only use the field with Type="Preferred". Multiple phenotypes may be reported.

    Location, Variant Type and Variant Id

    <ReferenceClinVarAssertion>
    <GenotypeSet Type="CompoundHeterozygote" ID="424709">
    <MeasureSet Type="Variant" ID="81">
    <Measure Type="single nucleotide variant" ID="15120">
    <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38"
    AssemblyStatus="current" Chr="10" Accession="NC_000010.11" start="89222510"
    stop="89222510" display_start="89222510" display_stop="89222510" variantLength="1"
    positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
    AssemblyStatus="previous" Chr="10" Accession="NC_000010.10" start="90982267"
    stop="90982267" display_start="90982267" display_stop="90982267" variantLength="1"
    positionVCF="90982267" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
    </Measure>
    </MeasureSet>
    </GenotypeSet>
    </ReferenceClinVarAssertion>
    • The variant position is extracted from the fields for their respective assemblies.
    • Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant.
    • For older records, since the "start" and "stop" fields are not always available, we use the "display_start" and "display_stop" fields.
    • If a required allele is not available, we extract it from the reference sequence.
    • Only variants having a dbSNP ID are extracted.
    • Note that a ClinVar accession may have multiple variants associated with it (possibly at different locations).
    • VariantId is extracted from the MeasureSet attributes.
    • VariantType is extracted from the Measure attributes.
      unsupported variant types

      We currently don't support the following variant types:

      • Microsatellite
      • protein only
      • fusion
      • Complex
      • Variation
      • Translocation
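
The location rules listed above (prefer the VCF-style attributes, otherwise fall back to the display coordinates) can be sketched as follows. This is a rough illustration under stated assumptions, not the actual parser: `variant_for_assembly` and the returned dictionary keys are hypothetical, and the GRCh37 entry below has been deliberately stripped of its VCF attributes to exercise the fallback path. Filling a missing allele from the reference sequence is not shown.

```python
import xml.etree.ElementTree as ET

# GRCh38 carries the VCF-style attributes; the GRCh37 entry is modified
# for illustration so that only display_start/display_stop remain.
MEASURE_XML = """
<Measure Type="single nucleotide variant" ID="15120">
  <SequenceLocation Assembly="GRCh38" Chr="10" start="89222510" stop="89222510"
      display_start="89222510" display_stop="89222510"
      positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
  <SequenceLocation Assembly="GRCh37" Chr="10"
      display_start="90982267" display_stop="90982267"/>
</Measure>
"""


def variant_for_assembly(measure, assembly):
    """Return a small dict describing the variant on one assembly, or None."""
    for location in measure.findall("SequenceLocation"):
        if location.get("Assembly") != assembly:
            continue
        if location.get("positionVCF") is not None:
            # Updated records: use the VCF-style position and alleles.
            position = int(location.get("positionVCF"))
            return {
                "chromosome": location.get("Chr"),
                "begin": position,
                "end": position + len(location.get("referenceAlleleVCF")) - 1,
                "refAllele": location.get("referenceAlleleVCF"),
                "altAllele": location.get("alternateAlleleVCF"),
            }
        # Older records: fall back to the display coordinates; alleles
        # would have to come from elsewhere (e.g. the reference sequence).
        return {
            "chromosome": location.get("Chr"),
            "begin": int(location.get("display_start")),
            "end": int(location.get("display_stop")),
            "refAllele": None,
            "altAllele": None,
        }
    return None


if __name__ == "__main__":
    measure = ET.fromstring(MEASURE_XML)
    print(variant_for_assembly(measure, "GRCh38"))
    print(variant_for_assembly(measure, "GRCh37"))
```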

    MedGen, OMIM, Orphanet IDs

    <ReferenceClinVarAssertion>
    <TraitSet Type="Disease" ID="175">
    <Trait ID="3036" Type="Disease">
    <XRef ID="C0086651" DB="MedGen"/>
    <XRef ID="309297" DB="Orphanet"/>
    <XRef ID="582" DB="Orphanet"/>
    <XRef Type="MIM" ID="253000" DB="OMIM"/>
    </Trait>
    </TraitSet>
    </ReferenceClinVarAssertion>
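
As a minimal sketch of how the cross-references above could be sorted into the medGenIds, omimIds and orphanetIds arrays, the following Python groups the XRef elements by their DB attribute. The helper name `ids_by_database` is hypothetical, and the mapping from DB value to JSON field is an assumption based on the field names in the JSON output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The Trait element from the snippet above.
TRAIT_XML = """
<Trait ID="3036" Type="Disease">
  <XRef ID="C0086651" DB="MedGen"/>
  <XRef ID="309297" DB="Orphanet"/>
  <XRef ID="582" DB="Orphanet"/>
  <XRef Type="MIM" ID="253000" DB="OMIM"/>
</Trait>
"""


def ids_by_database(trait):
    """Group XRef IDs by their DB attribute, preserving document order."""
    grouped = defaultdict(list)
    for xref in trait.findall("XRef"):
        grouped[xref.get("DB")].append(xref.get("ID"))
    return dict(grouped)


if __name__ == "__main__":
    print(ids_by_database(ET.fromstring(TRAIT_XML)))
```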

    AlleleOrigins

    <ClinVarAssertion>
    <Origin>germline</Origin>
    </ClinVarAssertion>

    We extract all allele origins, and only from submission (SCV) entries.

    PubMedIds

    <ClinVarAssertion>
    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <Citation Type="general">
    <ID Source="PubMed">12114475</ID>
    </Citation>
    </ClinicalSignificance>
    <AttributeSet>
    <Attribute Type="AssertionMethod">LMM Criteria</Attribute>
    <Citation>
    <ID Source="PubMed">24033266</ID>
    </Citation>
    </AttributeSet>
    <ObservedIn>
    <ObservedData ID="9727445">
    <Citation Type="general">
    <ID Source="PubMed">9113933</ID>
    </Citation>
    </ObservedData>
    </ObservedIn>
    <Citation Type="general">
    <ID Source="PubMed">23757202</ID>
    </Citation>
    </ClinVarAssertion>

    We extract all PubMed IDs, and only from submission (SCV) entries.

    Parsing Significance

    Extracting significance(s) may involve parsing multiple fields. Take the following snippets into consideration.

    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Pathogenic</Description>
    </ClinicalSignificance>

    <ClinicalSignificance DateLastEvaluated="2016-10-13">
    <ReviewStatus>criteria provided, multiple submitters, no conflicts</ReviewStatus>
    <Description>Pathogenic/Likely pathogenic</Description>
    </ClinicalSignificance>

    <ClinicalSignificance DateLastEvaluated="2012-06-07">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Conflicting interpretations of pathogenicity</Description>
    <Explanation DataSource="ClinVar" Type="public">Pathogenic(1);Uncertain significance(1)</Explanation>
    </ClinicalSignificance>

    Given these cases, we represent significance as an array of strings, parsed from the Description or Explanation field.

    Varying Delimiters

    The delimiters in each field may vary. Currently, the delimiters for Description are , and /. The delimiters for Explanation are ; and /.
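
The splitting described above can be sketched with regular expressions. This is an illustration only: the function name `split_significance` is hypothetical, and two behaviors are assumptions rather than documented facts, namely dropping the trailing submitter counts such as "(1)" and lower-casing the values (the latter inferred from the lower-case values in the JSON output). Which field is consulted in which case is not specified here.

```python
import re

# Delimiters as stated in the text: Description uses "," and "/",
# Explanation uses ";" and "/".
DESCRIPTION_DELIMITERS = r"[,/]"
EXPLANATION_DELIMITERS = r"[;/]"


def split_significance(text, delimiters):
    """Split a significance string into a list of cleaned values."""
    values = []
    for part in re.split(delimiters, text):
        # Assumption: trailing counts like "(1)" are not part of the value.
        part = re.sub(r"\(\d+\)\s*$", "", part).strip().lower()
        if part:
            values.append(part)
    return values


if __name__ == "__main__":
    print(split_significance("Pathogenic/Likely pathogenic", DESCRIPTION_DELIMITERS))
    print(split_significance("Pathogenic(1);Uncertain significance(1)", EXPLANATION_DELIMITERS))
```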

    VCV File

    Example

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <ClinVarVariationRelease xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public/clinvar_variation/variation_archive_1.4.xsd" ReleaseDate="2019-12-31">
    <VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">
    <RecordStatus>current</RecordStatus>
    <Species>Homo sapiens</Species>
    <IncludedRecord>
    <SimpleAllele AlleleID="425239" VariationID="431749">
    <GeneList>
    <Gene Symbol="KCNAB2" FullName="potassium voltage-gated channel subfamily A regulatory beta subunit 2" GeneID="8514" HGNC_ID="HGNC:6229" Source="calculated" RelationshipType="genes overlapped by variant">
    <Location>
    <CytogeneticLocation>1p36.31</CytogeneticLocation>
    <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5992639" stop="6101186" display_start="5992639" display_stop="6101186" Strand="+"/>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6052357" stop="6161252" display_start="6052357" display_stop="6161252" Strand="+"/>
    </Location>
    <OMIM>601142</OMIM>
    </Gene>
    <Gene Symbol="NPHP4" FullName="nephrocystin 4" GeneID="261734" HGNC_ID="HGNC:19104" Source="calculated" RelationshipType="genes overlapped by variant">
    <Location>
    <CytogeneticLocation>1p36.31</CytogeneticLocation>
    <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5862810" stop="5992425" display_start="5862810" display_stop="5992425" Strand="-"/>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="5922869" stop="6052532" display_start="5922869" display_stop="6052532" Strand="-"/>
    </Location>
    <OMIM>607215</OMIM>
    </Gene>
    </GeneList>
    <Name>GRCh37/hg19 1p36.31(chr1:6051187-6158763)</Name>
    <VariantType>copy number gain</VariantType>
    <Location>
    <CytogeneticLocation>1p36.31</CytogeneticLocation>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" forDisplay="true" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6051187" stop="6158763" display_start="6051187" display_stop="6158763"/> </Location>
    <Interpretations>
    <Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
    <Description>no interpretation for the single variant</Description>
    </Interpretation>
    </Interpretations>
    <XRefList>
    <XRef Type="Interpreted" ID="431733" DB="ClinVar"/>
    </XRefList>
    </SimpleAllele>
    <ReviewStatus>no interpretation for the single variant</ReviewStatus>
    <Interpretations>
    <Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
    <Description>no interpretation for the single variant</Description>
    </Interpretation>
    </Interpretations>
    <SubmittedInterpretationList>
    <SCV Title="SUB1895145" Accession="SCV000296057" Version="1"/>
    </SubmittedInterpretationList>
    <InterpretedVariationList>
    <InterpretedVariation VariationID="431733" Accession="VCV000431733" Version="1"/>
    </InterpretedVariationList>
    </IncludedRecord>
    </VariationArchive>
    </ClinVarVariationRelease>

    Parsing

    In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

    id

    <VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">

    The Accession and Version fields are merged to form the ID (VCV000431749.1)

    significance

    <ClinVarVariationRelease>
    <VariationArchive>
    <IncludedRecord>
    <SimpleAllele>
    <Interpretations>
    <Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
    <Description>no interpretation for the single variant</Description>
    </Interpretation>
    </Interpretations>
    </SimpleAllele>
    </IncludedRecord>
    </VariationArchive>
    </ClinVarVariationRelease>

    May have multiple significances listed.

    reviewStatus

    <ClinVarVariationRelease>
    <VariationArchive>
    <IncludedRecord>
    <ReviewStatus>no interpretation for the single variant</ReviewStatus>
    </IncludedRecord>
    </VariationArchive>
    </ClinVarVariationRelease>

    Known Issues

    • The XML file contains ~1k more entries (out of 162K) than the VCF file
    • The XML file does not have a field indicating that a record is associated with the reference base - something that was present in VCF
    • The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H", etc.) as their alternate allele

    Download URLs

    ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz

    https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz

    JSON Output

    small variants:

    "clinvar":[
    {
    "id":"VCV000036581.3",
    "reviewStatus":"reviewed by expert panel",
    "significance":[
    "benign"
    ],
    "refAllele":"G",
    "altAllele":"A",
    "lastUpdatedDate":"2020-03-01",
    "isAlleleSpecific":true
    },
    {
    "id":"RCV000030258.4",
    "variationId":"VCV000036581.3",
    "reviewStatus":"reviewed by expert panel",
    "alleleOrigins":[
    "germline"
    ],
    "refAllele":"G",
    "altAllele":"A",
    "phenotypes":[
    "Lynch syndrome"
    ],
    "medGenIds":[
    "C1333990"
    ],
    "omimIds":[
    "120435"
    ],
    "significance":[
    "benign"
    ],
    "lastUpdatedDate":"2017-05-01",
    "isAlleleSpecific":true
    }
    ]

    large variants:

    "clinvar":[
    {
    "chromosome":"1",
    "begin":629025,
    "end":8537745,
    "variantType":"copy_number_loss",
    "id":"RCV000051993.4",
    "variationId":"VCV000058242.1",
    "reviewStatus":"criteria provided, single submitter",
    "alleleOrigins":[
    "not provided"
    ],
    "phenotypes":[
    "See cases"
    ],
    "significance":[
    "pathogenic"
    ],
    "lastUpdatedDate":"2022-04-21",
    "pubMedIds":[
    "21844811"
    ]
    },
    {
    "id":"VCV000058242.1",
    "reviewStatus":"criteria provided, single submitter",
    "significance":[
    "pathogenic"
    ],
    "lastUpdatedDate":"2022-04-21"
    },
    ......
    ]
    Field            | Type         | Notes
    id               | string       | ClinVar ID
    variationId      | string       | ClinVar VCV ID
    variantType      | string       | variant type
    reviewStatus     | string       | see possible values below
    alleleOrigins    | string array | see possible values below
    refAllele        | string       |
    altAllele        | string       |
    phenotypes       | string array |
    medGenIds        | string array | MedGen IDs
    omimIds          | string array | OMIM IDs
    orphanetIds      | string array | Orphanet IDs
    significance     | string array | see possible values below
    lastUpdatedDate  | string       | yyyy-MM-dd
    pubMedIds        | string array | PubMed IDs
    isAlleleSpecific | bool         | true when the current variant alternate allele matches the ClinVar alternate allele

    reviewStatus:

    • no assertion provided
    • no assertion criteria provided
    • criteria provided, single submitter
    • practice guideline
    • classified by multiple submitters
    • criteria provided, conflicting interpretations
    • criteria provided, multiple submitters, no conflicts
    • no interpretation for the single variant

    alleleOrigins:

    • unknown
    • other
    • germline
    • somatic
    • inherited
    • paternal
    • maternal
    • de-novo
    • biparental
    • uniparental
    • not-tested
    • tested-inconclusive

    significance:

    • uncertain significance
    • not provided
    • benign
    • likely benign
    • likely pathogenic
    • pathogenic
    • drug response
    • histocompatibility
    • association
    • risk factor
    • protective
    • affects
    • conflicting data from submitters
    • other
    • no interpretation for the single variant
    • conflicting interpretations of pathogenicity
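
For readers consuming this output, here is a small Python sketch that collects the significance values of allele-specific entries from a "clinvar" array. It uses a trimmed copy of the small-variant example above. The helper name `allele_specific_significance` is hypothetical, and treating a missing isAlleleSpecific field as false is an assumption (the large-variant entries above do not carry that field).

```python
import json

# Trimmed copy of the small-variant example shown above.
ANNOTATION = json.loads("""
{
  "clinvar": [
    {
      "id": "VCV000036581.3",
      "reviewStatus": "reviewed by expert panel",
      "significance": ["benign"],
      "refAllele": "G",
      "altAllele": "A",
      "lastUpdatedDate": "2020-03-01",
      "isAlleleSpecific": true
    },
    {
      "id": "RCV000030258.4",
      "variationId": "VCV000036581.3",
      "reviewStatus": "reviewed by expert panel",
      "alleleOrigins": ["germline"],
      "refAllele": "G",
      "altAllele": "A",
      "phenotypes": ["Lynch syndrome"],
      "significance": ["benign"],
      "lastUpdatedDate": "2017-05-01",
      "isAlleleSpecific": true
    }
  ]
}
""")


def allele_specific_significance(entries):
    """Map each allele-specific entry ID to its list of significance values."""
    result = {}
    for entry in entries:
        # Assumption: a missing isAlleleSpecific field counts as false.
        if entry.get("isAlleleSpecific", False):
            result[entry["id"]] = entry.get("significance", [])
    return result


if __name__ == "__main__":
    print(allele_specific_significance(ANNOTATION["clinvar"]))
```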

    Building the supplementary files

    The ClinVar .nsa and .nsi for Illumina Connected Annotations can be built using the SAUtils command's clinvar subcommand.

    Source data files

    Two input .xml files and a .version file are required to build the .nsa and .nsi files. You should have the following files:

    ClinVarFullRelease_00-latest.xml.gz     ClinVarVariationRelease_00-latest.xml.gz
    ClinVarFullRelease_00-latest.xml.gz.version

    The version file is a text file with the follwoing format.

    NAME=ClinVar
    VERSION=20220505
    DATE=2022-05-05
    DESCRIPTION=A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence
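
Before running the build, it can be useful to sanity-check the version file. The sketch below is one possible way to do that, assuming the file contains only KEY=VALUE lines as in the example above; the function name parse_version_file is hypothetical and is not part of SAUtils.

```python
from datetime import date


def parse_version_file(text):
    """Parse KEY=VALUE lines into a dict, splitting on the first '=' only."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value
    return fields


example = """NAME=ClinVar
VERSION=20220505
DATE=2022-05-05
DESCRIPTION=A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence
"""

fields = parse_version_file(example)
# date.fromisoformat raises ValueError if DATE is not in yyyy-MM-dd form.
release_date = date.fromisoformat(fields["DATE"])
print(fields["NAME"], fields["VERSION"], release_date.year)
```

Splitting on the first "=" only keeps any "=" characters that might appear inside the DESCRIPTION value.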

    The help menu for the utility is as follows:

    dotnet SAUtils.dll clinvar
    ---------------------------------------------------------------------------
    SAUtils (c) 2022 Illumina, Inc.
    Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
    ---------------------------------------------------------------------------

    USAGE: dotnet SAUtils.dll clinvar [options]
    Creates a supplementary database with ClinVar annotations

    OPTIONS:
    --ref, -r <VALUE> compressed reference sequence file
    --rcv, -i <VALUE> ClinVar Full release XML file
    --vcv, -c <VALUE> ClinVar Variation release XML file
    --out, -o <VALUE> output directory
    --help, -h displays the help menu
    --version, -v displays the version

    dotnet SAUtils.dll clinvar

    Here is a sample execution:

    dotnet SAUtils.dll clinvar \
    --ref ~/development/References/7/Homo_sapiens.GRCh38.Nirvana.dat --rcv ClinVarFullRelease_00-latest.xml.gz \
    --vcv ClinVarVariationRelease_00-latest.xml.gz --out ~/development/SupplementaryDatabase/63/GRCh38
    ---------------------------------------------------------------------------
    SAUtils (c) 2022 Illumina, Inc.
    Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
    ---------------------------------------------------------------------------

    Found 1535677 VCV records
    Unknown vcv id:225946 found in RCV000211201.2
    Unknown vcv id:225946 found in RCV000211253.2
    Unknown vcv id:225946 found in RCV000211375.2
    Unknown vcv id:976117 found in RCV001253316.1
    Unknown vcv id:1321016 found in RCV001776995.2
    3 unknown VCVs found in RCVs.
    225946,976117,1321016
    0 unknown VCVs found in RCVs.
    Chromosome 1 completed in 00:00:15.1
    Chromosome 2 completed in 00:00:20.0
    Chromosome 3 completed in 00:00:09.7
    Chromosome 4 completed in 00:00:05.9
    Chromosome 5 completed in 00:00:09.8
    Chromosome 6 completed in 00:00:08.3
    Chromosome 7 completed in 00:00:08.7
    Chromosome 8 completed in 00:00:06.2
    Chromosome 9 completed in 00:00:08.6
    Chromosome 10 completed in 00:00:07.0
    Chromosome 11 completed in 00:00:11.7
    Chromosome 12 completed in 00:00:08.0
    Chromosome 13 completed in 00:00:06.3
    Chromosome 14 completed in 00:00:06.0
    Chromosome 15 completed in 00:00:06.6
    Chromosome 16 completed in 00:00:10.8
    Chromosome 17 completed in 00:00:13.8
    Chromosome 18 completed in 00:00:02.9
    Chromosome 19 completed in 00:00:08.7
    Chromosome 20 completed in 00:00:03.6
    Chromosome 21 completed in 00:00:02.4
    Chromosome 22 completed in 00:00:03.6
    Chromosome MT completed in 00:00:00.2
    Chromosome X completed in 00:00:07.5
    Chromosome Y completed in 00:00:00.0
    Maximum bp shifted for any variant:2
    Writing 37097 intervals to database...

    Time: 00:13:26.9

- - + + \ No newline at end of file diff --git a/3.22/data-sources/cosmic-cancer-gene-census/index.html b/3.22/data-sources/cosmic-cancer-gene-census/index.html index bb1f730b..56a5ce39 100644 --- a/3.22/data-sources/cosmic-cancer-gene-census/index.html +++ b/3.22/data-sources/cosmic-cancer-gene-census/index.html @@ -6,13 +6,13 @@ cosmic-cancer-gene-census | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
- - +
Version: 3.22

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
+ + \ No newline at end of file diff --git a/3.22/data-sources/cosmic-gene-fusion-json/index.html b/3.22/data-sources/cosmic-gene-fusion-json/index.html index a2da4f52..7757cc6f 100644 --- a/3.22/data-sources/cosmic-gene-fusion-json/index.html +++ b/3.22/data-sources/cosmic-gene-fusion-json/index.html @@ -6,13 +6,13 @@ cosmic-gene-fusion-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
- - +
Version: 3.22

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + \ No newline at end of file diff --git a/3.22/data-sources/cosmic-json/index.html b/3.22/data-sources/cosmic-json/index.html index 9628bb8f..bdc84457 100644 --- a/3.22/data-sources/cosmic-json/index.html +++ b/3.22/data-sources/cosmic-json/index.html @@ -6,13 +6,13 @@ cosmic-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type         Notes
id                string       COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array  phenotypic descriptions
sites             count array  tissue types
pubMedIds         int array    PubMed IDs
confirmedSomatic  bool         true when the variant is a confirmed somatic variant
drugResistance    bool         true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
- - +
Version: 3.22

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type         Notes
id                string       COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array  phenotypic descriptions
sites             count array  tissue types
pubMedIds         int array    PubMed IDs
confirmedSomatic  bool         true when the variant is a confirmed somatic variant
drugResistance    bool         true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + \ No newline at end of file diff --git a/3.22/data-sources/cosmic/index.html b/3.22/data-sources/cosmic/index.html index acba28d3..db3a73d7 100644 --- a/3.22/data-sources/cosmic/index.html +++ b/3.22/data-sources/cosmic/index.html @@ -6,12 +6,12 @@ COSMIC | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human +

Version: 3.22

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human cancers.

Publication

John G Tate, Sally Bamford, Harry C Jubb, Zbyslaw Sondka, David M Beare, Nidhi Bindal, Harry Boutselakis, Charlotte G Cole, Celestino Creatore, Elisabeth Dawson, Peter Fish, Bhavana Harsha, Charlie Hathaway, Steve C Jupe, Chai Yin Kok, Kate Noble, Laura Ponting, Christopher C Ramshaw, Claire E Rye, Helen E Speedy, Ray Stefancsik, Sam L Thompson, Shicai Wang, Sari Ward, Peter J Campbell, Simon A Forbes. (2019) COSMIC: the Catalogue Of Somatic Mutations In @@ -22,7 +22,7 @@ pair when it is released in the database. Currently COSMIC includes information on fusions involved in solid tumours and leukaemias.

TSV extraction

Example

SAMPLE_ID SAMPLE_NAME PRIMARY_SITE  SITE_SUBTYPE_1  SITE_SUBTYPE_2  SITE_SUBTYPE_3  PRIMARY_HISTOLOGY HISTOLOGY_SUBTYPE_1 HISTOLOGY_SUBTYPE_2 HISTOLOGY_SUBTYPE_3 FUSION_ID TRANSLOCATION_NAME  5'_CHROMOSOME 5'_STRAND 5'_GENE_ID  5'_GENE_NAME  5'_LAST_OBSERVED_EXON 5'_GENOME_START_FROM  5'_GENOME_START_TO  5'_GENOME_STOP_FROM 5'_GENOME_STOP_TO 3'_CHROMOSOME 3'_STRAND 3'_GENE_ID  3'_GENE_NAME  3'_FIRST_OBSERVED_EXON  3'_GENOME_START_FROM  3'_GENOME_START_TO  3'_GENOME_STOP_FROM 3'_GENOME_STOP_TO FUSION_TYPE PUBMED_PMID
749711 HCC1187 breast NS NS NS carcinoma ductal_carcinoma NS NS 665 ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452 8 - 197199 RGS22 22 99981937 99981937 100106116 100106116 1 + 212470 SYCP1_ENST00000369518 24 114944339 114944339 114995367 114995367 Inferred Breakpoint 20033038

Parsing

From the TSV file, we're mainly interested in the following columns:

  • SAMPLE_ID
  • PRIMARY_SITE
  • PRIMARY_HISTOLOGY
  • HISTOLOGY_SUBTYPE_1
  • FUSION_ID
  • TRANSLOCATION_NAME
  • PUBMED_PMID
info

For all histologies and sites, we replace underscores with spaces. For example, salivary_gland becomes salivary gland.

Parsing

To create the gene fusion entries in Illumina Connected Annotations, we perform the following on each row in the TSV file:

  • Group all entries by FUSION_ID
  • Using all the entries related to this FUSION_ID:
    • Collect all the PubMed IDs
    • Tally the number of observed sample IDs
    • Grab the HGVS r. notation (should not change throughout the FUSION_ID)
    • Tally the number of samples observed for each histology
    • Tally the number of samples observed for each site
  • Extract the transcript IDs from the HGVS notation and look up the associated gene symbols
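
The grouping and tallying steps above can be sketched roughly as follows. This is a simplified illustration rather than the actual implementation: it assumes each TSV row has already been read into a dict keyed by column name, it tallies only PRIMARY_SITE and PRIMARY_HISTOLOGY (the subtype handling is not reproduced here), and it omits the gene symbol lookup, which needs transcript data. The function names are hypothetical.

```python
from collections import defaultdict


def normalize(label):
    """Replace underscores with spaces, e.g. salivary_gland -> salivary gland."""
    return label.replace("_", " ")


def tally_samples(entries, column):
    """Count distinct sample IDs for each normalized value of `column`."""
    samples_by_name = defaultdict(set)
    for entry in entries:
        samples_by_name[normalize(entry[column])].add(entry["SAMPLE_ID"])
    return [
        {"name": name, "numSamples": len(sample_ids)}
        for name, sample_ids in sorted(samples_by_name.items())
    ]


def aggregate_fusions(rows):
    """Group rows by FUSION_ID and summarize each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["FUSION_ID"]].append(row)

    fusions = {}
    for fusion_id, entries in grouped.items():
        fusions[fusion_id] = {
            # A sample can appear in several rows, so count distinct IDs.
            "numSamples": len({e["SAMPLE_ID"] for e in entries}),
            # The HGVS r. notation should be the same for every row in the group.
            "hgvsr": entries[0]["TRANSLOCATION_NAME"],
            "histologies": tally_samples(entries, "PRIMARY_HISTOLOGY"),
            "sites": tally_samples(entries, "PRIMARY_SITE"),
            "pubMedIds": sorted({int(e["PUBMED_PMID"]) for e in entries if e["PUBMED_PMID"]}),
        }
    return fusions
```

Using sets for sample IDs and PubMed IDs keeps repeated rows for the same sample or publication from inflating the counts.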

Aggregating Histologies & Sites

Aggregating Histologies & Sites was previously described in the small variants section.

Known Issues


There are some issues with the HGVS RNA notation:

  • For coding transcripts, HGVS numbering should use CDS coordinates. Right now COSMIC is using cDNA coordinates for all their fusions.

Download URL

GRCh37

GRCh38

JSON Output

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int

Cancer Gene Census

TSV Extraction

Example

GENE_NAME       CELL_TYPE       PUBMED_PMID     HALLMARK        IMPACT  DESCRIPTION     CELL_LINE
PRDM16 18496560 role in cancer oncogene oncogene
PRDM16 16015645 role in cancer fusion fusion

Parsing

To extract information about tumor suppressor genes (TSGs) and oncogenes, we filter the data on the "role in cancer" attribute: rows with the value "TSG" identify tumor suppressor genes, and rows with the value "oncogene" identify oncogenes. Some genes have "TSG/oncogene" as their role, which indicates that they can act as both.

Columns

Only the following columns are needed to gather the roles in cancer:

  • GENE_NAME
  • IMPACT
  • HALLMARK
Possible Roles in Cancer

While parsing, only the following roles in cancer were found:

  • fusion
  • TSG
  • oncogene
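
A minimal sketch of this filtering is shown below. It assumes, based on the example rows above, that the HALLMARK column carries the "role in cancer" attribute and that the IMPACT column carries the role itself; the function name and the extra non-role row in the sample are hypothetical.

```python
import csv
import io

KNOWN_ROLES = {"fusion", "TSG", "oncogene"}


def roles_in_cancer(tsv_text):
    """Collect the distinct roles per gene, in order of first appearance."""
    roles = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: only "role in cancer" hallmark rows describe a role.
        if row["HALLMARK"] != "role in cancer":
            continue
        role = row["IMPACT"]
        if role not in KNOWN_ROLES:
            continue
        gene_roles = roles.setdefault(row["GENE_NAME"], [])
        if role not in gene_roles:
            gene_roles.append(role)
    return roles


header = ["GENE_NAME", "CELL_TYPE", "PUBMED_PMID", "HALLMARK", "IMPACT", "DESCRIPTION", "CELL_LINE"]
sample_rows = [
    ["PRDM16", "", "18496560", "role in cancer", "oncogene", "oncogene", ""],
    ["PRDM16", "", "16015645", "role in cancer", "fusion", "fusion", ""],
    # Hypothetical row with a different hallmark, which the filter skips.
    ["PRDM16", "", "0", "some other hallmark", "promotes", "", ""],
]
sample_tsv = "\n".join("\t".join(fields) for fields in [header] + sample_rows)

print(roles_in_cancer(sample_tsv))
```

For the two PRDM16 rows from the example, this yields the same roles that appear in the roleInCancer array of the JSON output.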
Parsing Stats

The file contained the following number of instances for each role:

Role in cancer  Total Instances
fusion          149
TSG             195
oncogene        181
Total           525

Known Issues

None

Download URL

JSON output

   {
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
- - + + \ No newline at end of file diff --git a/3.22/data-sources/dann-json/index.html b/3.22/data-sources/dann-json/index.html index c7df1673..12ca4f84 100644 --- a/3.22/data-sources/dann-json/index.html +++ b/3.22/data-sources/dann-json/index.html @@ -6,13 +6,13 @@ dann-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
- - +
Version: 3.22

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.22/data-sources/dann/index.html b/3.22/data-sources/dann/index.html index c428657e..bfa0e17e 100644 --- a/3.22/data-sources/dann/index.html +++ b/3.22/data-sources/dann/index.html @@ -6,16 +6,16 @@ DANN | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). +

Version: 3.22

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). CADD is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. DANN improves on CADD (which uses Support Vector Machines (SVMs)) by capturing non-linear relationships by using a deep neural network instead of SVMs. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD’s SVM methodology.

Publication

Quang, Daniel, Yifei Chen, and Xiaohui Xie. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31.5 761-763 (2015). https://doi.org/10.1093/bioinformatics/btu703

TSV File

Example

chr     grch37_pos  ref     alt     DANN
1 10001 T A 0.16461391399220135
1 10001 T C 0.4396994049749739
1 10001 T G 0.38108629377072734
1 10002 A C 0.36182020272810128
1 10002 A G 0.44413258111779291
1 10002 A T 0.16812846819989813

Parsing

From the TSV file, we are interested in all columns:

  • chr
  • grch37_pos
  • ref
  • alt
  • DANN
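
As an illustration, one data line of this file could be parsed as sketched below. The function name is hypothetical, and rounding the score to two decimals is an assumption made here to resemble the precision shown in the JSON output; it is not a documented behavior.

```python
def parse_dann_line(line):
    """Parse one whitespace-delimited DANN record into a dict."""
    chrom, pos, ref, alt, score = line.split()
    return {
        "chr": chrom,
        "grch37_pos": int(pos),
        "ref": ref,
        "alt": alt,
        # Assumption: two decimals, similar to the JSON example.
        "dannScore": round(float(score), 2),
    }


record = parse_dann_line("1 10001 T A 0.16461391399220135")
print(record["chr"], record["grch37_pos"], record["ref"], record["alt"], record["dannScore"])
```

The header line (starting with chr) would need to be skipped before calling this function.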

GRCh38 liftover

The data is not available for GRCh38 on the DANN website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Known Issues

None

Download URL

https://cbcl.ics.uci.edu/public_data/DANN/

JSON Output

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
- - + + \ No newline at end of file diff --git a/3.22/data-sources/dbsnp-json/index.html b/3.22/data-sources/dbsnp-json/index.html index bfa71a76..610c409d 100644 --- a/3.22/data-sources/dbsnp-json/index.html +++ b/3.22/data-sources/dbsnp-json/index.html @@ -6,13 +6,13 @@ dbsnp-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
- - +
Version: 3.22

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
+ + \ No newline at end of file diff --git a/3.22/data-sources/dbsnp/index.html b/3.22/data-sources/dbsnp/index.html index 91407284..423e5084 100644 --- a/3.22/data-sources/dbsnp/index.html +++ b/3.22/data-sources/dbsnp/index.html @@ -6,13 +6,13 @@ dbSNP | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the allele frequencies provided in the CAF field. The global minor allele frequency is the second-highest value in the comma-delimited CAF field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele and either C or T is chosen (arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T are arbitrarily assigned as the global major and global minor alleles (one to each).

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.

Known Issues


If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
- - +
Version: 3.22

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the allele frequencies provided in the CAF field. The global minor allele frequency is the second-highest value in the comma-delimited CAF field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele and either C or T is chosen (arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T are arbitrarily assigned as the global major and global minor alleles (one to each).

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.
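
The tie-breaking rules and the four examples above can be sketched as a small function. This is an illustrative reading of the rules, not the actual implementation: the function name is hypothetical, and where the text says the choice is arbitrary, this sketch simply takes the first candidate in REF, ALT order.

```python
def global_alleles(ref, alts, caf):
    """Return (global major allele, global minor allele) from a CAF string."""
    alleles = [ref] + list(alts)
    # Pair each allele with its frequency, ignoring '.' values.
    freqs = [(a, float(f)) for a, f in zip(alleles, caf.split(",")) if f != "."]
    if not freqs:
        return None, None

    # Global major: highest frequency; on a tie, prefer the reference allele.
    top = max(f for _, f in freqs)
    major_candidates = [a for a, f in freqs if f == top]
    major = ref if ref in major_candidates else major_candidates[0]

    rest = [(a, f) for a, f in freqs if a != major]
    if not rest:
        return major, None

    # Global minor: next highest frequency; on a tie, prefer a non-reference allele.
    second = max(f for _, f in rest)
    minor_candidates = [a for a, f in rest if f == second]
    non_ref = [a for a in minor_candidates if a != ref]
    minor = non_ref[0] if non_ref else minor_candidates[0]
    return major, minor


print(global_alleles("A", ["C"], "0.5,0.5"))
print(global_alleles("A", ["C", "T"], "0.2,0.2,0.6"))
```

For CAF=0.2,0.4,0.4 this sketch happens to pick C as major and T as minor, which is one of the two arbitrary outcomes the text allows.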

Known Issues


If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
+ + \ No newline at end of file diff --git a/3.22/data-sources/decipher-json/index.html b/3.22/data-sources/decipher-json/index.html index 98702015..b3589e71 100644 --- a/3.22/data-sources/decipher-json/index.html +++ b/3.22/data-sources/decipher-json/index.html @@ -6,13 +6,13 @@ decipher-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
- - +
Version: 3.22

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
+ + \ No newline at end of file diff --git a/3.22/data-sources/decipher/index.html b/3.22/data-sources/decipher/index.html index 8056c82e..bc380b3e 100644 --- a/3.22/data-sources/decipher/index.html +++ b/3.22/data-sources/decipher/index.html @@ -6,14 +6,14 @@ DECIPHER | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER TSV file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz +

Version: 3.22

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER TSV file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size
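
The column extraction above might look roughly like the following sketch. The mapping from TSV column names to JSON field names is inferred from the names used in the JSON output and should be read as an assumption; the function and constant names are hypothetical. The reciprocalOverlap and annotationOverlap fields are computed against the annotated variant and are not derived from the TSV, so they are left out.

```python
import csv
import io

# Assumed mapping: TSV column -> (JSON field, converter).
COLUMN_TO_FIELD = {
    "chr": ("chromosome", str),
    "start": ("begin", int),
    "end": ("end", int),
    "deletion_observations": ("numDeletions", int),
    "deletion_frequency": ("deletionFrequency", float),
    "duplication_observations": ("numDuplications", int),
    "duplication_frequency": ("duplicationFrequency", float),
    "sample_size": ("sampleSize", int),
}


def parse_decipher(tsv_text):
    """Convert each TSV row into a dict holding only the columns of interest."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {field: convert(row[column]) for column, (field, convert) in COLUMN_TO_FIELD.items()}
        for row in reader
    ]
```

Reading by column name rather than by position keeps the sketch working if the file gains or reorders columns, as long as the header names stay the same.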

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz
https://www.deciphergenomics.org/files/downloads/population_cnv_grch37.txt.gz

JSON output

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
- - + + \ No newline at end of file diff --git a/3.22/data-sources/fusioncatcher-json/index.html b/3.22/data-sources/fusioncatcher-json/index.html index a9fa449a..d27a4c25 100644 --- a/3.22/data-sources/fusioncatcher-json/index.html +++ b/3.22/data-sources/fusioncatcher-json/index.html @@ -6,13 +6,13 @@ fusioncatcher-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field            Type          Notes
genes            genes object  5' gene & 3' gene
germlineSources  string array  matches in known germline data sources
somaticSources   string array  matches in known somatic data sources

genes

Field             Type         Notes
first             gene object  5' gene
second            gene object  3' gene
isParalogPair     bool         true when both genes are paralogs of each other
isPseudogenePair  bool         true when both genes are pseudogenes of each other
isReadthrough     bool         true when this fusion gene is a readthrough event (both genes are on the same strand and there are no genes between them)

gene

Field       Type    Notes
hgnc        string  gene symbol, e.g. MSH6
isOncogene  bool    true when this gene is an oncogene
- - +
Version: 3.22

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field            Type          Notes
genes            genes object  5' gene & 3' gene
germlineSources  string array  matches in known germline data sources
somaticSources   string array  matches in known somatic data sources

genes

Field             Type         Notes
first             gene object  5' gene
second            gene object  3' gene
isParalogPair     bool         true when both genes are paralogs of each other
isPseudogenePair  bool         true when both genes are pseudogenes of each other
isReadthrough     bool         true when this fusion gene is a readthrough event (both genes are on the same strand and there are no genes between them)

gene

Field       Type    Notes
hgnc        string  gene symbol, e.g. MSH6
isOncogene  bool    true when this gene is an oncogene
+ + \ No newline at end of file diff --git a/3.22/data-sources/fusioncatcher/index.html b/3.22/data-sources/fusioncatcher/index.html index ad86fd2a..bbdcf7fd 100644 --- a/3.22/data-sources/fusioncatcher/index.html +++ b/3.22/data-sources/fusioncatcher/index.html @@ -6,13 +6,13 @@ FusionCatcher | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of their genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

Description          Reference  Data                 FusionCatcher filename
Bushman              -          bushmanlab.org       cancer_genes.txt
ONGENE               JGG        bioinfo-minzhao.org  oncogenes_more.txt
UniProt tumor genes  NAR        uniprot.org          tumor_genes.txt

Germline

Illumina Connected Annotations label  Reference             Data            FusionCatcher filename
1000 Genomes Project                  PLOS ONE              -               1000genomes.txt
Healthy (strong support)              -                     -               banned.txt
Illumina Body Map 2.0                 -                     EBI             bodymap2.txt
CACG                                  Genomics              -               cacg.txt
ConjoinG                              PLOS ONE              -               conjoing.txt
Healthy prefrontal cortex             BMC Medical Genomics  NCBI GEO        cortex.txt
Duplicated Genes Database             PLOS ONE              genouest.org    dgd.txt
GTEx healthy tissues                  -                     gtexportal.org  gtex.txt
Healthy                               -                     -               healthy.txt
Human Protein Atlas                   MCP                   EBI             hpa.txt
Babiceanu non-cancer tissues          NAR                   NAR             non-cancer_tissues.txt
non-tumor cell lines                  -                     -               non-tumor_cells.txt
TumorFusions normal                   NAR                   NAR             tcga-normal.txt

Somatic

Illumina Connected Annotations label  Reference                        Data                  FusionCatcher filename
Alaei-Mahabadi 18 cancers             PNAS                             -                     18cancers.txt
DepMap CCLE                           -                                depmap.org            ccle.txt
CCLE Klijn                            Nature Biotechnology             Nature Biotechnology  ccle2.txt
CCLE Vellichirammal                   Molecular Therapy Nucleic Acids  -                     ccle3.txt
Cancer Genome Project                 -                                COSMIC                cgp.txt
ChimerKB 4.0                          NAR                              kobic.re.kr           chimerdb4kb.txt
ChimerPub 4.0                         NAR                              kobic.re.kr           chimerdb4pub.txt
ChimerSeq 4.0                         NAR                              kobic.re.kr           chimerdb4seq.txt
COSMIC                                NAR                              COSMIC                cosmic.txt
Bao gliomas                           Genome Research                  -                     gliomas.txt
Known                                 -                                -                     known.txt
Mitelman DB                           ISB-CGC                          Google Cloud          mitelman.txt
TCGA oesophageal carcinomas           Nature                           -                     oesophagus.txt
Bailey pancreatic cancers             Nature                           Nature                pancreases.txt
PCAWG                                 Cell                             ICGC                  pcawg.txt
Robinson prostate cancers             Cell                             Cell                  prostate_cancer.txt
TCGA                                  -                                cancer.gov            tcga.txt
TumorFusions tumor                    NAR                              NAR                   tcga-cancer.txt
TCGA Gao                              Cell                             Cell                  tcga2.txt
TCGA Vellichirammal                   Molecular Therapy Nucleic Acids  -                     tcga3.txt
TICdb                                 BMC Genomics                     unav.edu              ticdb.txt

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This layout is commonly used for the oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl genes (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations will ignore these entries since we only include the gene IDs that are currently recognized by Illumina Connected Annotations.

I suspect that these were originally RefSeq genes and if so, we can support those directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field             Type          Notes
genes             genes object  5' gene & 3' gene
germlineSources   string array  matches in known germline data sources
somaticSources    string array  matches in known somatic data sources

genes

Field             Type          Notes
first             gene object   5' gene
second            gene object   3' gene
isParalogPair     bool          true when both genes are paralogs of each other
isPseudogenePair  bool          true when both genes are pseudogenes of each other
isReadthrough     bool          true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field       Type    Notes
hgnc        string  gene symbol, e.g. MSH6
isOncogene  bool    true when this gene is an oncogene
- - +
Version: 3.22

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of its genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

Description          Reference  Data                 FusionCatcher filename
Bushman              -          bushmanlab.org       cancer_genes.txt
ONGENE               JGG        bioinfo-minzhao.org  oncogenes_more.txt
UniProt tumor genes  NAR        uniprot.org          tumor_genes.txt

Germline

Illumina Connected Annotations label  Reference             Data            FusionCatcher filename
1000 Genomes Project                  PLOS ONE              -               1000genomes.txt
Healthy (strong support)              -                     -               banned.txt
Illumina Body Map 2.0                 -                     EBI             bodymap2.txt
CACG                                  Genomics              -               cacg.txt
ConjoinG                              PLOS ONE              -               conjoing.txt
Healthy prefrontal cortex             BMC Medical Genomics  NCBI GEO        cortex.txt
Duplicated Genes Database             PLOS ONE              genouest.org    dgd.txt
GTEx healthy tissues                  -                     gtexportal.org  gtex.txt
Healthy                               -                     -               healthy.txt
Human Protein Atlas                   MCP                   EBI             hpa.txt
Babiceanu non-cancer tissues          NAR                   NAR             non-cancer_tissues.txt
non-tumor cell lines                  -                     -               non-tumor_cells.txt
TumorFusions normal                   NAR                   NAR             tcga-normal.txt

Somatic

Illumina Connected Annotations label  Reference                        Data                  FusionCatcher filename
Alaei-Mahabadi 18 cancers             PNAS                             -                     18cancers.txt
DepMap CCLE                           -                                depmap.org            ccle.txt
CCLE Klijn                            Nature Biotechnology             Nature Biotechnology  ccle2.txt
CCLE Vellichirammal                   Molecular Therapy Nucleic Acids  -                     ccle3.txt
Cancer Genome Project                 -                                COSMIC                cgp.txt
ChimerKB 4.0                          NAR                              kobic.re.kr           chimerdb4kb.txt
ChimerPub 4.0                         NAR                              kobic.re.kr           chimerdb4pub.txt
ChimerSeq 4.0                         NAR                              kobic.re.kr           chimerdb4seq.txt
COSMIC                                NAR                              COSMIC                cosmic.txt
Bao gliomas                           Genome Research                  -                     gliomas.txt
Known                                 -                                -                     known.txt
Mitelman DB                           ISB-CGC                          Google Cloud          mitelman.txt
TCGA oesophageal carcinomas           Nature                           -                     oesophagus.txt
Bailey pancreatic cancers             Nature                           Nature                pancreases.txt
PCAWG                                 Cell                             ICGC                  pcawg.txt
Robinson prostate cancers             Cell                             Cell                  prostate_cancer.txt
TCGA                                  -                                cancer.gov            tcga.txt
TumorFusions tumor                    NAR                              NAR                   tcga-cancer.txt
TCGA Gao                              Cell                             Cell                  tcga2.txt
TCGA Vellichirammal                   Molecular Therapy Nucleic Acids  -                     tcga3.txt
TICdb                                 BMC Genomics                     unav.edu              ticdb.txt

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.
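
The import rule above can be sketched in a few lines of Python. This is only an illustration, not the actual Illumina Connected Annotations implementation: the function name `parse_gene_pairs` and the `recognized_ids` set (standing in for the gene IDs found in the GRCh37 or GRCh38 cache files) are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def parse_gene_pairs(lines: Iterable[str], recognized_ids: Set[str]) -> List[Tuple[str, str]]:
    """Keep only the gene pairs where both Ensembl gene IDs are recognized."""
    pairs = []
    for line in lines:
        fields = line.split()  # two whitespace-separated columns
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        first, second = fields
        if first in recognized_ids and second in recognized_ids:
            pairs.append((first, second))
    return pairs


# Hypothetical stand-in for the IDs known to the cache files
recognized = {"ENSG00000006210", "ENSG00000102962", "ENSG00000006652"}

sample_lines = [
    "ENSG00000006210\tENSG00000102962",  # both recognized: kept
    "ENSG00000006652\tENSG00000181016",  # second ID not recognized: dropped
]

kept_pairs = parse_gene_pairs(sample_lines, recognized)
print(kept_pairs)
```

Checking both IDs against a single set means any identifier that is missing from the cache is dropped without special handling.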

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This layout is commonly used for the oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl genes (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations will ignore these entries since we only include the gene IDs that are currently recognized by Illumina Connected Annotations.

I suspect that these were originally RefSeq genes and if so, we can support those directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field             Type          Notes
genes             genes object  5' gene & 3' gene
germlineSources   string array  matches in known germline data sources
somaticSources    string array  matches in known somatic data sources

genes

Field             Type          Notes
first             gene object   5' gene
second            gene object   3' gene
isParalogPair     bool          true when both genes are paralogs of each other
isPseudogenePair  bool          true when both genes are pseudogenes of each other
isReadthrough     bool          true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field       Type    Notes
hgnc        string  gene symbol, e.g. MSH6
isOncogene  bool    true when this gene is an oncogene
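
As a rough illustration of consuming this output, the Python sketch below summarizes each `fusionCatcher` entry. The helper name `summarize_fusions` is hypothetical, the JSON fragment above is wrapped in braces so that it parses as a standalone object, and a missing `isOncogene` key is treated as false, which is an assumption suggested by the example rather than something the documentation states.

```python
import json
from typing import Any, Dict, List

# The JSON Output fragment, wrapped in braces so it is a valid JSON object
EXAMPLE = """
{"fusionCatcher":[{"genes":{"first":{"hgnc":"ETV6","isOncogene":true},
"second":{"hgnc":"RUNX1"},"isParalogPair":true,"isPseudogenePair":true,
"isReadthrough":true},"germlineSources":["1000 Genomes Project"],
"somaticSources":["COSMIC","TCGA oesophageal carcinomas"]}]}
"""


def summarize_fusions(annotation: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the gene pair and matching sources for each fusionCatcher entry."""
    summaries = []
    for entry in annotation.get("fusionCatcher", []):
        first = entry["genes"]["first"]    # 5' gene
        second = entry["genes"]["second"]  # 3' gene
        summaries.append({
            "pair": f'{first["hgnc"]}-{second["hgnc"]}',
            # Assumption: an absent isOncogene key means false
            "has_oncogene": first.get("isOncogene", False) or second.get("isOncogene", False),
            "germline": entry.get("germlineSources", []),
            "somatic": entry.get("somaticSources", []),
        })
    return summaries


fusion_summaries = summarize_fusions(json.loads(EXAMPLE))
print(fusion_summaries)
```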
+ + \ No newline at end of file diff --git a/3.22/data-sources/gerp-json/index.html b/3.22/data-sources/gerp-json/index.html index d79b9077..7429c5be 100644 --- a/3.22/data-sources/gerp-json/index.html +++ b/3.22/data-sources/gerp-json/index.html @@ -6,13 +6,13 @@ gerp-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gerp-json

"gerpScore": 1.27
Field      Type   Notes
gerpScore  float  Range: -∞ to +∞
- - +
Version: 3.22

gerp-json

"gerpScore": 1.27
Field      Type   Notes
gerpScore  float  Range: -∞ to +∞
+ + \ No newline at end of file diff --git a/3.22/data-sources/gerp/index.html b/3.22/data-sources/gerp/index.html index 11b7fd76..2e93a8ca 100644 --- a/3.22/data-sources/gerp/index.html +++ b/3.22/data-sources/gerp/index.html @@ -6,15 +6,15 @@ GERP | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. +

Version: 3.22

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint (Rejected Substitutions). Illumina Connected Annotations uses GERP++ which is based on a significantly faster and more statistically robust maximum likelihood estimation procedure to compute expected rates of evolution.

Publication

Davydov, Eugene V., et al. "Identifying a high fraction of the human genome to be under selective constraint using GERP++." PLoS computational biology 6.12 e1001025 (2010). https://doi.org/10.1371/journal.pcbi.1001025

Source Files

Example GRCh37

The GRCh37 file is in TSV format:

chr     position    GERP
1 12177 0.83
1 12178 -0.206
1 12179 -0.492
1 12180 -1.66
1 12181 0.83
1 12182 0.83
1 12183 -0.417
1 12184 0.83

Example GRCh38

The GRCh38 file is a lifted-over file in BED format:

chr     pos_start   pos_end     GERP
1 12646 12647 0.298
1 12647 12648 2.63
1 12648 12649 1.87
1 12649 12650 0.252
1 12650 12651 -2.06
1 12651 12652 2.61
1 12652 12653 3.97

Parsing

From the TSV file, we are interested in the following columns:

  • chr
  • position
  • GERP
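
A minimal sketch of reading these three columns from the GRCh37 layout shown earlier might look like the following. The function name `parse_gerp_tsv` is hypothetical, and the sketch does not handle the four-column GRCh38 BED layout.

```python
from typing import Dict, Iterable, Tuple


def parse_gerp_tsv(lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Map (chr, position) to the GERP score for three-column GRCh37 rows."""
    scores = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] == "chr":
            continue  # skip the header row and malformed lines
        chrom, position, score = fields
        scores[(chrom, int(position))] = float(score)
    return scores


sample_lines = [
    "chr\tposition\tGERP",
    "1\t12177\t0.83",
    "1\t12178\t-0.206",
    "1\t12180\t-1.66",
]

gerp_scores = parse_gerp_tsv(sample_lines)
print(gerp_scores[("1", 12178)])
```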

Known Issues

None

Download URL

GRCh37

http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

GRCh38

The data is not available for GRCh38 on the GERP++ website; it was obtained from https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/

JSON Output

"gerpScore": 1.27
Field      Type   Notes
gerpScore  float  Range: -∞ to +∞
- - + + \ No newline at end of file diff --git a/3.22/data-sources/gme-json/index.html b/3.22/data-sources/gme-json/index.html index ff680d3c..a44129ae 100644 --- a/3.22/data-sources/gme-json/index.html +++ b/3.22/data-sources/gme-json/index.html @@ -6,13 +6,13 @@ gme-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gme-json

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field         Type   Notes
allAc         int    GME allele count
allAn         int    GME allele number
allAf         float  GME allele frequency
failedFilter  bool   True if this variant failed any filters
- - +
Version: 3.22

gme-json

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field         Type   Notes
allAc         int    GME allele count
allAn         int    GME allele number
allAf         float  GME allele frequency
failedFilter  bool   True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.22/data-sources/gme/index.html b/3.22/data-sources/gme/index.html index 81bb00bb..88a8efba 100644 --- a/3.22/data-sources/gme/index.html +++ b/3.22/data-sources/gme/index.html @@ -6,13 +6,13 @@ GME Variome | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME TSV file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF

GRCh37 liftover

The data is not available for GRCh38 on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field         Type   Notes
allAc         int    GME allele count
allAn         int    GME allele number
allAf         float  GME allele frequency
failedFilter  bool   True if this variant failed any filters
- - +
Version: 3.22

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME TSV file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF
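
The extraction can be sketched in Python as shown below. Several points are assumptions rather than documented behavior. `GME_AC` is read as a comma-separated "alternate count, reference count" pair, inferred from the example rows, where 10 / (10 + 192) matches the reported `GME_AF` of about 0.0495. The allele number is taken as the sum of those two counts. A variant is treated as having failed filters whenever `filter` is not `PASS`. The function name `to_gme_variome` and the trimmed sample, which contains only the extracted columns, are hypothetical.

```python
import csv
import io
from typing import Dict, Union

# Trimmed, tab-separated sample holding only the extracted columns
SAMPLE_TSV = (
    "chrom\tpos\tref\talt\tfilter\tGME_AC\tGME_AF\n"
    "1\t69134\tA\tG\tVQSRTrancheSNP99.90to100.00\t10,192\t0.04950495049504951\n"
    "1\t69270\tA\tG\tPASS\t518,224\t0.6981132075471698\n"
)


def to_gme_variome(row: Dict[str, str]) -> Dict[str, Union[int, float, bool]]:
    """Convert one parsed TSV row into gmeVariome-style fields."""
    # Assumption: GME_AC holds "alternate count,reference count"
    alt_count, ref_count = (int(value) for value in row["GME_AC"].split(","))
    return {
        "allAc": alt_count,
        "allAn": alt_count + ref_count,  # assumption: AN = alt + ref counts
        "allAf": float(row["GME_AF"]),
        "failedFilter": row["filter"] != "PASS",  # assumption: anything but PASS fails
    }


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
gme_records = [to_gme_variome(row) for row in reader]
print(gme_records[0])
```

Reading columns by header name rather than by position keeps the sketch independent of the many columns that are not extracted.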

GRCh37 liftover

The data is not available for GRCh38 on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field         Type   Notes
allAc         int    GME allele count
allAn         int    GME allele number
allAf         float  GME allele frequency
failedFilter  bool   True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.22/data-sources/gnomad-lof-json/index.html b/3.22/data-sources/gnomad-lof-json/index.html index 7300b4a0..eaf145e5 100644 --- a/3.22/data-sources/gnomad-lof-json/index.html +++ b/3.22/data-sources/gnomad-lof-json/index.html @@ -6,13 +6,13 @@ gnomad-lof-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gnomad-lof-json

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field  Type   Notes
pLi    float  probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull  float  probability of being completely tolerant of loss of function variation (observed = expected)
pRec   float  probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ   float  corrected synonymous Z score
misZ   float  corrected missense Z score
loeuf  float  loss of function observed/expected upper bound fraction (LOEUF)
- - +
Version: 3.22

gnomad-lof-json

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field  Type   Notes
pLi    float  probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull  float  probability of being completely tolerant of loss of function variation (observed = expected)
pRec   float  probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ   float  corrected synonymous Z score
misZ   float  corrected missense Z score
loeuf  float  loss of function observed/expected upper bound fraction (LOEUF)
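
One simple way to read these fields is to ask which of the three probabilities (`pLi`, `pRec`, `pNull`) is largest for a gene. The Python sketch below does that for the example above. It is an illustration only, not an official interpretation rule: the helper name `most_likely_lof_class` is hypothetical, and the JSON fragment is wrapped in braces so that it parses on its own.

```python
import json
from typing import Dict

# The example fragment, wrapped in braces so it is a valid JSON object
EXAMPLE = (
    '{"gnomAD":{"pLi":1.00e0,"pNull":8.94e-40,"pRec":1.84e-16,'
    '"synZ":-8.44e-2,"misZ":5.96e-1,"loeuf":1.13e0}}'
)

# Short labels paraphrasing the Notes column of the table
LOF_CLASSES = {
    "pLi": "intolerant of a single loss-of-function variant",
    "pRec": "intolerant of two loss-of-function variants",
    "pNull": "completely tolerant of loss-of-function variation",
}


def most_likely_lof_class(gene_scores: Dict[str, float]) -> str:
    """Return the field name (pLi, pRec or pNull) with the highest probability."""
    return max(LOF_CLASSES, key=lambda field: gene_scores[field])


lof_scores = json.loads(EXAMPLE)["gnomAD"]
best_class = most_likely_lof_class(lof_scores)
print(best_class, "->", LOF_CLASSES[best_class])
```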
+ + \ No newline at end of file diff --git a/3.22/data-sources/gnomad-small-variants-json/index.html b/3.22/data-sources/gnomad-small-variants-json/index.html index 24a04549..2b6369a8 100644 --- a/3.22/data-sources/gnomad-small-variants-json/index.html +++ b/3.22/data-sources/gnomad-small-variants-json/index.html @@ -6,13 +6,13 @@ gnomad-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gnomad-small-variants-json

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field                Type   Notes
coverage             int    average coverage (non-negative integer values)
allAf                float  allele frequency for all populations. Range: 0 - 1.0
maleAf               float  allele frequency for male population. Range: 0 - 1.0
femaleAf             float  allele frequency for female population. Range: 0 - 1.0
controlsAllAf        float  allele frequency for the controls subset. Range: 0 - 1.0
allAc                int    allele count for all populations. Integer.
maleAc               int    allele count for male population. Integer.
femaleAc             int    allele count for female population. Integer.
controlsAllAc        int    allele count for the controls subset. Integer.
allAn                int    allele number for all populations. Non-zero integer.
maleAn               int    allele number for male population. Non-zero integer.
femaleAn             int    allele number for female population. Non-zero integer.
controlsAllAn        int    allele number for the controls subset. Non-zero integer.
allHc                int    count of homozygous individuals for all populations. Non-negative integer.
maleHc               int    count of homozygous individuals for male population. Non-negative integer.
femaleHc             int    count of homozygous individuals for female population. Non-negative integer.
afrAf                float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                int    allele count for the African / African American population. Integer.
afrAn                int    allele number for the African / African American population. Non-zero integer.
afrHc                int    count of homozygous individuals for African / African American population. Non-negative integer.
amrAf                float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc                int    allele count for the Latino population. Integer.
amrAn                int    allele number for the Latino population. Non-zero integer.
amrHc                int    count of homozygous individuals for Latino population. Non-negative integer.
easAf                float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc                int    allele count for the East Asian population. Integer.
easAn                int    allele number for the East Asian population. Non-zero integer.
easHc                int    count of homozygous individuals for East Asian population. Non-negative integer.
finAf                float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc                int    allele count for the Finnish population. Integer.
finAn                int    allele number for the Finnish population. Non-zero integer.
finHc                int    count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf                float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                int    allele count for the Non-Finnish European population. Integer.
nfeAn                int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                int    count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf                float  allele frequency for the Other population. Range: 0 - 1.0
othAc                int    allele count for the Other population. Integer.
othAn                int    allele number for the Other population. Non-zero integer.
othHc                int    count of homozygous individuals for Other population. Non-negative integer.
asjAf                float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                int    allele count for the Ashkenazi Jewish population. Integer.
asjAn                int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                int    allele count for the South Asian population. Integer.
sasAn                int    allele number for the South Asian population. Non-zero integer.
sasHc                int    count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter         bool   True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion  bool   True if this variant is located in a low complexity region.
- - +
Version: 3.22

gnomad-small-variants-json

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field                Type   Notes
coverage             int    average coverage (non-negative integer values)
allAf                float  allele frequency for all populations. Range: 0 - 1.0
maleAf               float  allele frequency for male population. Range: 0 - 1.0
femaleAf             float  allele frequency for female population. Range: 0 - 1.0
controlsAllAf        float  allele frequency for the controls subset. Range: 0 - 1.0
allAc                int    allele count for all populations. Integer.
maleAc               int    allele count for male population. Integer.
femaleAc             int    allele count for female population. Integer.
controlsAllAc        int    allele count for the controls subset. Integer.
allAn                int    allele number for all populations. Non-zero integer.
maleAn               int    allele number for male population. Non-zero integer.
femaleAn             int    allele number for female population. Non-zero integer.
controlsAllAn        int    allele number for the controls subset. Non-zero integer.
allHc                int    count of homozygous individuals for all populations. Non-negative integer.
maleHc               int    count of homozygous individuals for male population. Non-negative integer.
femaleHc             int    count of homozygous individuals for female population. Non-negative integer.
afrAf                float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                int    allele count for the African / African American population. Integer.
afrAn                int    allele number for the African / African American population. Non-zero integer.
afrHc                int    count of homozygous individuals for African / African American population. Non-negative integer.
amrAf                float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc                int    allele count for the Latino population. Integer.
amrAn                int    allele number for the Latino population. Non-zero integer.
amrHc                int    count of homozygous individuals for Latino population. Non-negative integer.
easAf                float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc                int    allele count for the East Asian population. Integer.
easAn                int    allele number for the East Asian population. Non-zero integer.
easHc                int    count of homozygous individuals for East Asian population. Non-negative integer.
finAf                float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc                int    allele count for the Finnish population. Integer.
finAn                int    allele number for the Finnish population. Non-zero integer.
finHc                int    count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf                float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                int    allele count for the Non-Finnish European population. Integer.
nfeAn                int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                int    count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf                float  allele frequency for the Other population. Range: 0 - 1.0
othAc                int    allele count for the Other population. Integer.
othAn                int    allele number for the Other population. Non-zero integer.
othHc                int    count of homozygous individuals for Other population. Non-negative integer.
asjAf                float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                int    allele count for the Ashkenazi Jewish population. Integer.
asjAn                int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                int    allele count for the South Asian population. Integer.
sasAn                int    allele number for the South Asian population. Non-zero integer.
sasHc                int    count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter         bool   True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion  bool   True if this variant is located in a low complexity region.
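
Because the fields follow a regular `<population>Af` / `<population>Ac` / `<population>An` naming pattern, a consumer can sanity-check that each allele frequency is close to allele count divided by allele number. The Python sketch below illustrates the idea using the `all` and `afr` values from the example. The helper name `find_af_mismatches` and the tolerance are hypothetical choices, not part of Illumina Connected Annotations.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def find_af_mismatches(gnomad: Dict[str, Number], tolerance: float = 1e-4) -> List[str]:
    """Return the population prefixes whose Af differs from Ac / An."""
    mismatches = []
    for key, allele_frequency in gnomad.items():
        if not key.endswith("Af"):
            continue
        prefix = key[:-2]  # e.g. "all", "afr", "controlsAll"
        allele_count = gnomad.get(prefix + "Ac")
        allele_number = gnomad.get(prefix + "An")
        if allele_count is None or not allele_number:
            continue  # cannot check without both count and a non-zero number
        if abs(allele_count / allele_number - allele_frequency) > tolerance:
            mismatches.append(prefix)
    return mismatches


# Subset of the example values above
example_subset = {
    "allAf": 0.190317, "allAc": 5861, "allAn": 30796,
    "afrAf": 0.222876, "afrAc": 1931, "afrAn": 8664,
}

af_mismatches = find_af_mismatches(example_subset)
print(af_mismatches)
```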
+ + \ No newline at end of file diff --git a/3.22/data-sources/gnomad-structural-variants-data_description/index.html b/3.22/data-sources/gnomad-structural-variants-data_description/index.html index d0e4d5e3..99af7c67 100644 --- a/3.22/data-sources/gnomad-structural-variants-data_description/index.html +++ b/3.22/data-sources/gnomad-structural-variants-data_description/index.html @@ -6,14 +6,14 @@ gnomad-structural-variants-data_description | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    ENDEVIDENCE HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file was obtained from the lifted-over dataset created by dbVar for GRCh38.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chr contig   outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. +

Version: 3.22

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    ENDEVIDENCE HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file was obtained from the lifted-over dataset created by dbVar for GRCh38.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chr contig   outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0

Structural Variant Type Mapping

The source files represent structural variant types with keys that follow various naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows.

Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key | GRCh38 Source SV Type Key
copy_number_variation |  | copy number variation
deletion | DEL, CN=0 | deletion
duplication | DUP | duplication
insertion | INS | insertion
inversion | INV | inversion
mobile_element_insertion | INS:ME | mobile element insertion
mobile_element_insertion | INS:ME:ALU | alu insertion
mobile_element_insertion | INS:ME:LINE1 | line1 insertion
mobile_element_insertion | INS:ME:SVA | sva insertion
structural alteration |  | sequence alteration
complex_structural_alteration | CPX |
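
The mapping above can be sketched as a pair of lookup tables. This is an illustration only, not the Illumina Connected Annotations implementation: the names `GRCH37_SV_TYPE_MAP`, `GRCH38_SV_TYPE_MAP`, and `map_sv_type` are hypothetical, "DEL, CN=0" is assumed to list two separate GRCh37 keys, and rows that show a single source key are assumed to belong to GRCh38 when written in lower case and to GRCh37 when written in upper case.

```python
from typing import Optional

# GRCh37 source keys (upper-case, VCF-style). Splitting "DEL, CN=0" into
# two keys is an assumption.
GRCH37_SV_TYPE_MAP = {
    "DEL": "deletion",
    "CN=0": "deletion",
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}

# GRCh38 source keys (lower-case, dbVar-style).
GRCH38_SV_TYPE_MAP = {
    "copy number variation": "copy_number_variation",
    "deletion": "deletion",
    "duplication": "duplication",
    "insertion": "insertion",
    "inversion": "inversion",
    "mobile element insertion": "mobile_element_insertion",
    "alu insertion": "mobile_element_insertion",
    "line1 insertion": "mobile_element_insertion",
    "sva insertion": "mobile_element_insertion",
    "sequence alteration": "structural alteration",
}


def map_sv_type(source_key: str, assembly: str) -> Optional[str]:
    """Return the JSON SV type key for a source key, or None if unmapped."""
    table = GRCH37_SV_TYPE_MAP if assembly == "GRCh37" else GRCH38_SV_TYPE_MAP
    return table.get(source_key)
```

For example, `map_sv_type("INS:ME:ALU", "GRCh37")` and `map_sv_type("alu insertion", "GRCh38")` both resolve to `mobile_element_insertion`.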
- - + + \ No newline at end of file diff --git a/3.22/data-sources/gnomad-structural-variants-json/index.html b/3.22/data-sources/gnomad-structural-variants-json/index.html index d2e1873b..f7962efc 100644 --- a/3.22/data-sources/gnomad-structural-variants-json/index.html +++ b/3.22/data-sources/gnomad-structural-variants-json/index.html @@ -6,13 +6,13 @@ gnomad-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Admixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Admixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
- - +
Version: 3.22

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Admixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Admixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
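
Because these fields can be missing for GRCh38, a consumer of the JSON may want to read them defensively. The sketch below is one way to do that in Python. The sample entry is hypothetical (its values are borrowed from the TSV example), the helper name `summarize` is made up for illustration, and it is an assumption that unavailable fields are simply omitted from the object.

```python
import json
from typing import Any, Dict

# Hypothetical GRCh38 entry; only fields present in the dbVar source are set.
sample = json.loads("""
{
  "chromosome": "10",
  "variantId": "gnomAD-SV_v2.1_CNV_10_564_alt_1",
  "variantType": "copy_number_variation",
  "allAf": 0.038889,
  "allAc": 21,
  "allAn": 540
}
""")


def summarize(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Collect a few fields, using None where a field is not available."""
    return {
        "variantId": entry["variantId"],
        "allAf": entry.get("allAf"),
        # The fields below are listed as unavailable for GRCh38.
        "maleAf": entry.get("maleAf"),
        "allHc": entry.get("allHc"),
        "failedFilter": entry.get("failedFilter"),
    }


summary = summarize(sample)
```

Using `dict.get` keeps the same code path working for GRCh37 entries, where these fields are populated.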
+ + \ No newline at end of file diff --git a/3.22/data-sources/gnomad/index.html b/3.22/data-sources/gnomad/index.html index f042b2a5..caefc109 100644 --- a/3.22/data-sources/gnomad/index.html +++ b/3.22/data-sources/gnomad/index.html @@ -6,17 +6,17 @@ gnomAD | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following info fields from gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following extra fields from gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number, and allele frequency for:
      • Global population
      • African/African Americans
      • Admixed Americans
      • Ashkenazi Jews
      • East Asians
      • Finnish
      • Non-Finnish Europeans
      • South Asian
      • Others (population not assigned)
      • Male
      • Female
      • Controls
Note
  • Coverage = DP / AN. Frequencies are computed using AC/AN for each population.
  • Please note that gnomAD currently has no genome sequencing data for the South Asian (SAS) population.
  • Allele count, homozygous count, allele number, and allele frequency for the controls subset are also provided for the global population.
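
The two formulas in the note can be sketched in a few lines of Python. This is a minimal illustration, not the Illumina Connected Annotations code: the function names are hypothetical, and both the rounding of coverage to an integer and the `None` result when AN is zero are assumptions.

```python
from typing import Optional


def allele_frequency(ac: int, an: int) -> Optional[float]:
    """Return AC / AN, or None when AN is zero (assumed handling)."""
    if an == 0:
        return None
    return ac / an


def coverage(dp: int, an: int) -> Optional[int]:
    """Return DP / AN rounded to an integer (the rounding is an assumption)."""
    if an == 0:
        return None
    return round(dp / an)
```

With the values from the JSON example in this page (`allAc` 5861, `allAn` 30796), `allele_frequency(5861, 30796)` agrees with the listed `allAf` of 0.190317 to six decimal places.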

Merging genomes and exomes

When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets.

info
  • For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
  • For GRCh38, Illumina Connected Annotations currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.
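
A minimal sketch of the merge, assuming each data set supplies an allele count and an allele number for the same variant and population. The `Counts` type and function names are hypothetical, and recomputing the frequency from the summed values is an assumption based on the AC/AN rule stated earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Counts:
    ac: int  # allele count
    an: int  # allele number


def merge_counts(genome: Counts, exome: Counts) -> Counts:
    """Sum allele counts and allele numbers across genomes and exomes."""
    return Counts(ac=genome.ac + exome.ac, an=genome.an + exome.an)


def merged_frequency(genome: Counts, exome: Counts) -> Optional[float]:
    """Frequency from the summed counts; None if the summed AN is zero."""
    merged = merge_counts(genome, exome)
    return merged.ac / merged.an if merged.an else None
```

Note that the merged frequency is a ratio of sums, not an average of the two per-data-set frequencies, so the data set with the larger allele number carries more weight.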

Filters

The following strategy will be used when there's a conflict in filter status:

                | Genomes PASS         | Genomes Filtered
Exomes PASS     | PASS                 | Only use exome data
Exomes Filtered | Only use genome data | Filtered
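
The conflict rules can be written as a small decision function. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and using both data sets when both are filtered is an assumption (the table only says the result is Filtered).

```python
from typing import Tuple


def resolve_filter_conflict(genome_pass: bool, exome_pass: bool) -> Tuple[str, bool]:
    """Return (data to use, failedFilter) for a variant seen in both data sets."""
    if genome_pass and exome_pass:
        return ("genome+exome", False)   # PASS
    if exome_pass:
        return ("exome", False)          # genomes filtered: only use exome data
    if genome_pass:
        return ("genome", False)         # exomes filtered: only use genome data
    return ("genome+exome", True)        # both filtered: Filtered (data choice assumed)
```

For instance, `resolve_filter_conflict(genome_pass=False, exome_pass=True)` selects the exome data and leaves the variant unflagged.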

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Input VCF files (one per chromosome) and a .version file are required in a folder to build the .nsa file. For example, my directory contains:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz chr2.vcf.bgz
chr12.vcf.bgz chr3.vcf.bgz
chr13.vcf.bgz chr4.vcf.bgz
chr14.vcf.bgz chr5.vcf.bgz
chr15.vcf.bgz chr6.vcf.bgz
chr16.vcf.bgz chr7.vcf.bgz
chr17.vcf.bgz chr8.vcf.bgz
chr18.vcf.bgz chr9.vcf.bgz
chr19.vcf.bgz chrM.vcf.bgz
chr1.vcf.bgz chrX.vcf.bgz
chr20.vcf.bgz chrY.vcf.bgz
chr21.vcf.bgz gnomad.r3.1.version

The version file is a text file with the following content.

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
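
Before running SAUtils, it can be useful to check that the .version file contains the four keys shown above. The following is a minimal sketch, not part of SAUtils: parse_version_file is a hypothetical helper, and treating all four keys as required is an assumption based only on the example above.

```python
def parse_version_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


example = """NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
"""

version_info = parse_version_file(example)

# Assumption: these four keys are the ones the builder expects.
missing = [k for k in ("NAME", "VERSION", "DATE", "DESCRIPTION") if k not in version_info]
print(version_info["VERSION"], missing)
```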

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
--ref, -r <VALUE>     compressed reference sequence file
--genome, -g <VALUE>  input directory containing VCF (and .version)
                      files with genomic frequencies
--exome, -e <VALUE>   input directory containing VCF (and .version)
                      files with exomic frequencies
--temp, -t <VALUE>    output temp directory for intermediate (per chrom)
                      NSA files
--out, -o <VALUE>     output directory for NSA file
--help, -h            displays the help menu
--version, -v         displays the version

Here is a sample execution:

dotnet SAUtils.dll gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key  TSV column    Description
pLi       pLI           probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull     pNull         probability of being completely tolerant of loss of function variation (observed = expected)
pRec      pRec          probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ      syn_z         corrected synonymous Z score
misZ      mis_z         corrected missense Z score
loeuf     oe_lof_upper  loss of function observed/expected upper bound fraction (LOEUF)
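
The mapping above can be sketched in code as follows. This is an illustration only, not the SAUtils implementation: row_to_json_fields is a hypothetical helper, omitting "NA" values from the output is an assumption, and the sample row is trimmed to the mapped columns using values from the MDGA2 (ENST00000426342) entry that appears in the conflict resolution section of this page.

```python
import csv
import io

# TSV column -> JSON key, taken from the mapping table above.
TSV_TO_JSON = {
    "pLI": "pLi",
    "pNull": "pNull",
    "pRec": "pRec",
    "syn_z": "synZ",
    "mis_z": "misZ",
    "oe_lof_upper": "loeuf",
}


def row_to_json_fields(row):
    """Convert one parsed TSV row (a dict) into the JSON fields.

    Assumption: missing or "NA" values are omitted from the output.
    """
    fields = {}
    for tsv_column, json_key in TSV_TO_JSON.items():
        raw = row.get(tsv_column, "NA")
        if raw in ("", "NA"):
            continue
        fields[json_key] = float(raw)
    return fields


# Trimmed to the mapped columns for brevity; the real file has many more.
tsv_text = (
    "gene\tpLI\tpNull\tpRec\tsyn_z\tmis_z\toe_lof_upper\n"
    "MDGA2\t9.9922e-01\t8.6490e-12\t7.8128e-04\t8.2988e-01\t1.6769e+00\t2.3900e-01\n"
)
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
fields = row_to_json_fields(next(reader))
print(fields)
```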

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in conflicts. For example, the following two entries have different Ensembl gene IDs but share the symbol MDGA2:

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

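In such cases, Illumina Connected Annotations chooses the entry with the smallest LOEUF value. The following table motivates this choice: haploinsufficient and autosomal dominant disease genes are concentrated in the lowest LOEUF deciles, whereas olfactory genes are concentrated in the highest.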

LOEUF decile  Haplo-insufficient  Autosomal Dominant  Autosomal Recessive  Olfactory Genes
0-10%         104                 140                 36                   0
10-20%        47                  128                 72                   1
20-30%        17                  86                  112                  0
30-40%        8                   80                  173                  4
40-50%        7                   65                  206                  8
50-60%        4                   54                  207                  6
60-70%        0                   46                  154                  18
70-80%        2                   49                  120                  49
80-90%        0                   34                  58                   96
90-100%       0                   26                  40                   174
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score.
  • If the LOEUF scores are the same, pick the entry with the lowest pLI.
  • Otherwise, pick the entry with the largest absolute value of synZ + misZ.
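
The rules above can be sketched as a pairwise comparison. This is a hedged illustration rather than the actual Illumina Connected Annotations code: pick_entry and resolve_conflict are hypothetical names, "absolute value of synZ + misZ" is read as |synZ + misZ|, and when a score is missing from either entry the sketch simply falls through to the next rule, which is an assumption about behavior this page does not specify. The sample values come from the list of conflicting genes above.

```python
from functools import reduce


def _differs(x, y):
    """True when both values are present and not equal."""
    return x is not None and y is not None and x != y


def pick_entry(a, b):
    """Choose between two entries that share a gene symbol."""
    # Rule 1: lowest LOEUF wins.
    la, lb = a.get("loeuf"), b.get("loeuf")
    if _differs(la, lb):
        return a if la < lb else b

    # Rule 2: on a LOEUF tie, lowest pLI wins.
    pa, pb = a.get("pLI"), b.get("pLI")
    if _differs(pa, pb):
        return a if pa < pb else b

    # Rule 3: otherwise, largest |synZ + misZ| wins.
    def z_score(entry):
        return abs(entry.get("synZ", 0.0) + entry.get("misZ", 0.0))

    return a if z_score(a) >= z_score(b) else b


def resolve_conflict(entries):
    """Reduce a list of conflicting entries to a single one."""
    return reduce(pick_entry, entries)


mdga2 = [
    {"pLI": 9.99e-1, "pRec": 7.81e-4, "pNull": 8.65e-12, "synZ": 8.30e-1, "misZ": 1.68e0, "loeuf": 2.39e-1},
    {"pLI": 6.65e-1, "pRec": 3.35e-1, "pNull": 5.58e-10, "synZ": 8.39e-1, "misZ": 1.74e0, "loeuf": 3.51e-1},
]
nbpf20 = [
    {"pLI": 1.42e-7, "pRec": 3.40e-2, "pNull": 9.66e-1, "synZ": -1.86e0, "misZ": -2.88e0, "loeuf": 1.97e0},
    {"pLI": 1.92e-22, "pRec": 7.96e-6, "pNull": 1.00e0, "synZ": -9.73e0, "misZ": -7.67e0, "loeuf": 1.97e0},
]
fam231d = [
    {"synZ": -1.98e0, "misZ": -1.44e0},
    {"synZ": 1.07e0, "misZ": 3.13e-1},
]

for name, entries in (("MDGA2", mdga2), ("NBPF20", nbpf20), ("FAM231D", fam231d)):
    print(name, resolve_conflict(entries))
```

MDGA2 is decided by the first rule, NBPF20 (equal LOEUF) by the second, and FAM231D (no LOEUF or pLI) by the third.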

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field  Type   Notes
pLi    float  probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull  float  probability of being completely tolerant of loss of function variation (observed = expected)
pRec   float  probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ   float  corrected synonymous Z score
misZ   float  corrected missense Z score
loeuf  float  loss of function observed/expected upper bound fraction (LOEUF)

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note +

Version: 3.22

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following info fields from gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following additional fields from the gnomAD exome VCF files:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number, and allele frequency for:
      • Global population
      • African/African Americans
      • Admixed Americans
      • Ashkenazi Jews
      • East Asians
      • Finnish
      • Non-Finnish Europeans
      • South Asian
      • Others (population not assigned)
      • Male
      • Female
      • Controls
Note
  • Coverage = DP / AN. Frequencies are computed using AC/AN for each population.
  • Please note that gnomAD currently has no genome sequencing data for the South Asian (SAS) population.
  • Allele Count, Homozygous count, allele number and allele frequencies for control groups are also provided for the global population.
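
The two formulas in the note can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, returning None when AN is zero is an assumption, and rounding coverage to the nearest integer is an assumption because the rounding rule is not stated here. The AC and AN values are the allAc and allAn values from the sample JSON output on this page; the DP value is made up for illustration.

```python
def allele_frequency(allele_count, allele_number):
    """AF = AC / AN; returns None when AN is zero (assumption)."""
    if allele_number == 0:
        return None
    return allele_count / allele_number


def coverage(total_depth, allele_number):
    """Coverage = DP / AN, reported as an integer.

    Assumption: rounding to the nearest integer; the actual rounding
    rule is not stated in this document.
    """
    if allele_number == 0:
        return None
    return round(total_depth / allele_number)


# allAc and allAn from the sample JSON output.
all_af = allele_frequency(5861, 30796)
print(round(all_af, 6))

# Hypothetical DP value, chosen only for illustration.
print(coverage(615920, 30796))
```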

Merging genomes and exomes

When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets.

info
  • For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
  • For GRCh38, Illumina Connected Annotations currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.

Filters

The following strategy will be used when there's a conflict in filter status:

                 Genomes PASS          Genomes Filtered
Exomes PASS      PASS                  Only use exome data
Exomes Filtered  Only use genome data  Filtered
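
The merging and filter rules above can be sketched together. This is a minimal illustration, not the actual implementation: merge_counts and its dict layout are hypothetical, and two behaviors are assumptions, namely that a variant present in only one data set is passed through unchanged, and that when both data sets are filtered the counts are still summed and the result is flagged as filtered. All numbers are made up.

```python
def _with_af(ac, an, failed_filter):
    """Build a result record, recomputing AF as AC/AN."""
    return {
        "ac": ac,
        "an": an,
        "af": ac / an if an else None,
        "failed_filter": failed_filter,
    }


def merge_counts(genome, exome):
    """Merge per-variant genome and exome counts.

    Each argument is None (variant absent from that data set) or a dict
    with the keys "ac", "an" and "failed_filter".
    """
    if genome is None and exome is None:
        return None
    if genome is None:
        return _with_af(exome["ac"], exome["an"], exome["failed_filter"])
    if exome is None:
        return _with_af(genome["ac"], genome["an"], genome["failed_filter"])

    if genome["failed_filter"] != exome["failed_filter"]:
        # Conflicting filter status: use only the data set that passed.
        passed = exome if genome["failed_filter"] else genome
        return _with_af(passed["ac"], passed["an"], False)

    # Same filter status: sum allele counts and allele numbers.
    return _with_af(
        genome["ac"] + exome["ac"],
        genome["an"] + exome["an"],
        genome["failed_filter"],
    )


genome_pass = {"ac": 10, "an": 100, "failed_filter": False}
genome_fail = {"ac": 10, "an": 100, "failed_filter": True}
exome_pass = {"ac": 30, "an": 300, "failed_filter": False}
exome_fail = {"ac": 30, "an": 300, "failed_filter": True}

both_pass = merge_counts(genome_pass, exome_pass)
exome_only = merge_counts(genome_fail, exome_pass)
genome_only = merge_counts(genome_pass, exome_fail)
both_fail = merge_counts(genome_fail, exome_fail)
print(both_pass, exome_only, genome_only, both_fail, sep="\n")
```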

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf":0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc":2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field                Type   Notes
coverage             int    average coverage (non-negative integer values)
allAf                float  allele frequency for all populations. Range: 0 - 1.0
maleAf               float  allele frequency for the male population. Range: 0 - 1.0
femaleAf             float  allele frequency for the female population. Range: 0 - 1.0
controlsAllAf        float  allele frequency for the controls subset. Range: 0 - 1.0
allAc                int    allele count for all populations. Integer.
maleAc               int    allele count for the male population. Integer.
femaleAc             int    allele count for the female population. Integer.
controlsAllAc        int    allele count for the controls subset. Integer.
allAn                int    allele number for all populations. Non-zero integer.
maleAn               int    allele number for the male population. Non-zero integer.
femaleAn             int    allele number for the female population. Non-zero integer.
controlsAllAn        int    allele number for the controls subset. Non-zero integer.
allHc                int    count of homozygous individuals for all populations. Non-negative integer.
maleHc               int    count of homozygous individuals for the male population. Non-negative integer.
femaleHc             int    count of homozygous individuals for the female population. Non-negative integer.
afrAf                float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                int    allele count for the African / African American population. Integer.
afrAn                int    allele number for the African / African American population. Non-zero integer.
afrHc                int    count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf                float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc                int    allele count for the Latino population. Integer.
amrAn                int    allele number for the Latino population. Non-zero integer.
amrHc                int    count of homozygous individuals for the Latino population. Non-negative integer.
easAf                float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc                int    allele count for the East Asian population. Integer.
easAn                int    allele number for the East Asian population. Non-zero integer.
easHc                int    count of homozygous individuals for the East Asian population. Non-negative integer.
finAf                float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc                int    allele count for the Finnish population. Integer.
finAn                int    allele number for the Finnish population. Non-zero integer.
finHc                int    count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf                float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                int    allele count for the Non-Finnish European population. Integer.
nfeAn                int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                int    count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf                float  allele frequency for the Other population. Range: 0 - 1.0
othAc                int    allele count for the Other population. Integer.
othAn                int    allele number for the Other population. Non-zero integer.
othHc                int    count of homozygous individuals for the Other population. Non-negative integer.
asjAf                float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                int    allele count for the Ashkenazi Jewish population. Integer.
asjAn                int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                int    allele count for the South Asian population. Integer.
sasAn                int    allele number for the South Asian population. Non-zero integer.
sasHc                int    count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter         bool   True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion  bool   True if this variant is located in a low complexity region.

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Input VCF files (one per chromosome) and a .version file are required in a folder to build the .nsa file. For example, my directory contains:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz chr2.vcf.bgz
chr12.vcf.bgz chr3.vcf.bgz
chr13.vcf.bgz chr4.vcf.bgz
chr14.vcf.bgz chr5.vcf.bgz
chr15.vcf.bgz chr6.vcf.bgz
chr16.vcf.bgz chr7.vcf.bgz
chr17.vcf.bgz chr8.vcf.bgz
chr18.vcf.bgz chr9.vcf.bgz
chr19.vcf.bgz chrM.vcf.bgz
chr1.vcf.bgz chrX.vcf.bgz
chr20.vcf.bgz chrY.vcf.bgz
chr21.vcf.bgz gnomad.r3.1.version

The version file is a text file with the following content.

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
--ref, -r <VALUE>     compressed reference sequence file
--genome, -g <VALUE>  input directory containing VCF (and .version)
                      files with genomic frequencies
--exome, -e <VALUE>   input directory containing VCF (and .version)
                      files with exomic frequencies
--temp, -t <VALUE>    output temp directory for intermediate (per chrom)
                      NSA files
--out, -o <VALUE>     output directory for NSA file
--help, -h            displays the help menu
--version, -v         displays the version

Here is a sample execution:

dotnet SAUtils.dll gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key  TSV column    Description
pLi       pLI           probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull     pNull         probability of being completely tolerant of loss of function variation (observed = expected)
pRec      pRec          probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ      syn_z         corrected synonymous Z score
misZ      mis_z         corrected missense Z score
loeuf     oe_lof_upper  loss of function observed/expected upper bound fraction (LOEUF)

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.
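
The symbol update described above amounts to a lookup keyed on the Ensembl gene ID. The following is a minimal sketch, not the actual implementation: update_gene_symbols is a hypothetical helper, the cache is modeled as a plain dict, and keeping the input symbol when an ID is absent from the cache is an assumption. ENSG0003 and GENE3 are made-up placeholders in the same spirit as the example in the text.

```python
def update_gene_symbols(entries, cache_symbols):
    """Replace each entry's gene symbol with the cache's symbol for its gene ID.

    Assumption: if the gene ID is not in the cache, the input symbol is kept.
    """
    updated = []
    for entry in entries:
        symbol = cache_symbols.get(entry["gene_id"], entry["gene"])
        # Copy the entry so the input list is left untouched.
        updated.append({**entry, "gene": symbol})
    return updated


entries = [
    {"gene": "GENE1", "gene_id": "ENSG0001"},
    {"gene": "GENE3", "gene_id": "ENSG0003"},  # not present in the cache
]
cache_symbols = {"ENSG0001": "GENE2"}

updated = update_gene_symbols(entries, cache_symbols)
print(updated)
```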

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in conflicts. For example, the following two entries have different Ensembl gene IDs but share the symbol MDGA2:

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

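In such cases, Illumina Connected Annotations chooses the entry with the smallest LOEUF value. The following table motivates this choice: haploinsufficient and autosomal dominant disease genes are concentrated in the lowest LOEUF deciles, whereas olfactory genes are concentrated in the highest.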

LOEUF decile  Haplo-insufficient  Autosomal Dominant  Autosomal Recessive  Olfactory Genes
0-10%         104                 140                 36                   0
10-20%        47                  128                 72                   1
20-30%        17                  86                  112                  0
30-40%        8                   80                  173                  4
40-50%        7                   65                  206                  8
50-60%        4                   54                  207                  6
60-70%        0                   46                  154                  18
70-80%        2                   49                  120                  49
80-90%        0                   34                  58                   96
90-100%       0                   26                  40                   174
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score.
  • If the LOEUF scores are the same, pick the entry with the lowest pLI.
  • Otherwise, pick the entry with the largest absolute value of synZ + misZ.

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field  Type   Notes
pLi    float  probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull  float  probability of being completely tolerant of loss of function variation (observed = expected)
pRec   float  probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ   float  corrected synonymous Z score
misZ   float  corrected missense Z score
loeuf  float  loss of function observed/expected upper bound fraction (LOEUF)

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note: The gnomAD structural variant annotations are currently in a preview stage and do not yet include translocation breakends. Future updates will include a better way of annotating the structural variants.

Source Files

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    ENDEVIDENCE HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file was obtained from the lifted-over dataset created by dbVar for GRCh38.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chrcontig   outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0

Structural Variant Type Mapping

The two source files label structural variant types using different naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows:

Illumina Connected Annotations JSON SV Type Key  GRCh37 Source SV Type Key  GRCh38 Source SV Type Key
copy_number_variation                            -                          copy number variation
deletion                                         DEL, CN=0                  deletion
duplication                                      DUP                        duplication
insertion                                        INS                        insertion
inversion                                        INV                        inversion
mobile_element_insertion                         INS:ME                     mobile element insertion
mobile_element_insertion                         INS:ME:ALU                 alu insertion
mobile_element_insertion                         INS:ME:LINE1               line1 insertion
mobile_element_insertion                         INS:ME:SVA                 sva insertion
structural alteration                            -                          sequence alteration
complex_structural_alteration                    CPX                        -

Download URLs

GRCh37

The GRCh37 file was downloaded from the original source:

https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz

GRCh38

Note: The GRCh38 data was not available from the original gnomAD 2.1 source. Instead, the lifted-over structural variant dataset created by dbVar was used: https://www.ncbi.nlm.nih.gov/sites/dbvarapp/studies/nstd166/

Download URL

https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/tsv/nstd166.GRCh38.variant_call.tsv.gz

JSON output

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field              Type            Notes
chromosome         string          chromosome number
begin              integer         position interval start
end                integer         position interval end
variantType        string          structural variant type
variantId          string          gnomAD ID
allAf              floating point  allele frequency for all populations. Range: 0 - 1.0
afrAf              floating point  allele frequency for the African super population. Range: 0 - 1.0
amrAf              floating point  allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf              floating point  allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf              floating point  allele frequency for the European super population. Range: 0 - 1.0
othAf              floating point  allele frequency for all other populations. Range: 0 - 1.0
femaleAf           floating point  allele frequency for female population. Range: 0 - 1.0
maleAf             floating point  allele frequency for male population. Range: 0 - 1.0
allAc              integer         allele count for all populations.
afrAc              integer         allele count for the African super population.
amrAc              integer         allele count for the Admixed American super population.
easAc              integer         allele count for the East Asian super population.
eurAc              integer         allele count for the European super population.
othAc              integer         allele count for all other populations.
maleAc             integer         allele count for male population.
femaleAc           integer         allele count for female population.
allAn              integer         allele number for all populations.
afrAn              integer         allele number for the African super population.
amrAn              integer         allele number for the Admixed American super population.
easAn              integer         allele number for the East Asian super population.
eurAn              integer         allele number for the European super population.
othAn              integer         allele number for all other populations.
femaleAn           integer         allele number for female population.
maleAn             integer         allele number for male population.
allHc              integer         count of homozygous individuals for all populations.
afrHc              integer         count of homozygous individuals for the African / African American population.
amrHc              integer         count of homozygous individuals for the Latino population.
easHc              integer         count of homozygous individuals for the East Asian population.
eurHc              integer         count of homozygous individuals for the European super population.
othHc              integer         count of homozygous individuals for all other populations.
maleHc             integer         count of homozygous individuals for male population.
femaleHc           integer         count of homozygous individuals for female population.
failedFilter       boolean         True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap  floating point  Reciprocal overlap. Range: 0 - 1.0
annotationOverlap  floating point  Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available for GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
- - + + \ No newline at end of file diff --git a/3.22/data-sources/mito-heteroplasmy/index.html b/3.22/data-sources/mito-heteroplasmy/index.html index 2485b092..165492bb 100644 --- a/3.22/data-sources/mito-heteroplasmy/index.html +++ b/3.22/data-sources/mito-heteroplasmy/index.html @@ -6,13 +6,13 @@ Mitochondrial Heteroplasmy | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (e.g. T:C)
  • ad
  • vrf
  • nobs (number of observations)
Adjusting for null observations

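The nobs value is the total number of observations. The ad and vrf arrays only list the observations in which the variant was actually seen (6 of 246 in the example above), so the remaining null observations have to be added back when the distribution is reconstructed.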
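
As a rough sketch of that adjustment, the snippet below pads the vrf array with zeros until it holds nobs entries. The function name is hypothetical, and treating every unlisted observation as a VRF of 0.0 is an assumption. It is consistent with the example, where the min in vrf_stats is 0.0 and the reported mean equals the sum of the six listed values divided by 246.

```python
def pad_null_observations(vrf, nobs):
    """Return the vrf list extended with 0.0 for each unlisted observation.

    Assumption: observations missing from the array had a VRF of 0.0.
    """
    missing = nobs - len(vrf)
    if missing < 0:
        raise ValueError("nobs is smaller than the number of listed VRFs")
    return list(vrf) + [0.0] * missing


# Values copied from the T:C example above.
vrf = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
padded = pad_null_observations(vrf, nobs=246)
mean = sum(padded) / len(padded)  # close to the mean reported in vrf_stats
```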

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object are stored with far more significant digits than are meaningful, so almost every observation is distinct and the array grows as the number of samples increases. To make this more future-proof, Illumina Connected Annotations bins the values in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.
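
A minimal sketch of the binning step, assuming each VRF is rounded to the nearest 0.001 and that the bins are then counted. The function name is hypothetical, and the rounding mode is an assumption (the actual tool may truncate instead).

```python
from collections import Counter


def bin_vrfs(vrfs, decimals=3):
    """Round each VRF to 0.1% (three decimals) and count the entries per bin."""
    counts = Counter(round(v, decimals) for v in vrfs)
    return dict(sorted(counts.items()))


# The six VRFs from the T:C example collapse into two bins.
vrfs = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
binned = bin_vrfs(vrfs)
```

Storing one count per bin rather than one entry per sample keeps the data size bounded by the number of bins, which is why the representation stays small as more samples are added.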

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).
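
The percentile lookup can be sketched against one row of the TSV file shown earlier. This is an illustration under stated assumptions: the function names are hypothetical, the percentile is taken as the share of binned observations strictly below the sample's VRF, and the result is scaled to 0-100.

```python
def parse_row(line):
    """Split one TSV row into its key, its VRF bins, and its counts."""
    chrom, pos, ref, alt, bins, counts = line.split()
    key = (chrom, int(pos), ref, alt)
    return key, [float(b) for b in bins.split(",")], [int(c) for c in counts.split(",")]


def heteroplasmy_percentile(sample_vrf, bins, counts):
    """Percentage of observations whose binned VRF is below sample_vrf."""
    total = sum(counts)
    below = sum(c for b, c in zip(bins, counts) if b < sample_vrf)
    return 100.0 * below / total


row = ("chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,"
       "0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736")
key, bins, counts = parse_row(row)
percentile = heteroplasmy_percentile(0.995, bins, counts)
```

A sample VRF above every bin yields 100 under this definition, which corresponds to the exception noted in the paragraph above.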

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]

Field                   Type         Notes
heteroplasmyPercentile  float array  one percentile for each variant frequency (each alternate allele)
- - +
Version: 3.22

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (e.g. T:C)
  • ad
  • vrf
  • nobs (number of observations)
Adjusting for null observations

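The nobs value is the total number of observations. The ad and vrf arrays only list the observations in which the variant was actually seen (6 of 246 in the example above), so the remaining null observations have to be added back when the distribution is reconstructed.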

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object are stored with far more significant digits than are meaningful, so almost every observation is distinct and the array grows as the number of samples increases. To make this more future-proof, Illumina Connected Annotations bins the values in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]

Field                   Type         Notes
heteroplasmyPercentile  float array  one percentile for each variant frequency (each alternate allele)
+ + \ No newline at end of file diff --git a/3.22/data-sources/mitomap-small-variants-json/index.html b/3.22/data-sources/mitomap-small-variants-json/index.html index e652e845..63304641 100644 --- a/3.22/data-sources/mitomap-small-variants-json/index.html +++ b/3.22/data-sources/mitomap-small-variants-json/index.html @@ -6,13 +6,13 @@ mitomap-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]

Field                     Type          Notes
refAllele                 string
altAllele                 string
diseases                  string array  associated diseases
hasHomoplasmy             boolean
hasHeteroplasmy           boolean
status                    string        record status
clinicalSignificance      string        predicted pathogenicity
scorePercentile           float         MitoTIP score
numGenBankFullLengthSeqs  integer       # of GenBank full-length sequences
pubMedIds                 string array
isAlleleSpecific          boolean       true when the current variant alternate allele matches the MITOMAP alternate allele
- - +
Version: 3.22

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]

Field                     Type          Notes
refAllele                 string
altAllele                 string
diseases                  string array  associated diseases
hasHomoplasmy             boolean
hasHeteroplasmy           boolean
status                    string        record status
clinicalSignificance      string        predicted pathogenicity
scorePercentile           float         MitoTIP score
numGenBankFullLengthSeqs  integer       # of GenBank full-length sequences
pubMedIds                 string array
isAlleleSpecific          boolean       true when the current variant alternate allele matches the MITOMAP alternate allele
+ + \ No newline at end of file diff --git a/3.22/data-sources/mitomap-structural-variants-json/index.html b/3.22/data-sources/mitomap-structural-variants-json/index.html index f87ccac4..94f151dc 100644 --- a/3.22/data-sources/mitomap-structural-variants-json/index.html +++ b/3.22/data-sources/mitomap-structural-variants-json/index.html @@ -6,13 +6,13 @@ mitomap-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]

Field              Type          Notes
chromosome         string
begin              integer
end                integer
variantType        string array
reciprocalOverlap  float         Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap  float         Range: 0 - 1. Specified up to 5 decimal places
- - +
Version: 3.22

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]

Field              Type          Notes
chromosome         string
begin              integer
end                integer
variantType        string array
reciprocalOverlap  float         Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap  float         Range: 0 - 1. Specified up to 5 decimal places
+ + \ No newline at end of file diff --git a/3.22/data-sources/mitomap/index.html b/3.22/data-sources/mitomap/index.html index eaa850aa..0ea32f81 100644 --- a/3.22/data-sources/mitomap/index.html +++ b/3.22/data-sources/mitomap/index.html @@ -6,13 +6,13 @@ MITOMAP | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (numbers indicate the HTML page above):

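  • Position (pages 1, 2, 3, 4)
  • Disease (pages 3, 4)
  • Nucleotide Change (pages 1, 2)
  • Allele (pages 3, 4)
  • Homoplasmy (pages 3, 4)
  • Heteroplasmy (pages 3, 4)
  • Status (pages 3, 4)
  • MitoTIP (pages 3, 4)
  • GB Seqs FL(CR) (pages 1, 2, 3, 4)
  • Deletion Junction (page 5)
  • Insert (nt) (page 6)
  • Insert Point (nt) (page 6)
  • References/Curated References (pages 1, 2, 3, 4)
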
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.
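
The left-alignment step can be sketched as follows. This is a generic normalization routine, not necessarily the one used during import: the function name is hypothetical, positions are assumed to be 1-based, and alleles are assumed to be VCF-style (an indel carries one padding base).

```python
def left_align(ref_seq, pos, ref, alt):
    """Shift an indel as far left as the reference sequence allows.

    ref_seq is the reference as a string; pos is the 1-based position of ref.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Deleting one "CA" unit from the repeat in GCACACAT, reported at position 5.
aligned = left_align("GCACACAT", 5, "ACA", "A")
```

Because the deleted unit sits inside a repeat, the same deletion can be reported at several positions; left alignment picks the leftmost one so that equivalent records compare equal.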

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.
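
One way to picture the enumeration is sketched below. The function names are hypothetical, and the reading of C-C(2-8) as "C repeated 2 to 8 times" is an assumption about the MITOMAP notation. The IUPAC table is the standard nucleotide ambiguity code.

```python
import re
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(allele):
    """Return every concrete allele encoded by IUPAC ambiguity codes."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in allele))]


def enumerate_alts(notation):
    """Expand notations such as 'A-AC or ACC' or 'C-C(2-8)' into (ref, alts)."""
    ref, alt_part = notation.split("-", 1)
    alts = []
    for token in re.split(r"\s+or\s+", alt_part.strip()):
        repeat = re.fullmatch(r"([ACGT]+)\((\d+)-(\d+)\)", token)
        if repeat:
            # Assumed meaning: the unit repeated lo..hi times.
            unit, lo, hi = repeat.group(1), int(repeat.group(2)), int(repeat.group(3))
            alts.extend(unit * n for n in range(lo, hi + 1))
        else:
            alts.extend(expand_iupac(token))
    return ref, alts


ref, alts = enumerate_alts("A-AC or ACC")
```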

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT
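
The conventions listed above could be recognized with one regular expression each. The pattern names and the classify_allele function are hypothetical labels chosen for this sketch, and the patterns only cover the examples shown in the list.

```python
import re

# One pattern per convention; each is matched against the whole string.
ALLELE_PATTERNS = [
    ("substitution", re.compile(r"([ACGT])(\d+)([ACGT])")),        # C123T
    ("range_deletion", re.compile(r"(\d+)_(\d+)del")),             # 16021_16022del
    ("deletion_by_length", re.compile(r"(\d+)del(\d+)")),          # 8042del2
    ("deletion_by_bases", re.compile(r"(\d+)del([ACGT]+)")),       # 8042delAT
    ("insertion", re.compile(r"([ACGT])(\d+)ins([ACGT]+)")),       # C9537insC
    ("inversion", re.compile(r"(\d+)_(\d+)inv([ACGT]+)")),         # 3902_3908invACCTTGC
    ("enumerated", re.compile(r"([ACGT]+)-(.+)")),                 # A-AC or ACC, C-C(2-8)
]


def classify_allele(text):
    """Return (convention name, captured groups), or None if unsupported."""
    for name, pattern in ALLELE_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return name, match.groups()
    return None


result = classify_allele("3902_3908invACCTTGC")
```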

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
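
A minimal sketch of extracting that mapping from the dump. It assumes the standard PostgreSQL COPY text format, in which rows are tab-delimited and the data block ends with a line containing only `\.`. The function name is hypothetical.

```python
def parse_reference_ids(lines):
    """Map reference id -> nlmid (PubMed ID) from the mitomap.reference block."""
    mapping = {}
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if columns is None:
            # Look for the COPY header and read its column list.
            if line.startswith("COPY mitomap.reference "):
                names = line[line.index("(") + 1:line.index(")")]
                columns = [name.strip() for name in names.split(",")]
            continue
        if line == "\\.":
            break  # end-of-data marker
        row = dict(zip(columns, line.split("\t")))
        mapping[row["id"]] = row["nlmid"]
    return mapping


# Shortened, illustrative input with only three of the columns.
dump = [
    "COPY mitomap.reference (id, authors, nlmid) FROM stdin;",
    "1\tAlbring, M., Griffith, J. and Attardi, G.\t266177",
    "2\tAnderson, S., Bankier, A.T.\t7219534",
    "\\.",
]
id_to_pubmed = parse_reference_ids(dump)
```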
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full length GenBank sequences, we take the largest number from each of the duplicated records since it provides the strongest evidence for this variant.
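
The merge rules above can be sketched as follows. The record layout reuses the JSON key names from this page, while the function name and the use of ValueError for the conflict case are assumptions.

```python
# Fields that must agree between duplicated records.
CONFLICT_KEYS = (
    "hasHomoplasmy", "hasHeteroplasmy", "status",
    "clinicalSignificance", "scorePercentile", "end", "variantType",
)


def merge_records(a, b):
    """Merge two records that describe the same nucleotide change."""
    for key in CONFLICT_KEYS:
        if a.get(key) != b.get(key):
            raise ValueError(f"conflicting {key}: {a.get(key)!r} vs {b.get(key)!r}")
    merged = dict(a)
    # Diseases and PubMed IDs: union of both records.
    merged["diseases"] = sorted(set(a.get("diseases", [])) | set(b.get("diseases", [])))
    merged["pubMedIds"] = sorted(set(a.get("pubMedIds", [])) | set(b.get("pubMedIds", [])))
    # Full-length GenBank sequences: keep the largest count.
    merged["numGenBankFullLengthSeqs"] = max(
        a.get("numGenBankFullLengthSeqs", 0), b.get("numGenBankFullLengthSeqs", 0)
    )
    return merged


first = {"status": "Reported", "diseases": ["Melanoma"],
         "pubMedIds": ["6299878"], "numGenBankFullLengthSeqs": 2}
second = {"status": "Reported", "diseases": ["Bipolar disorder", "Melanoma"],
          "pubMedIds": ["2316527"], "numGenBankFullLengthSeqs": 5}
merged = merge_records(first, second)
```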
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]

Field                     Type          Notes
refAllele                 string
altAllele                 string
diseases                  string array  associated diseases
hasHomoplasmy             boolean
hasHeteroplasmy           boolean
status                    string        record status
clinicalSignificance      string        predicted pathogenicity
scorePercentile           float         MitoTIP score
numGenBankFullLengthSeqs  integer       # of GenBank full-length sequences
pubMedIds                 string array
isAlleleSpecific          boolean       true when the current variant alternate allele matches the MITOMAP alternate allele

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]

Field              Type          Notes
chromosome         string
begin              integer
end                integer
variantType        string array
reciprocalOverlap  float         Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap  float         Range: 0 - 1. Specified up to 5 decimal places
- - +
Version: 3.22

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (numbers indicate the HTML page above):

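  • Position (pages 1, 2, 3, 4)
  • Disease (pages 3, 4)
  • Nucleotide Change (pages 1, 2)
  • Allele (pages 3, 4)
  • Homoplasmy (pages 3, 4)
  • Heteroplasmy (pages 3, 4)
  • Status (pages 3, 4)
  • MitoTIP (pages 3, 4)
  • GB Seqs FL(CR) (pages 1, 2, 3, 4)
  • Deletion Junction (page 5)
  • Insert (nt) (page 6)
  • Insert Point (nt) (page 6)
  • References/Curated References (pages 1, 2, 3, 4)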
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.
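
The left-alignment step can be illustrated with a minimal Python sketch. It is not the actual import code: the function name `left_align`, the convention that a pure insertion or deletion is passed with one empty allele, and the treatment of the reference as a linear string (ignoring the circular mitochondrial genome) are all assumptions made for illustration.

```python
def left_align(pos, ref, alt, reference):
    """Shift a pure insertion or deletion as far left as possible.

    pos is 1-based. For a deletion, ref holds the deleted bases and alt is empty.
    For an insertion, alt holds the inserted bases (placed before pos) and ref is empty.
    """
    if bool(ref) == bool(alt):
        return pos, ref, alt  # substitutions and complex variants are left untouched

    allele = ref or alt
    # The variant can move one base left whenever the base just before it
    # equals the last base of the inserted/deleted sequence.
    while pos > 1 and reference[pos - 2] == allele[-1]:
        allele = reference[pos - 2] + allele[:-1]
        pos -= 1

    return (pos, allele, "") if ref else (pos, "", allele)


if __name__ == "__main__":
    toy_reference = "GCACACAT"
    print(left_align(6, "CA", "", toy_reference))
```

In the toy reference above, deleting the CA at positions 6-7 produces the same sequence as deleting the CA at positions 2-3, so the sketch reports the leftmost placement.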

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.
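
As a rough illustration of enumeration, the Python sketch below expands the two notations mentioned above plus IUPAC ambiguity codes. The function names are hypothetical, and reading `C(2-8)` as "C repeated 2 to 8 times" is an assumption about the notation rather than something stated in the MITOMAP pages.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(allele):
    """Return every concrete allele encoded by a string with ambiguity codes."""
    choices = [IUPAC.get(base, base) for base in allele]
    return ["".join(combo) for combo in product(*choices)]


def enumerate_alts(notation):
    """Expand notations such as 'C-C(2-8)' or 'A-AC or ACC' into (ref, [alts])."""
    ref, _, alt_part = notation.partition("-")
    alts = []
    for token in re.split(r"\s+or\s+", alt_part.strip()):
        repeat = re.fullmatch(r"([ACGT]+)\((\d+)-(\d+)\)", token)
        if repeat:
            # Assumed meaning: the unit repeated low..high times
            unit, low, high = repeat.group(1), int(repeat.group(2)), int(repeat.group(3))
            alts.extend(unit * count for count in range(low, high + 1))
        else:
            alts.extend(expand_iupac(token))
    return ref, alts


if __name__ == "__main__":
    print(enumerate_alts("A-AC or ACC"))
    print(enumerate_alts("C-C(2-8)"))
```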

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT
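
One way to recognize several of these conventions is a small table of regular expressions, sketched below in Python. The category names and the interpretation of each notation (for example, reading the trailing number in `8042del2` as a deletion length) are assumptions for illustration; the two enumeration-style notations (`A-AC or ACC`, `C-C(2-8)`) are deliberately left out of this sketch.

```python
import re

# (hypothetical category name, pattern) pairs, tried in order
ALLELE_PATTERNS = [
    ("substitution", re.compile(r"^([ACGT])(\d+)([ACGT])$")),       # C123T
    ("range_deletion", re.compile(r"^(\d+)_(\d+)del$")),            # 16021_16022del
    ("deletion_by_length", re.compile(r"^(\d+)del(\d+)$")),         # 8042del2
    ("deletion_by_sequence", re.compile(r"^(\d+)del([ACGT]+)$")),   # 8042delAT
    ("insertion", re.compile(r"^([ACGT])(\d+)ins([ACGT]+)$")),      # C9537insC
    ("inversion", re.compile(r"^(\d+)_(\d+)inv([ACGT]+)$")),        # 3902_3908invACCTTGC
]


def classify_allele(text):
    """Return (category, captured groups) for a recognized notation, else None."""
    for category, pattern in ALLELE_PATTERNS:
        match = pattern.match(text.strip())
        if match:
            return category, match.groups()
    return None


if __name__ == "__main__":
    for example in ["C123T", "16021_16022del", "8042del2", "8042delAT",
                    "C9537insC", "3902_3908invACCTTGC"]:
        print(example, classify_allele(example))
```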

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
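
A minimal Python sketch of extracting that mapping is shown here. It assumes the dump uses the default PostgreSQL COPY text format (tab-separated fields, `\N` for NULL, `\.` as the end-of-data marker); the function name is hypothetical and this is not the actual import code.

```python
def parse_reference_ids(lines):
    """Map reference id -> nlmid (PubMed ID) from the mitomap.reference COPY block."""
    mapping = {}
    columns = None
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if line.startswith("COPY mitomap.reference "):
            # Column names are listed between the first pair of parentheses
            inside = line[line.index("(") + 1:line.index(")")]
            columns = [name.strip() for name in inside.split(",")]
            continue
        if columns is None:
            continue  # still before the block we care about
        if line == "\\.":
            break  # end-of-data marker for the COPY block
        record = dict(zip(columns, line.split("\t")))
        nlmid = record.get("nlmid", "\\N")
        if nlmid not in ("\\N", ""):
            mapping[record["id"]] = nlmid
    return mapping


if __name__ == "__main__":
    sample = [
        "COPY mitomap.reference (id, authors, nlmid) FROM stdin;",
        "1\tAlbring, M., Griffith, J. and Attardi, G.\t266177",
        "2\tAnderson, S. et al.\t7219534",
        "\\.",
    ]
    print(parse_reference_ids(sample))
```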
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full length GenBank sequences, we take the largest number from each of the duplicated records since it provides the strongest evidence for this variant.
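
The merge rules above can be sketched in Python as follows. The record layout (plain dictionaries keyed by the JSON field names), the function name, and the use of `ValueError` are assumptions for illustration only.

```python
# Fields that must agree between duplicated records (taken from the list above)
CONFLICT_FIELDS = (
    "hasHomoplasmy", "hasHeteroplasmy", "status",
    "clinicalSignificance", "scorePercentile", "end", "variantType",
)


def merge_records(first, second):
    """Merge two records that describe the same nucleotide change."""
    for field in CONFLICT_FIELDS:
        if first.get(field) != second.get(field):
            raise ValueError(f"conflicting values for {field}")

    merged = dict(first)
    # Diseases and PubMed IDs: union of the values in both records
    for field in ("diseases", "pubMedIds"):
        merged[field] = sorted(set(first.get(field, [])) | set(second.get(field, [])))
    # Full-length GenBank sequences: keep the largest count
    merged["numGenBankFullLengthSeqs"] = max(
        first.get("numGenBankFullLengthSeqs", 0),
        second.get("numGenBankFullLengthSeqs", 0),
    )
    return merged


if __name__ == "__main__":
    a = {"status": "Reported", "diseases": ["Melanoma"],
         "pubMedIds": ["2316527"], "numGenBankFullLengthSeqs": 2}
    b = {"status": "Reported", "diseases": ["Bipolar disorder"],
         "pubMedIds": ["6299878"], "numGenBankFullLengthSeqs": 1}
    print(merge_records(a, b))
```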
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
+ + \ No newline at end of file diff --git a/3.22/data-sources/omim-json/index.html b/3.22/data-sources/omim-json/index.html index bf10467c..79b667a9 100644 --- a/3.22/data-sources/omim-json/index.html +++ b/3.22/data-sources/omim-json/index.html @@ -6,13 +6,13 @@ omim-json | IlluminaConnectedAnnotations - - + +
Version: 3.22

omim-json

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
+ + \ No newline at end of file diff --git a/3.22/data-sources/omim/index.html b/3.22/data-sources/omim/index.html index 6249ecc8..581639d4 100644 --- a/3.22/data-sources/omim/index.html +++ b/3.22/data-sources/omim/index.html @@ -6,18 +6,18 @@ OMIM | IlluminaConnectedAnnotations - - + +

Version: 3.22

OMIM

Overview

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.

Publications

Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.

Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

Parse OMIM data

Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.

mim2gene.txt

This mim2gene.txt (http://omim.org/static/omim/data/mim2gene.txt) file provides the mapping between MIM numbers and gene symbols. An example of this file is given below:

# MIM Number    MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq)   Entrez Gene ID (NCBI)   Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437

The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.
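
To make the file layout concrete, here is a minimal Python sketch that reads mim2gene.txt into a dictionary. It assumes the file is tab-separated with trailing columns omitted when empty, and it keeps any row that carries at least one of the three identifiers. How Illumina Connected Annotations resolves those identifiers to its own gene symbols is not described here, so that step is left out; the function name and record layout are hypothetical.

```python
def parse_mim2gene(lines):
    """Return {mimNumber: record} for rows that carry at least one gene identifier."""
    entries = {}
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split("\t")
        fields += [""] * (5 - len(fields))  # pad rows with omitted trailing columns
        mim_number, entry_type, entrez_id, symbol, ensembl_id = fields[:5]
        if not (entrez_id or symbol or ensembl_id):
            continue  # nothing to map to a gene symbol
        entries[int(mim_number)] = {
            "entryType": entry_type,
            "entrezGeneId": entrez_id,
            "symbol": symbol,
            "ensemblGeneId": ensembl_id,
        }
    return entries


if __name__ == "__main__":
    sample = [
        "# MIM Number\tMIM Entry Type\tEntrez Gene ID\tApproved Gene Symbol\tEnsembl Gene ID",
        "100050\tpredominantly phenotypes",
        "100640\tgene\t216\tALDH1A1\tENSG00000165092",
    ]
    print(parse_mim2gene(sample))
```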

OMIM API

Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below.

{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false,

}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}

Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON Output.

Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:

Illumina Connected Annotations JSON key chain | OMIM API JSON key chain
omim:mimNumber | omim:entryList:entry:mimNumber
omim:geneName | omim:entryList:entry:geneMap:geneName
omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber
omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype
omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below)
omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance
omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below)

Mapping key to content

1 to disorder was positioned by mapping of the wild type gene
2 to disease phenotype itself was mapped
3 to molecular basis of the disorder is known
4 to disorder is a chromosome deletion or duplication syndrome

Phenotype character to comment

? to unconfirmed or possibly spurious mapping
[/] to nondiseases
{/} to contribute to susceptibility to multifactorial disorders or to susceptibility to infection
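
The two lookups above can be sketched in Python as follows. This is a hedged illustration, not the actual converter: the function name is hypothetical, the sketch assumes the special characters wrap or prefix the phenotype string (for example `{?Name}`), and it treats `phenotypeInheritance` as a single value.

```python
MAPPING_KEYS = {
    1: "disorder was positioned by mapping of the wild type gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "disorder is a chromosome deletion or duplication syndrome",
}
UNCONFIRMED = "unconfirmed or possibly spurious mapping"
NONDISEASE = "nondiseases"
SUSCEPTIBILITY = ("contribute to susceptibility to multifactorial disorders "
                  "or to susceptibility to infection")


def convert_phenotype(phenotype_map):
    """Convert one OMIM API phenotypeMap object into an output-style phenotype."""
    name = phenotype_map["phenotype"].strip()
    comments = []
    # Peel off the special characters one layer at a time, recording a comment for each
    while True:
        if name.startswith("?"):
            comments.append(UNCONFIRMED)
            name = name[1:].strip()
        elif name.startswith("[") and name.endswith("]"):
            comments.append(NONDISEASE)
            name = name[1:-1].strip()
        elif name.startswith("{") and name.endswith("}"):
            comments.append(SUSCEPTIBILITY)
            name = name[1:-1].strip()
        else:
            break

    result = {
        "phenotype": name,
        "mapping": MAPPING_KEYS[phenotype_map["phenotypeMappingKey"]],
    }
    inheritance = phenotype_map.get("phenotypeInheritance")
    if inheritance:
        result["inheritances"] = [inheritance]
    if comments:
        result["comments"] = comments
    return result


if __name__ == "__main__":
    print(convert_phenotype({
        "phenotype": "Baraitser-Winter syndrome 2",
        "phenotypeMappingKey": 3,
        "phenotypeInheritance": "Autosomal dominant",
    }))
```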

There are different types of links in the OMIM description section. For example, the JSON response above contains the following description of MIM entry 100640:

The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985}).

As the descriptions will be shown as plain text, we remove the curly brackets surrounding links and try to keep the text readable with minimal modifications. Briefly:

  • Links referring to another MIM entry (e.g. {100650}) will be removed. Any word(s) specifically associated with the removed link will also be removed. For example, "(ADH, see {103700})" will become "(ADH)" after the process.
  • Links referring to a literature reference will be processed to remove the internal index and curly brackets. For example, "{4:Hsu et al., 1985}" becomes "Hsu et al., 1985".
  • All the other links will simply have their curly brackets removed. For example, "{EC 1.2.1.3}" becomes "EC 1.2.1.3".
  • If the content within a pair of parentheses becomes empty after being processed, the parentheses need to be removed as well and the surrounding white space should be adjusted accordingly. For example, "ALDH2 ({100650})," will become "ALDH2,".

Here is a list of examples showing how the description section is supposed to be processed:

Original text | Processed text
({516030}, {516040}, and {516050}) | (empty)
(e.g., D1, {168461}; D2, {123833}; D3, {123834}) | (e.g., D1; D2; D3)
(desmocollins; see DSC2, {125645}) | (desmocollins; see DSC2)
(e.g., see {102700}, {300755}) | (empty)
(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}) | (ADH). See also liver mitochondrial ALDH2
(see, e.g., CACNA1A; {601011}) | (see, e.g., CACNA1A)
(e.g., GSTA1; {138359}), mu (e.g., {138350}) | (e.g., GSTA1), mu
(NFKB; see {164011}) | (NFKB)
(see ISGF3G, {147574}) | (see ISGF3G)
(DCK; {EC 2.7.1.74}; {125450}) | (DCK; EC 2.7.1.74)
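
The rules above can be approximated with a few regular expressions. The Python sketch below is only an illustration under stated assumptions: the function name and the exact patterns are hypothetical, the set of "associated words" is limited to "see", "e.g.," and "and", and it has been checked only against the examples listed above, so other inputs may need additional rules.

```python
import re

# {4:Hsu et al., 1985} -> Hsu et al., 1985
REFERENCE_LINK = re.compile(r"\{\d+:([^}]*)\}")
# {100650}, plus a preceding separator and lead-in words such as "see" or "e.g.,"
MIM_LINK = re.compile(r"[,;]?\s*(?:\b(?:and|see)\s+|\be\.g\.,\s*)*\{\d+\}")
# Any remaining link, e.g. {EC 1.2.1.3} -> EC 1.2.1.3
OTHER_LINK = re.compile(r"\{([^}]*)\}")
# Parentheses left empty by the removals, together with the white space before them
EMPTY_PARENS = re.compile(r"\s*\(\s*\)")


def clean_description(text):
    """Turn OMIM link markup into plain text following the rules listed above."""
    text = REFERENCE_LINK.sub(r"\1", text)
    text = MIM_LINK.sub("", text)
    text = OTHER_LINK.sub(r"\1", text)
    return EMPTY_PARENS.sub("", text)


if __name__ == "__main__":
    print(clean_description("(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650})"))
    print(clean_description("inhibition by disulfiram ({4:Hsu et al., 1985})."))
```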

JSON output

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
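As an illustration of how the structure above can be consumed, the sketch below flattens an `omim` array into one row per phenotype. The function name `summarize_omim` and the trimmed sample are hypothetical; the field names are taken from the JSON example above, and fields other than `mimNumber` and `phenotype` are treated as optional because the example omits some of them.

```python
import json


def summarize_omim(entries):
    """Return one (gene MIM number, phenotype, mapping, inheritances) tuple per phenotype."""
    rows = []
    for gene in entries:
        for phenotype in gene.get("phenotypes", []):
            rows.append((
                gene["mimNumber"],
                phenotype["phenotype"],
                phenotype.get("mapping"),                  # may be absent
                tuple(phenotype.get("inheritances", [])),  # may be absent
            ))
    return rows


# Trimmed version of the JSON example above.
SAMPLE = json.loads("""
[{"mimNumber": 600678,
  "phenotypes": [
    {"mimNumber": 614350,
     "phenotype": "Colorectal cancer, hereditary nonpolyposis, type 5",
     "mapping": "molecular basis of the disorder is known",
     "inheritances": ["Autosomal dominant"]},
    {"mimNumber": 608089,
     "phenotype": "Endometrial cancer, familial",
     "mapping": "molecular basis of the disorder is known"}
  ]}]
""")

ROWS = summarize_omim(SAMPLE)
```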

Building the supplementary files

The first step in building the OMIM .nga files is to download the necessary data with the SAUtils subcommand downloadOMIM. To download the data, the user must have an API key obtained from OMIM. This key has to be set as the environment variable OmimApiKey.

export OmimApiKey=<users-omim-api-key>
dotnet SAUtils.dll downloadOMIM
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll downloadomim [options]
Download the OMIM gene annotation data

OPTIONS:
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll downloadOMIM --ref References/7/Homo_sapiens.GRCh38.Nirvana.dat --uga Cache/ --out ExternalDataSources/OMIM/2021-06-14
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Gene Symbol Update Statistics
============================================
{
"NumGeneSymbolsUpToDate": 16788,
"NumGeneSymbolsUpdated": 95,
"NumGenesWhereBothIdsAreNull": 0,
"NumGeneSymbolsNotInCache": 106,
"NumResolvedGeneSymbolConflicts": 15,
"NumUnresolvedGeneSymbolConflicts": 0
}

Time: 00:04:08.9

Once the download has succeeded, the .nga files can be produced using the SAUtils subcommand omim.

dotnet SAUtils.dll omim
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll omim [options]
Creates a gene annotation database from OMIM data

OPTIONS:
--m2g, -m <VALUE> MimToGeneSymbol tsv file
--json, -j <VALUE> OMIM entry json file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version


dotnet SAUtils.dll omim --m2g ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv --json ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz --out SupplementaryDatabase/63/
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:04.5
- - + + \ No newline at end of file diff --git a/3.22/data-sources/phylop-json/index.html b/3.22/data-sources/phylop-json/index.html index 53976d4e..fa381aeb 100644 --- a/3.22/data-sources/phylop-json/index.html +++ b/3.22/data-sources/phylop-json/index.html @@ -6,13 +6,13 @@ phylop-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
- - +
Version: 3.22

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
+ + \ No newline at end of file diff --git a/3.22/data-sources/phylop/index.html b/3.22/data-sources/phylop/index.html index af7fc2dd..dab8c7ee 100644 --- a/3.22/data-sources/phylop/index.html +++ b/3.22/data-sources/phylop/index.html @@ -6,13 +6,13 @@ PhyloP | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

PhyloP

Overview

PhyloP (phylogenetic p-values) conservation scores are obtained from the PHAST package (http://compgen.bscb.cornell.edu/phast/) for multiple alignments of vertebrate genomes to the human genome. For GRCh38, the human genome is aligned against 19 mammals; for GRCh37, against 45 vertebrate genomes.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

WigFix File

The data is provided in WigFix files, which are text files that provide conservation scores for contiguous intervals in the following format:

fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
0.064
0.058
0.064
0.064
fixedStep chrom=chr1 start=34045 step=1
0.111
0.100
0.111
0.111
0.100
0.111
0.111
0.111
0.100
0.111
-1.636

We convert them to binary files with indexes for fast query. Note that these are scores for genomic positions and are reported only for SNVs.

Download URL

GRCh37: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/

GRCh38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
- - +
Version: 3.22

PhyloP

Overview

PhyloP (phylogenetic p-values) conservation scores are obtained from the PHAST package (http://compgen.bscb.cornell.edu/phast/) for multiple alignments of vertebrate genomes to the human genome. For GRCh38, the human genome is aligned against 19 mammals; for GRCh37, against 45 vertebrate genomes.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

WigFix File

The data is provided in WigFix files, which are text files that provide conservation scores for contiguous intervals in the following format:

fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
0.064
0.058
0.064
0.064
fixedStep chrom=chr1 start=34045 step=1
0.111
0.100
0.111
0.111
0.100
0.111
0.111
0.111
0.100
0.111
-1.636

We convert them to binary files with indexes for fast query. Note that these are scores for genomic positions and are reported only for SNVs.
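To make the layout concrete, here is a small sketch of reading the fixedStep blocks shown above into per-position scores. It is an illustration only and says nothing about how the binary files are actually built; the function name `parse_wigfix` is hypothetical, and the sketch assumes that every declaration line carries `chrom` and `start` and that `step` defaults to 1 when missing.

```python
def parse_wigfix(lines):
    """Yield (chrom, position, score) for every score line in fixedStep blocks."""
    chrom, position, step = None, 0, 1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr1 start=10918 step=1".
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            position = int(fields["start"])
            step = int(fields.get("step", 1))
        else:
            # Score line: belongs to the current position, then advance by step.
            yield chrom, position, float(line)
            position += step


SAMPLE = """fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
fixedStep chrom=chr1 start=34045 step=1
0.111
-1.636
""".splitlines()

SCORES = list(parse_wigfix(SAMPLE))
```

Each declaration line resets the running position, which is why gaps between intervals need no explicit representation in the file.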

Download URL

GRCh37: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/

GRCh38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
+ + \ No newline at end of file diff --git a/3.22/data-sources/primate-ai-json/index.html b/3.22/data-sources/primate-ai-json/index.html index 71db241e..a61b63ab 100644 --- a/3.22/data-sources/primate-ai-json/index.html +++ b/3.22/data-sources/primate-ai-json/index.html @@ -6,13 +6,13 @@ primate-ai-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

primate-ai-json

GRCh38

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0

GRCh37

"primateAI": [
{
"hgnc":"TP53",
"scorePercentile":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC Gene Symbol
scorePercentile | float | range: 0 - 1.0
- - +
Version: 3.22

primate-ai-json

GRCh38

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0

GRCh37

"primateAI": [
{
"hgnc":"TP53",
"scorePercentile":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC Gene Symbol
scorePercentile | float | range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.22/data-sources/primate-ai/index.html b/3.22/data-sources/primate-ai/index.html index ecc78d90..e0f34fc3 100644 --- a/3.22/data-sources/primate-ai/index.html +++ b/3.22/data-sources/primate-ai/index.html @@ -6,17 +6,17 @@ Primate AI | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Primate AI

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information. +

Version: 3.22

Primate AI

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information. The model's innovative use of primate sequencing and structural data offers promising insights into variant interpretation and disease gene identification. The predicted score ranges between 0 and 1, with 0 being benign and 1 being most pathogenic.

For more details, refer to these publications:

Publication
  1. Hong Gao et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023). https://doi.org/10.1126/science.abn8197
  2. Sundaram, L., Gao, H., Padigepati, S.R. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50, 1161–1170 (2018). https://doi.org/10.1038/s41588-018-0167-z
Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

Primate AI is available in two versions based on assembly:

  1. Primate AI 3D: Only available for GRCh38
  2. Primate AI: Only available for GRCh37

The two versions differ in file structure and content, so they are handled separately:

Primate AI 3D: GRCh38

Parsing

CSV File

,chr,pos,non_flipped_ref,non_flipped_alt,gene_name,change_position_1based,ref_aa,alt_aa,score_PAI3D,percentile_PAI3D,refseq
0,chr1,69094,G,A,ENST00000335137.4,2,V,M,0.6169436463713646,0.5200308441794135,NM_001005484.1
1,chr1,69094,G,C,ENST00000335137.4,2,V,L,0.5557043975591658,0.4271457250214688,NM_001005484.1
2,chr1,69094,G,T,ENST00000335137.4,2,V,L,0.5557043975591658,0.4271457391722522,NM_001005484.1

From the CSV file, all columns are parsed:

  • chr
  • pos
  • ref
  • alt
  • gene_name
  • change_position_1based
  • ref_aa
  • alt_aa
  • score_PAI3D
  • percentile_PAI3D
  • refseq

The fields gene_name and refseq define the Ensembl and RefSeq transcript IDs respectively. These transcripts are passed as-is and some of them might be unrecognized/deprecated by RefSeq/Ensembl.
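The column-to-field relationship described above can be sketched as follows. This is an illustration under stated assumptions rather than the SAUtils parser: the function name `parse_pai3d` and the keys `chromosome`, `position`, `refAllele` and `altAllele` are hypothetical, and the rounding (6 decimals for the score, 2 for the percentile) is inferred from the JSON example rather than documented.

```python
import csv
import io


def parse_pai3d(handle):
    """Yield one dictionary per CSV row, using the JSON field names where they exist."""
    for row in csv.DictReader(handle):
        yield {
            "chromosome": row["chr"],
            "position": int(row["pos"]),
            "refAllele": row["non_flipped_ref"],
            "altAllele": row["non_flipped_alt"],
            "aminoAcidPosition": int(row["change_position_1based"]),
            "refAminoAcid": row["ref_aa"],
            "altAminoAcid": row["alt_aa"],
            # Rounding is an assumption inferred from the JSON example.
            "score": round(float(row["score_PAI3D"]), 6),
            "scorePercentile": round(float(row["percentile_PAI3D"]), 2),
            # gene_name holds the Ensembl transcript ID, refseq the RefSeq one.
            "ensemblTranscriptId": row["gene_name"],
            "refSeqTranscriptId": row["refseq"],
        }


SAMPLE = io.StringIO(
    ",chr,pos,non_flipped_ref,non_flipped_alt,gene_name,change_position_1based,"
    "ref_aa,alt_aa,score_PAI3D,percentile_PAI3D,refseq\n"
    "0,chr1,69094,G,A,ENST00000335137.4,2,V,M,0.6169436463713646,"
    "0.5200308441794135,NM_001005484.1\n"
)

RECORDS = list(parse_pai3d(SAMPLE))
```

The unnamed first column (a row index) is simply ignored, and the transcript IDs are copied through unchanged, matching the note that they are passed as-is.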

Parsing Command

dotnet SAUtils.dll \
PrimateAi \
--r "${References}/Homo_sapiens.GRCh38.Nirvana.dat" \
--i "${ExternalDataSources}/PrimateAI/3D/PAI3D_wholeProteome_23_04_11.percentiles.pkg.refseq.csv.gz" \
--o "${SaUtilsOutput}"

Known Issues

Some transcript IDs defined in the data file are obsolete, retired, or updated. They are not removed or modified by Illumina Connected Annotations, and are passed as-is from the PrimateAI-3D data source.

Example:

ENST00000643905.1 transcript is retired according to Ensembl

NM_182838.2 transcript is removed because it is a pseudo-gene according to RefSeq

Download URL

https://primad.basespace.illumina.com/

Primate AI: GRCh37

Parsing

TSV File

chr pos ref alt refAA   altAA   strand_1pos_0neg    trinucleotide_context   UCSC_gene   ExAC_coverage   primateDL_score
chr10 1046704 C T R C 1 CCG uc001ift.3 45.49 0.849114537239
chr10 1046704 C G R G 1 CCG uc001ift.3 45.49 0.795686006546

From the TSV file, we're mainly interested in the following columns:

  • chr
  • pos
  • ref
  • alt
  • primateDL_score

We also use UCSC_gene to filter out variants that don't have matching gene models in Illumina Connected Annotations.

Pre-processing

Converting UCSC IDs

Primate AI only provides UCSC IDs. As an initial pre-processing step, we'll need to convert these to either Entrez or Ensembl Gene IDs.

The following queries are used to download the conversions from UCSC:

mysql -h genome-mysql.soe.ucsc.edu -u genome -A -P 3306 \
-e "select * FROM knownToLocusLink;" hg19 > ucsc_locuslink.tsv

mysql -h genome-mysql.soe.ucsc.edu -u genome -A -P 3306 \
-e "select knownToEnsembl.name, knownToEnsembl.value, ensGene.name2 FROM knownToEnsembl, ensGene WHERE knownToEnsembl.value = ensGene.name;" \
hg19 > ucsc_ensembl.tsv

Running the Pre-Processor

The Primate AI pre-processor can be run as follows:

dotnet PrimateAiPreProcessor.dll UGA_develop.tsv PrimateAI_scores_v0.2.tsv.gz \
ucsc_locuslink.tsv ucsc_ensembl.tsv PrimateAI_0.2_GRCh37.tsv.gz

During conversion, 0.5% of the UCSC IDs cannot be converted to either Entrez or Ensembl gene IDs. Once the gene IDs have been acquired, we check to see which are available in Illumina Connected Annotations.

The following Entrez Gene IDs were not found:

399753
401980
504189
504191
100293534

Here is the output from the pre-processor:

- loading UCSC to Entrez Gene ID dictionary... 73,432 genes loaded.
- loading UCSC to Ensembl Gene ID dictionary... 76,178 genes loaded.
- loading UGA gene ID to gene dictionary... 103,277 genes loaded.
- parsing Primate AI variants... 70,121,953 variants parsed.

# variants with unknown gene ID: 27,253 / 70,121,953
# genes with unknown gene ID: 109 / 19,614

# variants not in UGA: 2,036 / 70,121,953
# genes not in UGA: 6 / 19,614

Known Issues

The Primate AI data set provides raw scores, but the scores are biased according to gene context. For example, a score of 0.4 means something different in TP53 than it does in KRAS.

As a result, the Primate AI team provided guidance on aggregating these scores and presenting them as percentiles with respect to the associated gene. According to their research, the 25th percentile is a good proxy for benign variants and the 75th percentile is a good proxy for pathogenic variants.
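One way to picture the per-gene aggregation is the sketch below, which ranks each raw score within its own gene. The exact percentile definition used for the released data is not stated here, so this sketch assumes "fraction of the gene's scores that are less than or equal to this score", rounded to two decimals; the function name `gene_percentiles` and the record layout are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def gene_percentiles(records):
    """records: iterable of (gene, variant_id, raw_score) -> {variant_id: percentile}."""
    records = list(records)
    scores_by_gene = defaultdict(list)
    for gene, _, score in records:
        scores_by_gene[gene].append(score)
    for scores in scores_by_gene.values():
        scores.sort()
    return {
        # Assumed definition: share of the gene's scores <= this score.
        variant: round(
            bisect_right(scores_by_gene[gene], score) / len(scores_by_gene[gene]), 2
        )
        for gene, variant, score in records
    }


PERCENTILES = gene_percentiles([
    ("TP53", "a", 0.4), ("TP53", "b", 0.8), ("TP53", "c", 0.2), ("TP53", "d", 0.6),
    ("KRAS", "e", 0.4),
])
```

In this toy input the same raw score of 0.4 lands at different percentiles in TP53 and KRAS, which is the gene-context effect described above.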

Download URL

https://basespace.illumina.com/s/cPgCSmecvhb4

JSON Output

GRCh38

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0

GRCh37

"primateAI": [
{
"hgnc":"TP53",
"scorePercentile":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC Gene Symbol
scorePercentile | float | range: 0 - 1.0
- - + + \ No newline at end of file diff --git a/3.22/data-sources/revel-json/index.html b/3.22/data-sources/revel-json/index.html index 0eb0bd2e..2d1dbe76 100644 --- a/3.22/data-sources/revel-json/index.html +++ b/3.22/data-sources/revel-json/index.html @@ -6,13 +6,13 @@ revel-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

revel-json

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
- - +
Version: 3.22

revel-json

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.22/data-sources/revel/index.html b/3.22/data-sources/revel/index.html index 340626f8..5ffb6024 100644 --- a/3.22/data-sources/revel/index.html +++ b/3.22/data-sources/revel/index.html @@ -6,13 +6,13 @@ REVEL | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files (one per assembly, for better readability) with identical formats. The input is sorted by GRCh37 position but not by GRCh38 position, so we re-sort the variants by position in the GRCh38 file.

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
- - +
Version: 3.22

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files (one per assembly, for better readability) with identical formats. The input is sorted by GRCh37 position but not by GRCh38 position, so we re-sort the variants by position in the GRCh38 file.

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.
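The two points above (splitting per assembly with a re-sort, and keeping the highest score for a repeated variant) can be sketched together. This is a minimal illustration, not the actual pre-processing code: the function names are hypothetical, rows are kept in memory for simplicity, and rows whose position column is not a plain number are assumed to have no coordinate on that assembly and are skipped.

```python
import csv
import io


def chrom_key(chrom):
    """Sort numeric chromosomes numerically, then the rest alphabetically."""
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def split_revel(handle):
    """Return (grch37_rows, grch38_rows) as sorted (chr, pos, ref, alt, score) tuples."""
    tables = {"hg19_pos": {}, "grch38_pos": {}}
    for row in csv.DictReader(handle):
        score = float(row["REVEL"])
        for column, table in tables.items():
            if not row[column].isdigit():
                continue  # assumption: no position available on this assembly
            key = (row["chr"], int(row[column]), row["ref"], row["alt"])
            # Conflicting scores for the same variant: keep the highest.
            table[key] = max(score, table.get(key, score))

    def to_sorted_rows(table):
        ordered = sorted(table, key=lambda k: (chrom_key(k[0]), k[1], k[2], k[3]))
        return [key + (table[key],) for key in ordered]

    return to_sorted_rows(tables["hg19_pos"]), to_sorted_rows(tables["grch38_pos"])


SAMPLE = io.StringIO(
    "chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL\n"
    "1,35142,35142,G,A,T,M,0.027\n"
    "1,35142,35142,G,A,T,K,0.043\n"
    "1,35143,35100,T,A,T,S,0.018\n"
    "1,35150,.,T,C,T,A,0.034\n"
)

GRCH37_ROWS, GRCH38_ROWS = split_revel(SAMPLE)
```

The sample positions are made up to show the effect: the third row sorts after the first on GRCh37 but before it on GRCh38, and the last row only appears in the GRCh37 output.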

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.22/data-sources/splice-ai-json/index.html b/3.22/data-sources/splice-ai-json/index.html index 0f03d9f1..df67c945 100644 --- a/3.22/data-sources/splice-ai-json/index.html +++ b/3.22/data-sources/splice-ai-json/index.html @@ -6,13 +6,13 @@ splice-ai-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

splice-ai-json

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
- - +
Version: 3.22

splice-ai-json

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
+ + \ No newline at end of file diff --git a/3.22/data-sources/splice-ai/index.html b/3.22/data-sources/splice-ai/index.html index 92dc5ad8..7b8a1d8a 100644 --- a/3.22/data-sources/splice-ai/index.html +++ b/3.22/data-sources/splice-ai/index.html @@ -6,13 +6,13 @@ Splice AI | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

Range | Confidence | Pathogenicity
0 ≤ x < 0.1 | low | likely benign
0.1 ≤ x ≤ 0.5 | medium | likely pathogenic
x > 0.5 | high | pathogenic

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of the entries have little value, e.g. low splice scores observed in intergenic regions. Not only do these extra entries require more storage, but the unused content also has a negative impact on annotation speed.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
- - +
Version: 3.22

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

Range | Confidence | Pathogenicity
0 ≤ x < 0.1 | low | likely benign
0.1 ≤ x ≤ 0.5 | medium | likely pathogenic
x > 0.5 | high | pathogenic
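
The tiers above can be expressed as a small lookup. The following is a minimal Python sketch, not part of Illumina Connected Annotations; the function name classify_delta_score is hypothetical and the boundaries are copied directly from the table.

```python
def classify_delta_score(score: float) -> tuple:
    """Map a Splice AI delta score to (confidence, pathogenicity).

    Hypothetical helper: the thresholds follow the interpretation table
    (0 <= x < 0.1 low, 0.1 <= x <= 0.5 medium, x > 0.5 high).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"delta score out of range: {score}")
    if score < 0.1:
        return ("low", "likely benign")
    if score <= 0.5:
        return ("medium", "likely pathogenic")
    return ("high", "pathogenic")


if __name__ == "__main__":
    for value in (0.0008, 0.3, 0.9):
        print(value, classify_delta_score(value))
```

Note that a score of exactly 0.5 falls in the medium tier, since the table uses an inclusive upper bound there.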

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of the entries have little value, e.g. low splice scores observed in intergenic regions. Not only do these extra entries require more storage, but the unused content also slows down annotation.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.
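
The filtering rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual implementation: it assumes that an entry is in the low confidence tier when all four delta scores are below 0.1, and that proximity to a splice site is judged from the DIST field of the VCF. The names parse_info and keep_entry are hypothetical.

```python
LOW_CONFIDENCE_CUTOFF = 0.1   # upper bound of the low confidence tier
SPLICE_SITE_WINDOW = 15       # bp around a splice site that is always kept

SCORE_KEYS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")


def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'A=1;B=2' into a dictionary."""
    return dict(field.split("=", 1) for field in info.split(";") if "=" in field)


def keep_entry(info: str) -> bool:
    """Return True if the entry survives the (assumed) filtering rule."""
    fields = parse_info(info)
    max_score = max(float(fields[key]) for key in SCORE_KEYS)
    # Assumption: DIST holds the signed distance to the closest splice site.
    near_splice_site = abs(int(fields["DIST"])) <= SPLICE_SITE_WINDOW
    return max_score >= LOW_CONFIDENCE_CUTOFF or near_splice_site


if __name__ == "__main__":
    example = ("SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;"
               "DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1")
    print(keep_entry(example))
```

With the example record from the VCF excerpt (DIST=-53, all scores below 0.1), the sketch would drop the entry.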

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
+ + \ No newline at end of file diff --git a/3.22/data-sources/topmed-json/index.html b/3.22/data-sources/topmed-json/index.html index a4d328d3..05f8ca14 100644 --- a/3.22/data-sources/topmed-json/index.html +++ b/3.22/data-sources/topmed-json/index.html @@ -6,13 +6,13 @@ topmed-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

topmed-json

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.22

topmed-json

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.22/data-sources/topmed/index.html b/3.22/data-sources/topmed/index.html index 77d59e24..379286a6 100644 --- a/3.22/data-sources/topmed/index.html +++ b/3.22/data-sources/topmed/index.html @@ -6,13 +6,13 @@ TOPMed | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of> 100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.22

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of> 100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
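
To make the relationship between the extracted VCF fields and the JSON keys concrete, here is a rough Python sketch. It is an assumption-laden illustration rather than the annotator's code: it assumes a single alternate allele per line, that allAf is AC / AN rounded to six decimal places (as the JSON example suggests), that allHc comes from Hom, and that failedFilter is set whenever the FILTER column is not PASS. The function name topmed_entry is hypothetical.

```python
def topmed_entry(filter_field: str, info: str) -> dict:
    """Build a 'topmed'-style dictionary from one VCF FILTER/INFO pair.

    Assumes one alternate allele, so AC and Hom are single integers.
    """
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    allele_count = int(fields["AC"])
    allele_number = int(fields["AN"])  # documented as a non-zero integer

    entry = {
        "allAc": allele_count,
        "allAn": allele_number,
        # Assumption: frequency is recomputed as AC / AN, six decimals.
        "allAf": round(allele_count / allele_number, 6),
        "allHc": int(fields["Hom"]),
    }
    # Boolean keys are only emitted when true.
    if filter_field not in ("PASS", "."):
        entry["failedFilter"] = True
    return entry


if __name__ == "__main__":
    info = "VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0"
    print(topmed_entry("SVM", info))
```

For the allele count and number shown in the JSON example (20 and 125568), the same arithmetic gives 0.000159.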
+ + \ No newline at end of file diff --git a/3.22/file-formats/custom-annotations/index.html b/3.22/file-formats/custom-annotations/index.html index 02bd18ae..b52c6fbc 100644 --- a/3.22/file-formats/custom-annotations/index.html +++ b/3.22/file-formats/custom-annotations/index.html @@ -6,12 +6,12 @@ Custom Annotations | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporating new annotations ahead of our update cycle. Another +

Version: 3.22

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporating new annotations ahead of our update cycle. Another common use case involves protected health information (PHI). Custom annotations are a mechanism that enables both use cases.

Here are some examples of how our collaborators use custom annotations:

  • associating context from both a sample-level and a sample cohort level with the variant annotations
  • adding content that is licensed (e.g. HGMD) to the variant annotations

At the moment, we have two different custom annotation file formats. One provides additional annotations to variants (both small variants and SVs) while the other caters to gene annotations.

In both cases, the custom annotation file format is a tab-delimited file that is separated into two parts: the header & the data.

The header is where you can customize how you want the data to appear in the JSON file and provide context about the genome assembly and how Illumina Connected Annotations should match the variants.

At Illumina, there are usually many components downstream of Illumina Connected Annotations that have to parse our annotations. If a customer provides a custom @@ -34,7 +34,7 @@ chromosome, svLength, cytogeneticBand, etc. The title should also not conflict with other data source keys like clingen or dgv.

caution

Care should be taken not to annotate using multiple custom annotations that all use the same title.

Genome Assemblies

The following genome assemblies can be specified:

  • GRCh37
  • GRCh38

Matching Criteria

The matching criteria tell Illumina Connected Annotations how to match a VCF variant to the custom annotation.

The following matching criteria can be specified:

  • allele - use this when you only want allele-specific matches. This is commonly the case when using allele frequency data sources like gnomAD
  • position - use this when you want positional matches. This is commonly used with disease phenotype data sources like ClinVar
  • sv - use this when you want to match to all other overlapping SVs. This use case arose when we were adding custom annotations for baseline copy number intervals along the genome.
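
One way to picture the three criteria is the Python sketch below. It is an interpretation of the descriptions above, not the matching logic of Illumina Connected Annotations: the Variant class, the matches function, and the exact comparisons (for example, treating sv as any interval overlap) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant or annotation record (1-based, inclusive)."""
    chromosome: str
    begin: int
    end: int
    ref: str = ""
    alt: str = ""


def matches(criteria: str, variant: Variant, annotation: Variant) -> bool:
    """Return True if the annotation applies to the variant under the criteria."""
    if variant.chromosome != annotation.chromosome:
        return False
    if criteria == "allele":
        # Allele-specific: same position and same ref/alt alleles.
        return (variant.begin == annotation.begin
                and variant.ref == annotation.ref
                and variant.alt == annotation.alt)
    if criteria == "position":
        # Positional: same start position, alleles are ignored.
        return variant.begin == annotation.begin
    if criteria == "sv":
        # Assumed: any overlap between the two intervals counts as a match.
        return variant.begin <= annotation.end and annotation.begin <= variant.end
    raise ValueError(f"unknown matching criteria: {criteria}")


if __name__ == "__main__":
    called = Variant("chr1", 100, 100, "A", "G")
    other_allele = Variant("chr1", 100, 100, "A", "T")
    print(matches("allele", called, other_allele))
    print(matches("position", called, other_allele))
```

In this sketch the same pair of records fails an allele match but passes a position match, which is the practical difference between the first two options.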

Categories

Categories are not used by Illumina Connected Annotations, but are often used by downstream tools. Categories provide hints for how those tools should filter or display the annotation data.

When a category is specified, Illumina Connected Annotations will provide additional validation for those fields. The following table describes each category:

Category | Description | Validation
AlleleCount | allele counts for a specific population | See the supported populations below
AlleleNumber | allele numbers for a specific population | See the supported populations below
AlleleFrequency | allele frequencies for a specific population | See the supported populations below
Prediction | ACMG-style pathogenicity classifications | benign (B), likely benign (LB), VUS, likely pathogenic (LP), pathogenic (P)
Filter | free text that signals downstream tools to add the column to the filter | Max 20 characters
Description | free-text description | Max 100 characters
Identifier | any ID | Max 50 characters
HomozygousCount | count of homozygous individuals for a specific population | See the supported populations below
Score | any score value | Any double-precision floating point number

Descriptions

Descriptions are used to add more context to the categories. For now, descriptions are mainly used to associate allele counts, numbers, and frequencies with their respective populations.

Populations

The following populations were specified in the HapMap project, 1000 Genomes Project, ExAC, and gnomAD.

Population Code | Super-population Code | Description
ACB | AFR | African Caribbeans in Barbados
AFR | AFR | African
ALL | ALL | All populations
AMR | AMR | Admixed American
ASJ | | Ashkenazi Jewish
ASW | AFR | Americans of African Ancestry in SW USA
BEB | SAS | Bengali from Bangladesh
CDX | EAS | Chinese Dai in Xishuangbanna, China
CEU | EUR | Utah Residents (CEPH) with Northern and Western European Ancestry
CHB | EAS | Han Chinese in Beijing, China
CHS | EAS | Southern Han Chinese
CLM | AMR | Colombians from Medellin, Colombia
EAS | EAS | East Asian
ESN | AFR | Esan in Nigeria
EUR | EUR | European
FIN | EUR | Finnish in Finland
GBR | EUR | British in England and Scotland
GIH | SAS | Gujarati Indian from Houston, Texas
GWD | AFR | Gambian in Western Divisions in the Gambia
IBS | EUR | Iberian population in Spain
ITU | SAS | Indian Telugu from the UK
JPT | EAS | Japanese in Tokyo, Japan
KHV | EAS | Kinh in Ho Chi Minh City, Vietnam
LWK | AFR | Luhya in Webuye, Kenya
MAG | AFR | Mandinka in the Gambia
MKK | AFR | Maasai in Kinyawa, Kenya
MSL | AFR | Mende in Sierra Leone
MXL | AMR | Mexican Ancestry from Los Angeles, USA
NFE | EUR | European (Non-Finnish)
OTH | OTH | Other
PEL | AMR | Peruvians from Lima, Peru
PJL | SAS | Punjabi from Lahore, Pakistan
PUR | AMR | Puerto Ricans from Puerto Rico
SAS | SAS | South Asian
STU | SAS | Sri Lankan Tamil from the UK
TSI | EUR | Toscani in Italia
YRI | AFR | Yoruba in Ibadan, Nigeria

Data Types

Each custom annotation can be one of the following data types:

  • bool - true or false
  • number - any integer or floating-point number
  • string - text
tip

For boolean variables, only keys with a true value will be output to the JSON object.

Using SAUtils

Illumina Connected Annotations includes a tool called SAUtils that converts various data sources into the native binary format of Illumina Connected Annotations. The sub-commands customvar and customgene are used to specify a variant file or a gene file, respectively.

Convert Variant File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory

Convert Gene File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customgene \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-c Data/Cache \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -c argument specifies the Illumina Connected Annotations cache path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory
- - + + \ No newline at end of file diff --git a/3.22/file-formats/illumina-annotator-json-file-format/index.html b/3.22/file-formats/illumina-annotator-json-file-format/index.html index 24935486..65f6ff42 100644 --- a/3.22/file-formats/illumina-annotator-json-file-format/index.html +++ b/3.22/file-formats/illumina-annotator-json-file-format/index.html @@ -6,13 +6,13 @@ Illumina Connected Annotations JSON File Format | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. For example, there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
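
The two conventions can be mimicked with a small Python helper. This is a sketch of the idea only; the function name compact is hypothetical, and it assumes that false booleans, periods, and empty or null strings are the only values that get suppressed.

```python
def compact(entry: dict) -> dict:
    """Drop keys that would not be written under the conventions above.

    False booleans are omitted, and VCF placeholders ('.') are treated
    like empty or null strings. Numeric zeros are kept.
    """
    return {
        key: value
        for key, value in entry.items()
        if value is not False and value not in (".", "", None)
    }


if __name__ == "__main__":
    raw = {
        "isStructuralVariant": False,
        "isDeNovo": True,
        "alleleDepths": ".",
        "totalDepth": 0,
    }
    print(compact(raw))
```

A practical consequence for consumers of the JSON is that a missing boolean key should be read as false, for example with entry.get("isStructuralVariant", False).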

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
Field | Type | Notes
annotator | string | the name of the annotator and the current version
creationTime | string | yyyy-MM-dd hh:mm:ss
genomeAssembly | string | see possible values below
schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes
dataVersion | string |
dataSources | object array | see Data Source entry below
samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

Field | Type | Notes
name | string |
version | string |
description | string | optional description of the data source
releaseDate | string | yyyy-MM-dd

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes
chromosome | string | all | exactly as displayed in the vcf
position | integer | all | exactly as displayed in the vcf (1-based notation). Range: 1 - 250 million
repeatUnit | string | STR | provided by ExpansionHunter
refRepeatCount | integer | STR | provided by ExpansionHunter
svEnd | integer | SV |
refAllele | string | all | exactly as displayed in the vcf
altAlleles | string array | all | exactly as displayed in the vcf
quality | float | all | exactly as displayed in the vcf (normally an integer, but some variant callers use floating point; has been observed as high as 500k)
filters | string array | all | exactly as displayed in the vcf
ciPos | integer array | SV |
ciEnd | integer array | SV |
svLength | integer | SV |
strandBias | float | small variant | provided by GATK (from SB)
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand | string | all | e.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
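
The documentation above gives only the range and precision of the two overlap fields. As a hedged illustration, the Python sketch below uses one common definition: reciprocal overlap as the shared length divided by the longer of the two intervals, and annotation overlap as the shared length divided by the annotation length. These formulas and the function name overlap_fractions are assumptions, not a statement of how Illumina Connected Annotations computes the values.

```python
def overlap_fractions(var_begin: int, var_end: int,
                      ann_begin: int, ann_end: int) -> tuple:
    """Return (reciprocalOverlap, annotationOverlap) for 1-based inclusive intervals.

    Assumed definitions:
      reciprocal = overlap / max(variant length, annotation length)
      annotation = overlap / annotation length
    Both are rounded to 5 decimal places, matching the documented precision.
    """
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return (0.0, 0.0)
    variant_length = var_end - var_begin + 1
    annotation_length = ann_end - ann_begin + 1
    reciprocal = overlap / max(variant_length, annotation_length)
    annotation = overlap / annotation_length
    return (round(reciprocal, 5), round(annotation, 5))


if __name__ == "__main__":
    # A 1000 bp variant that fully contains a 100 bp annotation.
    print(overlap_fractions(1, 1000, 101, 200))
```

Under these assumed definitions, a large variant that completely covers a small annotation has an annotation overlap of 1 but a small reciprocal overlap, which is the pattern seen in the second entry of the example above.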

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | range: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Admixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Admixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | VCF | Notes
genotype | string | GT |
variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele
totalDepth | integer | DP | non-negative integer values
genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99
copyNumber | integer | CN | non-negative integer values
minorHaplotypeCopyNumber | integer | MCN | non-negative integer values
repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific
alleleDepths | integer array | AD | non-negative integer values
failedFilter | bool | FT |
splitReadCounts | integer array | SR | Manta-specific
pairedEndReadCounts | integer array | PR | Manta-specific
isDeNovo | bool | DN |
deNovoQuality | float | DQ |
diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific
artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0
lossOfHeterozygosity | bool | CN, MCN |
somaticQuality | float | SQ |
heteroplasmyPercentile | float | VF | range: 0 - 100. 2 decimal places. One value per alternate allele
binCount | integer | BC | non-negative integer values

Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
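
Because sample order is preserved, the entries in a position's samples array can be paired with the sample names in the header. The Python sketch below shows one way to do that; the function name samples_by_name is hypothetical, and mapping empty samples to None is a design choice made here for illustration.

```python
def samples_by_name(header: dict, position: dict) -> dict:
    """Pair each sample object in a position with its name from the header.

    Samples flagged with "isEmpty" are mapped to None so callers can tell
    an intentionally empty sample from one with data.
    """
    names = header.get("samples", [])
    paired = {}
    for name, sample in zip(names, position.get("samples", [])):
        paired[name] = None if sample.get("isEmpty") else sample
    return paired


if __name__ == "__main__":
    header = {"samples": ["NA12878", "NA12891", "NA12892"]}
    position = {
        "samples": [
            {"genotype": "0/1", "totalDepth": 57},
            {"isEmpty": True},
            {"genotype": "0/0"},
        ]
    }
    for name, sample in samples_by_name(header, position).items():
        print(name, sample)
```

Without the isEmpty placeholder, the second and third samples in this example would shift positions and be attributed to the wrong individuals.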

Variants

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"isDecomposedVariant":true,
"isRecomposedVariant":true,
"linkedVids":["2:48010488:GTA:ATC"],
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
Field | Type | Notes
vid | string | see Variant Identifiers
chromosome | string |
begin | int | 1-based non-negative integer values. Range: 1 - 250 million
end | int | 1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllele | bool | true when this is a reference minor allele
isStructuralVariant | bool | true when the variant is a structural variant
inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllele | string | parsimonious representation of the reference allele
altAllele | string | parsimonious representation of the alternate allele.
variantType | string | uses Sequence Ontology sequence alterations
isDecomposedVariant | bool | true when the decomposed variant has been used to create another recomposed variant
isRecomposedVariant | bool | true when the variant is recomposed from two or more decomposed variants
linkedVids | string array | list of VIDs for variants connecting decomposed and recomposed variants
hgvsg | string | HGVS g. notation
phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424

Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Flagging Decomposed & Recomposed Variants

When two or more decomposed variants are recomposed into an MNV, the decomposed variants will be marked with "isDecomposedVariant":true.

Similarly, the recomposed variant will be shown as a new VCF position. This recomposed variant will be flagged with "isRecomposedVariant":true.

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"nonsense_mediated_decay",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268",
"cdsPos":"116",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"polyPhenScore":0.95,
"polyPhenPrediction":"probably damaging",
"proteinId":"ENSP00000405294.1",
"siftScore":0.61,
"siftPrediction":"tolerated",
"completeOverlap":true
}
]
Field | Type | Notes
transcript | string | transcript ID. e.g. ENST00000445503.1
source | string | RefSeq / Ensembl
bioType | string | descriptions of the biotypes from Ensembl
codons | string |
aminoAcids | string |
cdnaPos | string |
cdsPos | string |
exons | string | exons affected by the variant
introns | string | introns affected by the variant
proteinPos | string |
geneId | string | gene ID. e.g. ENSG00000116062
hgnc | string | gene symbol. e.g. MSH6
consequence | string array | Sequence Ontology Consequences
hgvsc | string | HGVS coding nomenclature
hgvsp | string | HGVS protein nomenclature
geneFusion | object | see Gene Fusions entry below
isCanonical | bool | true when this is a canonical transcript
isManeSelect | bool | true when this is a MANE select transcript
polyPhenScore | float | range: 0 - 1.0
polyPhenPrediction | string | see possible values below
proteinId | string | protein ID. e.g. ENSP00000405294.1
siftScore | float | range: 0 - 1.0
siftPrediction | string | see possible values below
completeOverlap | bool | true when this transcript is completely overlapped by the variant
cancerHotspots | string array | see Cancer Hotspots entry below
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

PolyPhen

  • probably damaging
  • possibly damaging
  • benign
  • unknown

SIFT

  • tolerated
  • deleterious
  • tolerated - low confidence
  • deleterious - low confidence
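
The prediction strings above can be used to shortlist transcripts. The sketch below is one possible filter, not a recommended clinical rule: the choice of which labels count as "damaging" is an assumption made for illustration, and the helper names are hypothetical. The sample transcript is trimmed from the example earlier in this section.

```python
# Assumed grouping of labels; adjust to your own interpretation policy.
DAMAGING_POLYPHEN = {"probably damaging", "possibly damaging"}
DAMAGING_SIFT = {"deleterious", "deleterious - low confidence"}


def is_predicted_damaging(transcript):
    """True when either PolyPhen or SIFT reports a damaging label."""
    return (transcript.get("polyPhenPrediction") in DAMAGING_POLYPHEN
            or transcript.get("siftPrediction") in DAMAGING_SIFT)


def canonical_damaging(transcripts):
    """IDs of canonical transcripts with at least one damaging prediction."""
    return [t["transcript"] for t in transcripts
            if t.get("isCanonical", False) and is_predicted_damaging(t)]


example_transcripts = [
    {
        "transcript": "ENST00000445503.1",
        "isCanonical": True,
        "polyPhenScore": 0.95,
        "polyPhenPrediction": "probably damaging",
        "siftScore": 0.61,
        "siftPrediction": "tolerated",
    }
]

print(canonical_damaging(example_transcripts))
```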

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

Field | Type | Notes
exon | int | actual exon where the breakpoint was located
intron | int | actual intron where the breakpoint was located
fusions | object array | see Fusion entry below

Fusion

Field | Type | Notes
exon | int | actual exon where the other breakpoint was located
intron | int | actual intron where the other breakpoint was located
hgvsc | string | HGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

Field | Type | Notes
residue | string |
numSamples | int | how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples | int | how many samples are associated with a variant at the same amino acid position and with the same alternate amino acid
qValue | double |

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
Field | Type | Notes
id | string |
type | string | see possible values below
consequence | string array | see possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes
id | string | ClinVar ID
variationId | string | ClinVar VCV ID
variantType | string | variant type
reviewStatus | string | see possible values below
alleleOrigins | string array | see possible values below
refAllele | string |
altAllele | string |
phenotypes | string array |
medGenIds | string array | MedGen IDs
omimIds | string array | OMIM IDs
orphanetIds | string array | Orphanet IDs
significance | string array | see possible values below
lastUpdatedDate | string | yyyy-MM-dd
pubMedIds | string array | PubMed IDs
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
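
Because `isAlleleSpecific` is only written when true, a filter on ClinVar entries has to treat a missing key as false. The sketch below shows one way to collect allele-specific entries with a chosen significance; the function name and the default `wanted` labels are assumptions for illustration, and the sample entries are trimmed from the small variant example above.

```python
def allele_specific_ids(clinvar_entries, wanted=("pathogenic", "likely pathogenic")):
    """IDs of allele-specific ClinVar entries whose significance is in `wanted`."""
    hits = []
    for entry in clinvar_entries:
        if not entry.get("isAlleleSpecific", False):
            continue  # key absent means the alternate allele did not match
        if any(label in wanted for label in entry.get("significance", [])):
            hits.append(entry["id"])
    return hits


example_clinvar = [
    {"id": "VCV000036581.3", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000030258.4", "significance": ["benign"], "isAlleleSpecific": True},
]

print(allele_specific_ids(example_clinvar))
print(allele_specific_ids(example_clinvar, wanted=("benign",)))
```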

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
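
In the sample values above, each frequency matches the allele count divided by the allele number, rounded to six decimal places. The sketch below checks that relationship for a few populations. The helper name and the six-digit rounding are assumptions inferred from the sample, not a documented guarantee.

```python
def allele_frequency(allele_count, allele_number, digits=6):
    """Allele count divided by allele number, or None when the number is not positive."""
    if allele_number <= 0:
        return None
    return round(allele_count / allele_number, digits)


# Values copied from the oneKg sample above.
one_kg = {
    "allAc": 1006, "allAn": 5008, "allAf": 0.200879,
    "afrAc": 278, "afrAn": 1322, "afrAf": 0.210287,
    "amrAc": 97, "amrAn": 694, "amrAf": 0.139769,
}

for population in ("all", "afr", "amr"):
    computed = allele_frequency(one_kg[f"{population}Ac"], one_kg[f"{population}An"])
    print(population, computed, one_kg[f"{population}Af"])
```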

DANN

"dannScore": 0.27
Field | Type | Notes
dannScore | float | Range: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
Field | Type | Notes
dbsnp | string array | dbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters

gnomAD

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.
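
A common use of the per-population fields is to find the population with the highest allele frequency. The sketch below does this over whichever `*Af` keys are present. The population prefix list is taken from the table above, while the function name and the decision to skip missing populations are assumptions made for this example.

```python
POPULATIONS = ("afr", "amr", "eas", "fin", "nfe", "asj", "sas", "oth")


def max_population_af(gnomad):
    """Return (population prefix, allele frequency) for the highest per-population AF."""
    present = {pop: gnomad[f"{pop}Af"] for pop in POPULATIONS if f"{pop}Af" in gnomad}
    if not present:
        return None
    top = max(present, key=present.get)
    return top, present[top]


# Frequencies copied from the gnomad sample above (sasAf is absent there).
gnomad_example = {
    "allAf": 0.190317, "afrAf": 0.222876, "amrAf": 0.121394, "easAf": 0.239802,
    "finAf": 0.136833, "nfeAf": 0.181282, "asjAf": 0.258278, "othAf": 0.186094,
    "failedFilter": True,
}

print(max_population_af(gnomad_example))
```

Since `failedFilter` is only present when true, callers that want to ignore filtered sites can check `gnomad.get("failedFilter", False)` before using the result.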

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

GRCh38

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0

GRCh37

"primateAI": [
{
"hgnc":"TP53",
"scorePercentile":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC Gene Symbol
scorePercentile | float | range: 0 - 1.0

REVEL

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
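
In the sample above, each `spliceAI` entry carries only some of the four score fields, so code that summarizes an entry should not assume all of them exist. The sketch below picks the highest score that is present; the function name is hypothetical and the sample entries are copied from the example above.

```python
SCORE_KEYS = ("acceptorGainScore", "acceptorLossScore", "donorGainScore", "donorLossScore")


def max_splice_score(entry):
    """Return (field name, score) for the highest splice score present, or None."""
    present = {key: entry[key] for key in SCORE_KEYS if key in entry}
    if not present:
        return None
    best = max(present, key=present.get)
    return best, present[best]


splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]

for entry in splice_ai:
    print(entry["hgnc"], max_splice_score(entry))
```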

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes
name | string | HGNC gene symbol
hgncId | int | HGNC ID
summary | string | short description of the gene from OMIM

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field | Type | Notes
roleInCancer | string array | Possible roles in cancer
Version: 3.22

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
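
Both conventions mean that a missing key carries information, so readers of the JSON should supply defaults rather than index keys directly. The following is a minimal sketch of that pattern using plain dictionaries; the helper names are hypothetical and the sample object is trimmed for illustration.

```python
def read_flag(obj, key):
    """Boolean keys are only written when true, so a missing key means False."""
    return bool(obj.get(key, False))


def read_optional(obj, key, default=None):
    """Entries with missing data are omitted entirely, so fall back to a default."""
    return obj.get(key, default)


# A trimmed sample object: no "failedFilter" key and no "alleleDepths" entry.
sample = {"genotype": "0/1", "totalDepth": 57, "isDeNovo": True}

print(read_flag(sample, "isDeNovo"))          # key present
print(read_flag(sample, "failedFilter"))      # key absent, treated as False
print(read_optional(sample, "alleleDepths"))  # entry absent, treated as None
```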

JSON Layout


In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing


We've put together a new section that discusses how to parse our JSON files easily, using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.
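
As a rough illustration of the layout, the sketch below reads a whole file into memory and splits it into its three top-level sections. It assumes the output is a single gzip-compressed JSON object with `header`, `positions`, and `genes` keys; the function names and the file path argument are hypothetical, and this is not the parsing approach from the notebooks mentioned above.

```python
import gzip
import json


def parse_annotations(text):
    """Split a JSON document string into (header, positions, genes)."""
    document = json.loads(text)
    return (document.get("header", {}),
            document.get("positions", []),
            document.get("genes", []))


def load_annotations(path):
    """Read a gzip-compressed JSON file (assumed format) and split it into sections."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_annotations(handle.read())


# A tiny in-memory document shaped like the layout described above.
tiny = ('{"header": {"genomeAssembly": "GRCh37", "samples": ["NA12878"]},'
        ' "positions": [{"chromosome": "chr2", "position": 48010488}],'
        ' "genes": [{"name": "MSH6", "hgncId": 7329}]}')

header, positions, genes = parse_annotations(tiny)
print(header["genomeAssembly"], len(positions), genes[0]["name"])
```

Loading everything at once is only practical for small files; for large outputs, a streaming parser or JASIX is likely a better fit.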

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
Field | Type | Notes
annotator | string | the name of the annotator and the current version
creationTime | string | yyyy-MM-dd hh:mm:ss
genomeAssembly | string | see possible values below
schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes
dataVersion | string |
dataSources | object array | see Data Source entry below
samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

Field | Type | Notes
name | string |
version | string |
description | string | optional description of the data source
releaseDate | string | yyyy-MM-dd

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2
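
Because the header's `samples` array fixes the sample order for the rest of the file, per-position sample entries can be matched to names by index. The sketch below pairs them up. It assumes each position's `samples` array has one entry per header sample in the same order; the function name and the trimmed sample entries are hypothetical.

```python
def samples_by_name(header, position):
    """Pair header sample names with a position's sample entries by index."""
    names = header.get("samples", [])
    entries = position.get("samples", [])
    return dict(zip(names, entries))


header = {"samples": ["NA12878", "NA12891", "NA12892"]}
position = {
    "chromosome": "chr2",
    "position": 48010488,
    "samples": [
        {"genotype": "0/1", "totalDepth": 57},
        {"genotype": "0/0", "totalDepth": 41},
        {"genotype": "0/1", "totalDepth": 38},
    ],
}

by_name = samples_by_name(header, position)
print(by_name["NA12891"]["genotype"])
```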

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes
chromosome | string | all | exactly as displayed in the VCF
position | integer | all | exactly as displayed in the VCF (1-based notation). Range: 1 - 250 million
repeatUnit | string | STR | provided by ExpansionHunter
refRepeatCount | integer | STR | provided by ExpansionHunter
svEnd | integer | SV |
refAllele | string | all | exactly as displayed in the VCF
altAlleles | string array | all | exactly as displayed in the VCF
quality | float | all | exactly as displayed in the VCF (normally an integer, but some variant callers use floating point; values as high as 500k have been observed)
filters | string array | all | exactly as displayed in the VCF
ciPos | integer array | SV |
ciEnd | integer array | SV |
svLength | integer | SV |
strandBias | float | small variant | provided by GATK (from SB)
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand | string | all | e.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
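
The section above does not spell out how `reciprocalOverlap` and `annotationOverlap` are calculated. The sketch below uses a common convention as an assumption: with 1-based inclusive coordinates, reciprocal overlap is the shared length divided by the longer of the two intervals, and annotation overlap is the shared length divided by the annotation's length. The function names and the example intervals are hypothetical; results are rounded to 5 decimal places to match the stated precision.

```python
def shared_length(begin_a, end_a, begin_b, end_b):
    """Number of bases shared by two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(end_a, end_b) - max(begin_a, begin_b) + 1)


def overlaps(variant, annotation):
    """Return (reciprocal overlap, annotation overlap) under the assumed definitions."""
    v_begin, v_end = variant
    a_begin, a_end = annotation
    shared = shared_length(v_begin, v_end, a_begin, a_end)
    variant_length = v_end - v_begin + 1
    annotation_length = a_end - a_begin + 1
    reciprocal = shared / max(variant_length, annotation_length)
    annotation_fraction = shared / annotation_length
    return round(reciprocal, 5), round(annotation_fraction, 5)


# Hypothetical intervals: a 200 bp variant and a 100 bp annotation sharing 50 bp.
print(overlaps((101, 300), (251, 350)))
```

Under these definitions, an annotation that lies entirely inside a much larger variant has an annotation overlap of 1 and a small reciprocal overlap.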

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | range: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

FieldTypeNotes
chromosomestringchromosome number
beginintegerposition interval start
endintegerposition internal end
variantTypestringstructural variant type
variantIdstringgnomAD ID
allAffloating pointallele frequency for all populations. Range: 0 - 1.0
afrAffloating pointallele frequency for the African super population. Range: 0 - 1.0
amrAffloating pointallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAffloating pointallele frequency for the East Asian super population. Range: 0 - 1.0
eurAffloating pointallele frequency for the European super population. Range: 0 - 1.0
othAffloating pointallele frequency for all other populations. Range: 0 - 1.0
femaleAffloating pointallele frequency for female population. Range: 0 - 1.0
maleAffloating pointallele frequency for male population. Range: 0 - 1.0
allAcintegerallele count for all populations.
afrAcintegerallele count for the African super population.
amrAcintegerallele count for the Ad Mixed American super population.
easAcintegerallele count for the East Asian super population.
eurAcintegerallele count for the European super population.
othAcintegerallele count for all other populations.
maleAcintegerallele count for male population.
femaleAcintegerallele count for female population.
allAnintegerallele number for all populations.
afrAnintegerallele number for the African super population.
amrAnintegerallele number for the Ad Mixed American super population.
easAnintegerallele number for the East Asian super population.
eurAnintegerallele number for the European super population.
othAnintegerallele number for all other populations.
femaleAnintegerallele number for female population.
maleAnintegerallele number for male population.
allHcintegercount of homozygous individuals for all populations.
afrHcintegercount of homozygous individuals for the African / African American population.
amrHcintegercount of homozygous individuals for the Latino population.
easHcintegercount of homozygous individuals for the East Asian population.
eurHcintegercount of homozygous individuals for the European super population.
othHcintegercount of homozygous individuals for all other populations.
maleHcintegercount of homozygous individuals for male population.
femaleHcintegercount of homozygous individuals for female population.
failedFilterbooleanTrue if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlapfloating pointReciprocal overlap. Range: 0 - 1.0
annotationOverlapfloating pointAnnotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | VCF | Notes
genotype | string | GT |
variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele
totalDepth | integer | DP | non-negative integer values
genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99
copyNumber | integer | CN | non-negative integer values
minorHaplotypeCopyNumber | integer | MCN | non-negative integer values
repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific
alleleDepths | integer array | AD | non-negative integer values
failedFilter | bool | FT |
splitReadCounts | integer array | SR | Manta-specific
pairedEndReadCounts | integer array | PR | Manta-specific
isDeNovo | bool | DN |
deNovoQuality | float | DQ |
diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific
artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0
lossOfHeterozygosity | bool | CN, MCN |
somaticQuality | float | SQ |
heteroplasmyPercentile | float array | VF | range: 0 - 100. 2 decimal places. One value per alternate allele
binCount | integer | BC | non-negative integer values
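The sample JSON above is consistent with variantFrequencies being derived from alleleDepths: with depths of 10, 20 and 30, the two alternate alleles come out at 20/60 and 30/60. The Python sketch below reproduces that arithmetic. The function name and the three-decimal rounding are assumptions inferred from the example values, not a description of how Illumina Connected Annotations computes the field internally.

```python
def variant_frequencies(allele_depths, digits=3):
    """Return one frequency per alternate allele.

    allele_depths is expected in VCF AD order: [ref, alt1, alt2, ...].
    The rounding precision is an assumption based on the example JSON.
    """
    total = sum(allele_depths)
    if total == 0:
        return []  # no reads observed, so no frequency can be computed
    return [round(depth / total, digits) for depth in allele_depths[1:]]


frequencies = variant_frequencies([10, 20, 30])
print(frequencies)
```

A check like this can be handy when validating that a downstream parser reads the two arrays in the same allele order.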
Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
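Because empty samples are kept as placeholder objects, the index of each entry in samples still lines up with the sample columns of the input VCF. A minimal Python sketch of how a consumer might rely on that, assuming the position has already been parsed into a dict; the function name and the sample names are hypothetical:

```python
def iter_called_samples(position, sample_names):
    """Yield (sample name, sample object) pairs, skipping empty samples.

    Assumes position["samples"] holds one entry per sample, in the same
    order as sample_names (a caller-supplied list, hypothetical here).
    """
    for name, sample in zip(sample_names, position.get("samples", [])):
        if sample.get("isEmpty", False):
            continue  # placeholder: this sample has no entries at this position
        yield name, sample


position = {
    "samples": [
        {"genotype": "0/1", "totalDepth": 57},
        {"isEmpty": True},
        {"genotype": "1/1", "totalDepth": 30},
    ]
}
called = list(iter_called_samples(position, ["mother", "father", "proband"]))
```

Skipping on isEmpty rather than on a missing genotype keeps the third sample paired with the right name even though the second one carries no data.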

Variants

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"isDecomposedVariant":true,
"isRecomposedVariant":true,
"linkedVids":["2:48010488:GTA:ATC"],
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
Field | Type | Notes
vid | string | see Variant Identifiers
chromosome | string |
begin | int | 1-based non-negative integer values. Range: 1 - 250 million
end | int | 1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllele | bool | true when this is a reference minor allele
isStructuralVariant | bool | true when the variant is a structural variant
inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllele | string | parsimonious representation of the reference allele
altAllele | string | parsimonious representation of the alternate allele
variantType | string | uses Sequence Ontology sequence alterations
isDecomposedVariant | bool | true when the decomposed variant has been used to create another recomposed variant
isRecomposedVariant | bool | true when the variant is recomposed from two or more decomposed variants
linkedVids | string array | list of VIDs for variants connecting decomposed and recomposed variants
hgvsg | string | HGVS g. notation
phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424
Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Flagging Decomposed & Recomposed Variants

When two or more decomposed variants are recomposed into an MNV, the decomposed variants will be marked with "isDecomposedVariant":true.

Similarly, the recomposed variant will be shown as a new VCF position. This recomposed variant will be flagged with "isRecomposedVariant":true.
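The flags and linkedVids can be combined to reconstruct which decomposed variants belong to which recomposed variant. The Python sketch below is one way to do it, assuming (as in the example JSON, where the SNV links to 2:48010488:GTA:ATC) that each decomposed variant lists the VID of its recomposed variant in linkedVids. The function name and the second SNV are hypothetical.

```python
from collections import defaultdict


def group_by_recomposed(variants):
    """Map each recomposed VID to the VIDs of the decomposed variants linking to it."""
    groups = defaultdict(list)
    for variant in variants:
        if not variant.get("isDecomposedVariant", False):
            continue
        for linked_vid in variant.get("linkedVids", []):
            groups[linked_vid].append(variant["vid"])
    return dict(groups)


variants = [
    {"vid": "2:48010488:A", "isDecomposedVariant": True,
     "linkedVids": ["2:48010488:GTA:ATC"]},
    {"vid": "2:48010490:C", "isDecomposedVariant": True,  # hypothetical second SNV
     "linkedVids": ["2:48010488:GTA:ATC"]},
    {"vid": "2:48010488:GTA:ATC", "isRecomposedVariant": True},
]
groups = group_by_recomposed(variants)
```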

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"nonsense_mediated_decay",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268",
"cdsPos":"116",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"polyPhenScore":0.95,
"polyPhenPrediction":"probably damaging",
"proteinId":"ENSP00000405294.1",
"siftScore":0.61,
"siftPrediction":"tolerated",
"completeOverlap":true
}
]
Field | Type | Notes
transcript | string | transcript ID. e.g. ENST00000445503.1
source | string | RefSeq / Ensembl
bioType | string | descriptions of the biotypes from Ensembl
codons | string |
aminoAcids | string |
cdnaPos | string |
cdsPos | string |
exons | string | exons affected by the variant
introns | string | introns affected by the variant
proteinPos | string |
geneId | string | gene ID. e.g. ENSG00000116062
hgnc | string | gene symbol. e.g. MSH6
consequence | string array | Sequence Ontology Consequences
hgvsc | string | HGVS coding nomenclature
hgvsp | string | HGVS protein nomenclature
geneFusion | object | see Gene Fusions entry below
isCanonical | bool | true when this is a canonical transcript
isManeSelect | bool | true when this is a MANE select transcript
polyPhenScore | float | range: 0 - 1.0
polyPhenPrediction | string | see possible values below
proteinId | string | protein ID. e.g. ENSP00000405294.1
siftScore | float | range: 0 - 1.0
siftPrediction | string | see possible values below
completeOverlap | bool | true when this transcript is completely overlapped by the variant
cancerHotspots | string array | see Cancer Hotspots entry below
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

PolyPhen

  • probably damaging
  • possibly damaging
  • benign
  • unknown

SIFT

  • tolerated
  • deleterious
  • tolerated - low confidence
  • deleterious - low confidence
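The prediction strings above lend themselves to simple filtering of the transcripts array. The following Python sketch flags canonical transcripts with a damaging-looking PolyPhen or SIFT call. Which labels count as "damaging" is a choice made for this illustration, and the function name is hypothetical.

```python
# Assumed cut-off for this example: treat both PolyPhen "damaging" labels as hits.
DAMAGING_POLYPHEN = {"probably damaging", "possibly damaging"}


def flag_transcripts(transcripts, canonical_only=True):
    """Return transcript IDs whose PolyPhen or SIFT prediction looks damaging."""
    flagged = []
    for transcript in transcripts:
        if canonical_only and not transcript.get("isCanonical", False):
            continue
        polyphen_hit = transcript.get("polyPhenPrediction") in DAMAGING_POLYPHEN
        # Matches both "deleterious" and "deleterious - low confidence".
        sift_hit = transcript.get("siftPrediction", "").startswith("deleterious")
        if polyphen_hit or sift_hit:
            flagged.append(transcript["transcript"])
    return flagged


transcripts = [
    {"transcript": "ENST00000445503.1", "isCanonical": True,
     "polyPhenPrediction": "probably damaging", "siftPrediction": "tolerated"},
]
flagged = flag_transcripts(transcripts)
```

Transcripts without predictions (for example non-coding ones) simply fall through, because missing keys are treated as "no hit".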

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

Field | Type | Notes
exon | int | actual exon where the breakpoint was located
intron | int | actual intron where the breakpoint was located
fusions | object array | see Fusion entry below

Fusion

Field | Type | Notes
exon | int | actual exon where the other breakpoint was located
intron | int | actual intron where the other breakpoint was located
hgvsc | string | HGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

Field | Type | Notes
residue | string |
numSamples | int | how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples | int | how many samples are associated with a variant at the same position and with the same alternate amino acid
qValue | double |

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
Field | Type | Notes
id | string |
type | string | see possible values below
consequence | string array | see possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes
id | string | ClinVar ID
variationId | string | ClinVar VCV ID
variantType | string | variant type
reviewStatus | string | see possible values below
alleleOrigins | string array | see possible values below
refAllele | string |
altAllele | string |
phenotypes | string array |
medGenIds | string array | MedGen IDs
omimIds | string array | OMIM IDs
orphanetIds | string array | Orphanet IDs
significance | string array | see possible values below
lastUpdatedDate | string | yyyy-MM-dd
pubMedIds | string array | PubMed IDs
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
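Since significance is an array and isAlleleSpecific only appears on small-variant entries in the examples above, filtering ClinVar records takes a little care. A minimal Python sketch, with a hypothetical function name and an assumed definition of "pathogenic" (either of the two pathogenic labels):

```python
PATHOGENIC = {"pathogenic", "likely pathogenic"}  # assumed selection for this example


def pathogenic_clinvar_ids(entries, allele_specific_only=True):
    """Return ClinVar IDs whose significance includes a pathogenic label.

    With allele_specific_only=True, entries that lack isAlleleSpecific
    (such as the large-variant examples) are skipped.
    """
    ids = []
    for entry in entries:
        if allele_specific_only and not entry.get("isAlleleSpecific", False):
            continue
        if PATHOGENIC.intersection(entry.get("significance", [])):
            ids.append(entry["id"])
    return ids


small = [
    {"id": "VCV000036581.3", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000030258.4", "significance": ["benign"], "isAlleleSpecific": True},
]
large = [
    {"id": "RCV000051993.4", "significance": ["pathogenic"]},
    {"id": "VCV000058242.1", "significance": ["pathogenic"]},
]
small_hits = pathogenic_clinvar_ids(small)
large_hits = pathogenic_clinvar_ids(large, allele_specific_only=False)
```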

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
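The three fields per population are related by AF = AC / AN; in the example above, allAc / allAn = 1006 / 5008, which rounds to the listed allAf. The Python sketch below recomputes every frequency in a oneKg object from its count and number. The function name and the six-decimal rounding are assumptions based on the example values.

```python
def recompute_frequencies(one_kg, digits=6):
    """Return {population: AC / AN} for every population with both fields present."""
    recomputed = {}
    for key, count in one_kg.items():
        if not key.endswith("Ac"):
            continue
        population = key[:-2]
        number = one_kg.get(f"{population}An")
        if number:  # skips a missing or zero allele number
            recomputed[population] = round(count / number, digits)
    return recomputed


one_kg = {
    "allAf": 0.200879, "afrAf": 0.210287,
    "allAn": 5008, "afrAn": 1322,
    "allAc": 1006, "afrAc": 278,
}
recomputed = recompute_frequencies(one_kg)
```

Comparing the recomputed values against the reported *Af fields is a cheap sanity check when post-processing the JSON.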

DANN

"dannScore": 0.27
Field | Type | Notes
dannScore | float | Range: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
Field | Type | Notes
dbsnp | string array | dbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
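To make the two overlap fractions concrete, here is a small Python sketch. It assumes the commonly used definitions: reciprocal overlap is the shared length divided by the longer of the two intervals, and annotation overlap is the shared length divided by the annotation's length. These definitions, the function name, and the five-decimal rounding are assumptions for illustration; the text above does not spell out the exact formula.

```python
def overlap_fractions(variant_begin, variant_end, annotation_begin, annotation_end, digits=5):
    """Return (reciprocal overlap, annotation overlap) for 1-based inclusive intervals."""
    shared = min(variant_end, annotation_end) - max(variant_begin, annotation_begin) + 1
    if shared <= 0:
        return 0.0, 0.0  # the intervals do not overlap
    variant_length = variant_end - variant_begin + 1
    annotation_length = annotation_end - annotation_begin + 1
    reciprocal = shared / max(variant_length, annotation_length)
    annotation = shared / annotation_length
    return round(reciprocal, digits), round(annotation, digits)


# A 1,000 bp variant that fully contains a 100 bp annotated interval.
fractions = overlap_fractions(1, 1000, 101, 200)
```

Under these definitions a large variant can cover an annotation completely (annotation overlap of 1.0) while still having a low reciprocal overlap, which is why both numbers are useful together.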

GERP

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters

gnomAD

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.
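A frequent use of the per-population fields is finding the population with the highest allele frequency. The Python sketch below does that for a gnomad object shaped like the example above. The function name is hypothetical, and the list of population prefixes is simply taken from the field table.

```python
POPULATIONS = ("afr", "amr", "eas", "fin", "nfe", "asj", "sas", "oth")


def max_population_af(gnomad):
    """Return (population, allele frequency) for the highest per-population AF.

    Returns None when no per-population frequency is present.
    """
    best = None
    for population in POPULATIONS:
        frequency = gnomad.get(f"{population}Af")
        if frequency is not None and (best is None or frequency > best[1]):
            best = (population, frequency)
    return best


gnomad = {
    "allAf": 0.190317, "afrAf": 0.222876, "amrAf": 0.121394, "easAf": 0.239802,
    "finAf": 0.136833, "nfeAf": 0.181282, "asjAf": 0.258278, "othAf": 0.186094,
}
top = max_population_af(gnomad)
```

Populations missing from the object (sasAf in the example) are skipped rather than treated as zero.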

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

GRCh38

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0

GRCh37

"primateAI": [
{
"hgnc":"TP53",
"scorePercentile":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC Gene Symbol
scorePercentile | float | range: 0 - 1.0

REVEL

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
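Each spliceAI entry carries up to four score/distance pairs, and only some of them are present for a given gene (the two example entries above each list two). A short Python sketch for picking the strongest predicted event per entry; the function name is hypothetical:

```python
EVENTS = ("acceptorGain", "acceptorLoss", "donorGain", "donorLoss")


def strongest_splice_event(entry):
    """Return (event, score, distance) for the highest-scoring event, or None."""
    best = None
    for event in EVENTS:
        score = entry.get(f"{event}Score")
        if score is not None and (best is None or score > best[1]):
            best = (event, score, entry.get(f"{event}Distance"))
    return best


splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]
strongest = {entry["hgnc"]: strongest_splice_event(entry) for entry in splice_ai}
```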

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes
name | string | HGNC gene symbol
hgncId | int | HGNC ID
summary | string | short description of the gene from OMIM

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)
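These metrics are often reduced to yes/no flags when prioritizing genes. The Python sketch below shows one way to do that. The cut-offs (pLI of at least 0.9, LOEUF below 0.35) are widely quoted rules of thumb and are assumptions here, exposed as parameters so they can be changed; the function name is hypothetical.

```python
def lof_constraint_flags(metrics, pli_min=0.9, loeuf_max=0.35):
    """Flag a gene as loss-of-function constrained by pLI and by LOEUF.

    The default thresholds are illustrative assumptions, not values
    prescribed by the annotation output.
    """
    pli = metrics.get("pLi")
    loeuf = metrics.get("loeuf")
    return {
        "highPli": pli is not None and pli >= pli_min,
        "lowLoeuf": loeuf is not None and loeuf < loeuf_max,
    }


metrics = {"pLi": 1.00e0, "pNull": 8.94e-40, "pRec": 1.84e-16,
           "synZ": -8.44e-2, "misZ": 5.96e-1, "loeuf": 1.13e0}
flags = lof_constraint_flags(metrics)
```

Keeping the two flags separate makes disagreements visible: with the example values above, the pLI flag is set while the LOEUF flag is not.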

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field | Type | Notes
roleInCancer | string array | Possible roles in cancer
+ + \ No newline at end of file diff --git a/3.22/index.html b/3.22/index.html index 1d8d8e06..d0d95114 100644 --- a/3.22/index.html +++ b/3.22/index.html @@ -6,16 +6,16 @@ Introduction | IlluminaConnectedAnnotations - - + +
Version: 3.22

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.

Illumina Connected Annotations takes VCFs as input and produces a structured JSON representation of all annotation and sample information (as extracted from the VCF). It handles multiple alternate alleles and multiple samples with ease.
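Because the output is plain JSON, it can be read with standard tooling. The Python sketch below loads an output file and counts the annotated variants. It assumes the file may be gzip-compressed and that the top-level object has a "positions" array whose entries carry a "variants" array; those assumptions, and both function names, are illustrative rather than taken from this page.

```python
import gzip
import json
import os
import tempfile


def load_annotations(path):
    """Load an annotation JSON file, transparently handling a .gz suffix."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def count_variants(data):
    """Count variant objects across all positions (assumed top-level key)."""
    return sum(len(position.get("variants", [])) for position in data.get("positions", []))


# Round-trip a tiny hand-written document to show the call pattern.
demo = {"positions": [{"variants": [{"vid": "2:48010488:A"}]}], "genes": [{"name": "MSH6"}]}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        json.dump(demo, handle)
    loaded = load_annotations(demo_path)
```

Loading the whole file at once is fine for small test inputs; for genome-scale output a streaming approach is likely to be a better fit.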

The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript.

The transcript and gene models are obtained from RefSeq and Ensembl. The current officially supported versions are:

Data Source | Version | Release Date
RefSeq | GCF_000001405.40-RS_2023_03 | 2023-03-21
Ensembl | 110 | 2023-04-27

In addition, it uses external data sources to provide additional context for each variant. Illumina Connected Annotations provides annotations from the following sources, divided into two tiers: Professional and Basic. The Basic tier can be accessed free of charge. The Professional tier requires a license. For access, please contact annotation_support@illumina.com.

Data Source | Availability | Latest Supported Version
Primate AI-3D | Professional | 1.0
Splice AI | Professional | 1.3
COSMIC | Professional | 96
OMIM | Professional | 20231105
ClinVar | Basic | 20231028
1000 Genomes Project | Basic | Phase 3 v3plus
DANN | Basic | 20200205
dbSNP | Basic | 156
DECIPHER | Basic | 201509
GERP | Basic | 20110522
GME Variome | Basic | 20160618
gnomAD | Basic | 3.1.2
MITOMAP | Basic | 20200819
REVEL | Basic | 20200205
TOPMed | Basic | freeze 5
Cancer Hotspots | Basic | 2017
FusionCatcher | Basic | 1.33
ClinGen | Basic | 20231105
MultiZ 100 way | Basic | 20171006

Download

Please visit Illumina Connected Annotations.

- - + + \ No newline at end of file diff --git a/3.22/introduction/dependencies/index.html b/3.22/introduction/dependencies/index.html index afd2f0c3..de99e8a1 100644 --- a/3.22/introduction/dependencies/index.html +++ b/3.22/introduction/dependencies/index.html @@ -6,13 +6,13 @@ Dependencies | IlluminaConnectedAnnotations - - + +
Version: 3.22

Dependencies

All of the following dependencies have been included in this repository.

Name | License | Usage
Amazon.Lambda | Apache | AWS extensions for .NET CLI
AWSSDK | Apache | AWS Lambda, S3, SNS support
Json.NET | MIT | JASIX utility
libdeflate | MIT | BlockCompression library
Moq | BSD | Mocking framework for unit tests
NDesk.Options | MIT/X11 | CommandLine library
xUnit | Apache | Unit testing framework
zlib-ng | zlib | BlockCompression library
zstd | BSD | BlockCompression library
+ + \ No newline at end of file diff --git a/3.22/introduction/getting-started/index.html b/3.22/introduction/getting-started/index.html index a087b79c..ed82dad6 100644 --- a/3.22/introduction/getting-started/index.html +++ b/3.22/introduction/getting-started/index.html @@ -6,13 +6,13 @@ Getting Started | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET 6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz), and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

Building your own Docker image is straightforward. You need the Illumina Connected Annotations zip file, the Dockerfile, and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, run the commands below from the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine under the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the downloader again.
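The size check described above can be illustrated with a minimal Python sketch. This is not the Downloader's actual code: the function name `find_size_mismatches` and the `expected_sizes` mapping are hypothetical, and the sketch simply assumes that the expected size of each file is known in advance.

```python
import os
import tempfile


def find_size_mismatches(root, expected_sizes):
    """Return the relative paths whose on-disk size differs from the expected size.

    expected_sizes is a hypothetical mapping of relative path -> size in bytes.
    Missing files are reported as mismatches too.
    """
    mismatches = []
    for rel_path, expected in expected_sizes.items():
        full_path = os.path.join(root, rel_path)
        actual = os.path.getsize(full_path) if os.path.exists(full_path) else -1
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches


# Demonstration with synthetic files in a temporary folder
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "complete.dat"), "wb") as out:
    out.write(b"x" * 10)
with open(os.path.join(demo_root, "truncated.dat"), "wb") as out:
    out.write(b"x" * 4)  # pretend the transfer stopped early

expected = {"complete.dat": 10, "truncated.dat": 10, "missing.dat": 10}
print(find_size_mismatches(demo_root, expected))  # ['truncated.dat', 'missing.dat']
```

Any file reported this way would be a candidate for a second download attempt once the root cause (connectivity, disk space) has been fixed.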

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix
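If you launch annotation runs from a script, the arguments listed above can be assembled programmatically. The following Python sketch is only an illustration: the helper name `build_annotator_command` is hypothetical, the directory layout mirrors the GRCh37 example above, and the reference file name pattern for other assemblies is an assumption inferred from that example.

```python
import shlex
from pathlib import PurePosixPath


def build_annotator_command(data_dir, assembly, vcf_path, output_prefix,
                            annotator_dll="Annotator.dll"):
    """Assemble the argument list for one annotation run (it is not executed here)."""
    data = PurePosixPath(data_dir)
    # Assumption: reference files follow the naming seen in the GRCh37 example
    reference = data / "References" / f"Homo_sapiens.{assembly}.Nirvana.dat"
    return [
        "dotnet", annotator_dll,
        "-c", str(data / "Cache"),                               # cache directory
        "--sd", str(data / "SupplementaryAnnotation" / assembly),  # supplementary annotations
        "-r", str(reference),                                    # compressed reference
        "-i", str(vcf_path),                                     # input VCF
        "-o", str(output_prefix),                                # output filename prefix
    ]


command = build_annotator_command("Data", "GRCh37", "HiSeq.10000.vcf.gz", "HiSeq.10000")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the data files are in place.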

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization                                     Time     Positions/s
---------------------------------------------------------------------------
Cache                                           00:00:00.0
SA Position Scan                                00:00:00.0      153,634

Reference                                Preload    Annotation   Variants/s
---------------------------------------------------------------------------
chr1                                    00:00:00.2  00:00:00.8       11,873

Summary                                            Time         Percent
---------------------------------------------------------------------------
Initialization                                  00:00:00.0        1.5 %
Preload                                         00:00:00.2        4.9 %
Annotation                                      00:00:00.8       18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full list of command-line options can be viewed by passing the -h option or by running the tool with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet Annotator.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
      --cache, -c <directory>    input cache directory
      --in, -i <path>            input VCF path
      --out, -o <file path>      output file path
      --ref, -r <path>           input compressed reference sequence path
      --sd <directory>           input supplementary annotation directory
      --sources, -s <VALUE>      annotation data sources to be used (comma
                                   separated list of supported tags)
      --force-mt                 forces to annotate mitochondrial variants
      --legacy-vids              enables support for legacy VIDs
      --enable-dq                report DQ from VCF samples field
      --enable-bidirectional-fusions
                                 enables support for bidirectional gene fusions
      --str <VALUE>              user provided STR annotation TSV file
      --vcf-info <VALUE>         additional vcf info field keys (comma separated)
                                   desired in the output
      --vcf-sample-info <VALUE>  additional vcf format field keys (comma separated)
                                   desired in the output
      --help, -h                 displays the help menu
      --version, -v              displays the version

Supplementary annotation version: 69, Reference version: 7

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).
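When the -s value is generated by a pipeline, it can be handy to check it before starting a run. The Python sketch below is a hypothetical pre-flight check, not part of the tool: the list of tags is copied from the warning message above (the real list depends on the files you supply), and exact, case-sensitive matching is an assumption.

```python
# Tags copied from the example warning above; the real list depends on your data files
AVAILABLE_SOURCES = [
    "aminoAcidConservation", "primateAI", "dbsnp", "spliceAI", "revel", "cosmic",
    "clinvar", "gnomad", "mitomap", "oneKg", "gmeVariome", "topmed", "clingen",
    "decipher", "gnomAD-preview", "clingenDosageSensitivityMap", "gerpScore",
    "dannScore", "omim", "clingenGeneValidity", "phylopScore",
    "lowComplexityRegion", "refMinor", "heteroplasmy", "Ensembl", "RefSeq",
]


def split_sources(requested, available=AVAILABLE_SOURCES):
    """Split a comma-separated -s value into (known, unknown) tag lists."""
    tags = [tag.strip() for tag in requested.split(",") if tag.strip()]
    available_set = set(available)
    # Assumption: tags must match exactly, including case
    known = [tag for tag in tags if tag in available_set]
    unknown = [tag for tag in tags if tag not in available_set]
    return known, unknown


known, unknown = split_sources("omim,gnomad,ense")
print(known)    # ['omim', 'gnomad']
print(unknown)  # ['ense']
```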

- - +
Version: 3.22

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET 6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz), and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

Building your own Docker image is straightforward. You need the Illumina Connected Annotations zip file, the Dockerfile, and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, run the commands below from the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine under the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the downloader again.

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization                                     Time     Positions/s
---------------------------------------------------------------------------
Cache                                           00:00:00.0
SA Position Scan                                00:00:00.0      153,634

Reference                                Preload    Annotation   Variants/s
---------------------------------------------------------------------------
chr1                                    00:00:00.2  00:00:00.8       11,873

Summary                                            Time         Percent
---------------------------------------------------------------------------
Initialization                                  00:00:00.0        1.5 %
Preload                                         00:00:00.2        4.9 %
Annotation                                      00:00:00.8       18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full list of command-line options can be viewed by passing the -h option or by running the tool with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet Annotator.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
      --cache, -c <directory>    input cache directory
      --in, -i <path>            input VCF path
      --out, -o <file path>      output file path
      --ref, -r <path>           input compressed reference sequence path
      --sd <directory>           input supplementary annotation directory
      --sources, -s <VALUE>      annotation data sources to be used (comma
                                   separated list of supported tags)
      --force-mt                 forces to annotate mitochondrial variants
      --legacy-vids              enables support for legacy VIDs
      --enable-dq                report DQ from VCF samples field
      --enable-bidirectional-fusions
                                 enables support for bidirectional gene fusions
      --str <VALUE>              user provided STR annotation TSV file
      --vcf-info <VALUE>         additional vcf info field keys (comma separated)
                                   desired in the output
      --vcf-sample-info <VALUE>  additional vcf format field keys (comma separated)
                                   desired in the output
      --help, -h                 displays the help menu
      --version, -v              displays the version

Supplementary annotation version: 69, Reference version: 7

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).

+ + \ No newline at end of file diff --git a/3.22/introduction/parsing-json/index.html b/3.22/introduction/parsing-json/index.html index ff1d5cd2..48ca9817 100644 --- a/3.22/introduction/parsing-json/index.html +++ b/3.22/introduction/parsing-json/index.html @@ -6,13 +6,13 @@ Parsing Illumina Connected Annotations JSON | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized similarly to the original VCF variants.

Illumina Connected Annotations JSON files can get very large, and we sometimes hear from bioinformaticians who tried to read an entire JSON file into Python or R, only to have the program run out of available RAM. This happens because those parsers try to load everything into memory at once.

To get around this, we place newlines at fixed points in the file, which enables our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.
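The layout above lends itself to a streaming reader. Here is a minimal Python sketch, independent of the notebooks mentioned in this section: the function name `iter_positions` is hypothetical, the synthetic demo content is invented for illustration, and the sketch assumes that position lines may end with a trailing comma that has to be stripped before parsing.

```python
import gzip
import json
import os
import tempfile

GENES_MARKER = '],"genes":['
END_MARKER = "]}"


def iter_positions(path):
    """Yield one position (as a dict) at a time from a gzipped annotation JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        handle.readline()  # the first line holds the header; skip it
        for line in handle:
            line = line.strip()
            if line in (GENES_MARKER, END_MARKER):
                break  # the positions section is over
            if not line:
                continue
            # Assumption: lines are separated by a trailing comma
            yield json.loads(line.rstrip(","))


# Tiny synthetic file in the layout described above, for demonstration only
demo_lines = [
    '{"header":{"annotator":"demo"},"positions":[',
    '{"chromosome":"chr1","position":942451,"refAllele":"T","altAlleles":["C"]},',
    '{"chromosome":"chr1","position":942500,"refAllele":"G","altAlleles":["A"]}',
    '],"genes":[',
    '{"name":"SAMD11"}',
    ']}',
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as out:
    out.write("\n".join(demo_lines) + "\n")

for position in iter_positions(demo_path):
    print(position["chromosome"], position["position"], position["altAlleles"])
```

Because only one line is decoded at a time, memory use stays roughly constant regardless of how many positions the file contains.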

Jupyter Notebook

To demonstrate this, we have put together a Jupyter notebook showing how to do this in Python, as well as an R version.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
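Since JASIX output for a small range is a complete JSON object, it can be loaded in one call and traversed directly. The Python sketch below works on an abbreviated copy of the output above (most fields omitted); the helper name `canonical_consequences` is hypothetical.

```python
import json

# Abbreviated copy of the JASIX output shown above; most fields are omitted
jasix_output = """
{"positions":[
  {"chromosome":"chr1","position":942451,
   "variants":[{"vid":"1-942451-T-C","transcripts":[
     {"transcript":"ENST00000622503.4","hgnc":"SAMD11",
      "consequence":["missense_variant"],"isCanonical":true},
     {"transcript":"ENST00000616016.4","hgnc":"SAMD11",
      "consequence":["synonymous_variant"]},
     {"transcript":"NM_015658.3","hgnc":"NOC2L",
      "consequence":["downstream_gene_variant"],"isCanonical":true}
   ]}]}
]}
"""


def canonical_consequences(document):
    """Return (vid, transcript, gene, consequences) tuples for canonical transcripts."""
    rows = []
    for position in document["positions"]:
        for variant in position.get("variants", []):
            for transcript in variant.get("transcripts", []):
                # isCanonical is only present on canonical transcripts
                if transcript.get("isCanonical"):
                    rows.append((variant["vid"], transcript["transcript"],
                                 transcript["hgnc"], transcript["consequence"]))
    return rows


for row in canonical_consequences(json.loads(jasix_output)):
    print(row)
```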
- - +
Version: 3.22

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized similarly to the original VCF variants.

Illumina Connected Annotations JSON files can get very large, and we sometimes hear from bioinformaticians who tried to read an entire JSON file into Python or R, only to have the program run out of available RAM. This happens because those parsers try to load everything into memory at once.

To get around this, we place newlines at fixed points in the file, which enables our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.

Jupyter Notebook

To demonstrate this, we have put together a Jupyter notebook showing how to do this in Python, as well as an R version.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
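Since the JASIX output is a single JSON object, it can be loaded in one call and then walked like any nested dictionary. The sketch below pulls out the variant ID, the gnomAD overall allele frequency and the canonical transcripts; the field names come from the example above, while the function name and the trimmed sample are only for illustration.

```python
import json


def summarize_variants(jasix_output):
    """Return one summary dict per variant in a parsed JASIX result."""
    summaries = []
    for position in jasix_output.get("positions", []):
        for variant in position.get("variants", []):
            # isCanonical is only present (and true) on canonical transcripts.
            canonical = [
                t["transcript"]
                for t in variant.get("transcripts", [])
                if t.get("isCanonical")
            ]
            summaries.append({
                "vid": variant.get("vid"),
                "gnomadAllAf": variant.get("gnomad", {}).get("allAf"),
                "canonicalTranscripts": canonical,
            })
    return summaries


# A heavily trimmed copy of the output shown above.
sample = json.loads("""
{"positions":[{"chromosome":"chr1","position":942451,"variants":[
  {"vid":"1-942451-T-C",
   "gnomad":{"allAf":0.999855},
   "transcripts":[
     {"transcript":"ENST00000420190.6"},
     {"transcript":"ENST00000622503.4","isCanonical":true},
     {"transcript":"NM_152486.2","isCanonical":true}
   ]}
]}]}
""")

for summary in summarize_variants(sample):
    print(summary)
```

Using `.get()` with defaults keeps the code tolerant of positions that lack optional blocks such as gnomad or transcripts.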
+ + \ No newline at end of file diff --git a/3.22/utilities/jasix/index.html b/3.22/utilities/jasix/index.html index 2ad0b35b..378ba8b2 100644 --- a/3.22/utilities/jasix/index.html +++ b/3.22/utilities/jasix/index.html @@ -6,13 +6,13 @@ Jasix | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

Jasix

Overview

The Jasix index aims to provide tabix-like indexing capabilities for the Illumina Connected Annotations JSON output.

Creating the Jasix index

The Jasix index (which comes in a .jsi file) is generated on the fly alongside the Illumina Connected Annotations output. It can also be generated independently by running the Jasix command-line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and the index-generation functionality of Jasix.

Example

dotnet Jasix.dll -h
USAGE: dotnet Jasix.dll -i in.json.gz [options]
Indexes a Illumina Connected Annotations annotated JSON file

OPTIONS:
--header, -t print also the header lines
--only-header, -H print only the header lines
--chromosomes, -l list chromosome names
--index, -c create index
--in, -i <VALUE> input
--out, -o <VALUE> compressed output file name (default:console)
--query, -q <VALUE> query range
--section, -s <VALUE> complete section (positions or genes) to output
--help, -h displays the help menu
--version, -v displays the version
dotnet Jasix.dll --index -i input.json.gz
---------------------------------------------------------------------------
Jasix (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Ref Sequence chrM indexed in 00:00:00.2
Ref Sequence chr1 indexed in 00:00:05.8
Ref Sequence chr2 indexed in 00:00:06.0
.
.
.
Peak memory usage: 28.5 MB
Time: 00:01:14.8

Querying the index

The Jasix query format is chr:start-end. If end is not provided, Jasix assumes end=start. If only chr is provided, all entries for that chromosome are returned.
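To make the three accepted query forms concrete, here is a small sketch that splits a query string into its parts. It illustrates the format described above and is not Jasix's own parser; the function name is hypothetical.

```python
def parse_query(query):
    """Split a query into (chromosome, start, end).

    start and end are None when only a chromosome is given; when end is
    omitted it defaults to start, mirroring the rule described above.
    """
    chromosome, colon, span = query.partition(":")
    if not chromosome:
        raise ValueError(f"missing chromosome in query: {query!r}")
    if not colon:
        return chromosome, None, None
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    end = int(end_text) if dash else start
    if end < start:
        raise ValueError(f"end is before start in query: {query!r}")
    return chromosome, start, end


for example in ("chrM:5000-7000", "chrM:5581", "chrM"):
    print(example, "->", parse_query(example))
```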

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000
{
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
}
]
}

The default output stream is the console. However, if an output filename is provided, Jasix writes the results to that file in bgzip-compressed format. The output is always a valid JSON entry. If requested (via the -t option), the header of the indexed file is included. Multiple queries can be submitted in the same command, and the output will contain them within the same "positions" block in the order of the submitted queries. (Warning: if the queries are out of order or overlapping, the output will be out of order and will contain overlapping entries.)
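One way to avoid out-of-order or overlapping output is to sort and merge the ranges before passing them to Jasix. The sketch below does that for ranges given as (chromosome, start, end) tuples. It assumes you supply the chromosome order of your file yourself (for example from the --chromosomes option); the function name is hypothetical.

```python
def normalize_queries(queries, chromosome_order):
    """Sort ranges by file order and merge overlapping ones.

    queries: iterable of (chromosome, start, end) tuples with inclusive ends.
    chromosome_order: chromosome names in the order they appear in the file
    (an input you must provide; it is not derived here).
    Returns a list of "chr:start-end" strings ready to pass via -q.
    """
    rank = {name: index for index, name in enumerate(chromosome_order)}
    ordered = sorted(queries, key=lambda q: (rank[q[0]], q[1], q[2]))
    merged = []
    for chrom, start, end in ordered:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous range on the same chromosome: extend it.
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return [f"{chrom}:{start}-{end}" for chrom, start, end in merged]


print(normalize_queries(
    [("chrM", 8500, 9500), ("chrM", 5000, 7000)],
    ["chrM", "chr1", "chr2"],
))
```

Ranges that merely touch at different positions (for example 5000-7000 and 7001-8000) are kept separate, since they do not produce duplicate entries.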

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000 -q chrM:8500-9500 -t
{
"header":{
"annotator":"Illumina Annotation Engine 1.6.2.0",
"creationTime":"2017-08-30 11:42:57",
"genomeAssembly":"GRCh37",
"schemaVersion":6,
"dataVersion":"84.24.39",
"dataSources":[
{
"name":"VEP",
"version":"84",
"description":"Ensembl",
"releaseDate":"2017-01-16"
}
],
"samples":[
"Mother"
]
},
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":8702,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":0.9987,
"totalDepth":1534,
"genotypeQuality":1,
"alleleDepths":[
2,
1532
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":8702,
"chromosome":"chrM",
"end":8702,
"variantType":"SNV",
"vid":"MT:8702:A"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":9378,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1018,
"genotypeQuality":1,
"alleleDepths":[
0,
1018
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":9378,
"chromosome":"chrM",
"end":9378,
"variantType":"SNV",
"vid":"MT:9378:A"
}
]
}
]
}

Extracting a section

The Illumina Connected Annotations JSON file has three sections: header, positions and genes. The header can be printed using the -H option. If you are interested in only the positions or genes section, you can use the -s (or --section) option.

dotnet Jasix.dll -i input.json.gz -s genes
[
{
"name": "ABCB10",
"omim": [
{
"mimNumber": 605454,
"geneName": "ATP-binding cassette, subfamily B, member 10"
}
]
},
{
"name": "ABCD3",
"omim": [
{
"mimNumber": 170995,
"geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",
"description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",
"phenotypes": [
{
"mimNumber": 616278,
"phenotype": "?Bile acid synthesis defect, congenital, 5",
"mapping": "molecular basis of the disorder is known",
"inheritances": [
"Autosomal recessive"
],
"comments": [
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
}
]
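The extracted genes section is a plain JSON array, so it can be loaded directly. As a small sketch, the snippet below maps each gene name to its OMIM gene MIM numbers; the field names are taken from the output above, while the function name and the trimmed sample are only illustrative.

```python
import json


def omim_numbers_by_gene(genes):
    """Map each gene name to the list of its gene-level OMIM MIM numbers."""
    return {
        gene["name"]: [entry["mimNumber"] for entry in gene.get("omim", [])]
        for gene in genes
    }


# A trimmed copy of the genes output shown above.
genes_json = """
[
  {"name": "ABCB10", "omim": [{"mimNumber": 605454}]},
  {"name": "ABCD3", "omim": [{"mimNumber": 170995,
                              "phenotypes": [{"mimNumber": 616278}]}]}
]
"""

print(omim_numbers_by_gene(json.loads(genes_json)))
```

Note that phenotype MIM numbers (such as 616278 above) live one level deeper and are deliberately not included in this mapping.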
- - +
+ + \ No newline at end of file diff --git a/3.22/utilities/sautils/index.html b/3.22/utilities/sautils/index.html index 394a0799..66ed4a22 100644 --- a/3.22/utilities/sautils/index.html +++ b/3.22/utilities/sautils/index.html @@ -6,13 +6,13 @@ SAUtils | IlluminaConnectedAnnotations - - + +
-
Version: 3.22

SAUtils

Overview

SAUtils is a utility tool that creates binary supplementary annotation files (.nsa, .gsa, .npd, .nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations annotation engine to provide supplementary annotations in the output.

The SAUtils Menu

SAUtils supports building binary files for many data sources. The help menu lists them out in the form of sub-commands.

dotnet SAUtils.dll
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Utilities focused on supplementary annotation

USAGE: dotnet SAUtils.dll <command> [options]

COMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen
AaCon create AA conservation database
ancestralAllele create Ancestral allele database from 1000Genomes data
ClinGen create ClinGen database
Downloader download ClinGen database
clinvar create ClinVar database
concat merge multiple NSA files for the same data source having non-overlapping regions
Cosmic create COSMIC database
CosmicSv create COSMIC SV database
CosmicFusion create COSMIC gene fusion database
CosmicCGC create COSMIC cancer gene census database
CustomGene create custom gene annotation database
CustomVar create custom variant annotation database
Dann create DANN database
Dbsnp create dbSNP database
Dgv create DGV database
DiseaseValidity create disease validity database
DosageMapRegions create dosage map regions
DosageSensitivity create dosage sensitivity database
DownloadOmim download OMIM database
ExtractMiniSA extracts mini SA
ExtractMiniXml extracts mini XML (ClinVar)
FilterSpliceNetTsv filter SpliceNet predictions
FusionCatcher create FusionCatcher database
Gerp create GERP conservation database
GlobalMinor create global minor allele database
Gnomad create gnomAD database
Gnomad-lcr create gnomAD low complexity region database
GnomadGeneScores create gnomAD gene scores database
GnomadSV create gnomAD structural variant database
Index edit an index file
MitoHet create mitochondrial Heteroplasmy database
MitomapSvDb create MITOMAP structural variants database
MitomapVarDb create MITOMAP small variants database
Omim create OMIM database
OneKGen create 1000 Genome small variants database
OneKGenSv create 1000 Genomes structural variants database
OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file
PhyloP create PhyloP database
PrimateAi create PrimateAI database
RefMinor create Reference Minor database from 1000 Genome
RemapWithDbsnp remap a VCF file given source and destination rsID mappings
Revel create REVEL database
SpliceAi create SpliceAI database
TopMed create TOPMed database
Gme create GME Variome database
Decipher create Decipher database

You can get more detailed help for each sub-command by typing in the sub-command. For example:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

More detailed instructions about each sub-command can be found in the documentation of the respective data sources.

Output File Formats

The format of the binary file that SAUtils produces depends on the type of annotation data represented in that file (e.g. small variants vs. structural variants vs. genes).

File Extension    Description
.nsa              Small variant annotations (e.g. SNV, insertions, deletions, etc.)
.gsa              Compact variant annotations (e.g. SNV, insertions, deletions, etc.)
.idx              Index file
.nsi              Interval annotations (e.g. SV, CNVs, intervals)
.nga              Gene annotations
.npd              Conservation scores
.rma              Reference Minor allele
.gfs              Gene fusions source
.gfj              Gene fusions JSON
.schema           JSON schema
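When you are managing a directory of supplementary annotation files, the table above can be turned into a simple lookup. This is only a sketch: the mapping mirrors the table, and the function name and example file names are hypothetical.

```python
from pathlib import Path

# Extension-to-description mapping copied from the table above.
SA_FILE_TYPES = {
    ".nsa": "Small variant annotations",
    ".gsa": "Compact variant annotations",
    ".idx": "Index file",
    ".nsi": "Interval annotations",
    ".nga": "Gene annotations",
    ".npd": "Conservation scores",
    ".rma": "Reference Minor allele",
    ".gfs": "Gene fusions source",
    ".gfj": "Gene fusions JSON",
    ".schema": "JSON schema",
}


def describe_sa_file(path):
    """Return the description for a file's final extension, or None if unlisted."""
    return SA_FILE_TYPES.get(Path(path).suffix.lower())


# Hypothetical file names, used only to exercise the lookup.
for name in ("example.nsa", "example.nsa.idx", "notes.txt"):
    print(name, "->", describe_sa_file(name))
```

Only the last extension is inspected, so a name ending in `.nsa.idx` is reported as an index file rather than as small variant annotations.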
- - +
+ + \ No newline at end of file diff --git a/3.23/core-functionality/canonical-transcripts/index.html b/3.23/core-functionality/canonical-transcripts/index.html index cf231508..66c38ac8 100644 --- a/3.23/core-functionality/canonical-transcripts/index.html +++ b/3.23/core-functionality/canonical-transcripts/index.html @@ -6,13 +6,13 @@ Canonical Transcripts | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are on average 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need to identify a representative example of a gene, even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
- - +
Version: 3.23

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
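
The sort order above can be sketched in a few lines of Python. This is only an illustration of the listed steps, not the actual Illumina Connected Annotations implementation: the `Transcript` class, its field names, and the `accession_number` helper are hypothetical, and "ascending accession number" is assumed to mean the numeric part of the accession.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    accession: str          # e.g. "NM_001754.4"
    is_lrg: bool            # True if the transcript is an LRG entry
    cds_length: int         # 0 for non-coding transcripts
    transcript_length: int


def accession_number(accession: str) -> int:
    """Numeric part of an accession without its version, e.g. 'NM_001754.4' -> 1754."""
    digits = "".join(ch for ch in accession.split(".")[0] if ch.isdigit())
    return int(digits) if digits else 0


def pick_canonical(transcripts: Iterable[Transcript], source: str = "RefSeq") -> Optional[Transcript]:
    candidates = list(transcripts)
    # Step 1: for RefSeq, only NM and NR transcripts are candidates.
    if source == "RefSeq":
        candidates = [t for t in candidates if t.accession.startswith(("NM_", "NR_"))]
    if not candidates:
        return None
    # Step 2: LRG first, then descending CDS length, descending
    # transcript length, and ascending accession number.
    candidates.sort(
        key=lambda t: (
            not t.is_lrg,
            -t.cds_length,
            -t.transcript_length,
            accession_number(t.accession),
        )
    )
    # Step 3: grab the first entry.
    return candidates[0]
```

Encoding all four criteria in a single sort key keeps the precedence explicit: a later criterion only matters when every earlier one ties.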
+ + \ No newline at end of file diff --git a/3.23/core-functionality/gene-fusions/index.html b/3.23/core-functionality/gene-fusions/index.html index a003d742..a0d76d29 100644 --- a/3.23/core-functionality/gene-fusions/index.html +++ b/3.23/core-functionality/gene-fusions/index.html @@ -6,14 +6,14 @@ Gene Fusion Detection | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 & FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 & FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. +

Version: 3.23

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 & FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 & FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. If only unidirectional gene fusions are desired, only these two fusions can be detected. If enable-bidirectional-fusions is enabled, all four cases can be identified.

Interpreting translocation breakends

At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the VCF 4.2 specification.

| REF | ALT | Meaning |
| s | t[p[ | piece extending to the right of p is joined after t |
| s | t]p] | reverse comp piece extending left of p is joined after t |
| s | ]p]t | piece extending to the left of p is joined before t |
| s | [p[t | reverse comp piece extending right of p is joined before t |
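
As a rough illustration of the four forms in the table, the following Python sketch decodes a breakend ALT string into its mate position and orientation. The `Breakend` type, the `parse_breakend` function, and all field names are hypothetical and not part of Illumina Connected Annotations; the sketch assumes single-mate breakends written exactly as in the VCF specification.

```python
import re
from typing import NamedTuple


class Breakend(NamedTuple):
    # Hypothetical container; names are illustrative only.
    mate_chromosome: str
    mate_position: int
    joined_after: bool          # True for t[p[ and t]p] (mate piece follows t)
    extends_right: bool         # True when the brackets are '[' (piece extends right of p)
    reverse_complemented: bool  # True for t]p] and [p[t
    bases: str                  # the t string


_BND = re.compile(r"^([ACGTN.]*)([\[\]])(.+):(\d+)([\[\]])([ACGTN.]*)$", re.IGNORECASE)


def parse_breakend(alt: str) -> Breakend:
    match = _BND.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT allele: {alt!r}")
    prefix, open_bracket, chromosome, position, close_bracket, suffix = match.groups()
    # Both brackets must point the same way, and t sits on exactly one side.
    if open_bracket != close_bracket or bool(prefix) == bool(suffix):
        raise ValueError(f"malformed breakend ALT allele: {alt!r}")
    joined_after = bool(prefix)
    extends_right = open_bracket == "["
    return Breakend(
        mate_chromosome=chromosome,
        mate_position=int(position),
        joined_after=joined_after,
        extends_right=extends_right,
        # Per the table: t]p] and [p[t are the reverse-complemented forms.
        reverse_complemented=joined_after != extends_right,
        bases=prefix or suffix,
    )
```

For example, `parse_breakend("[chr21:36420865[C")` reports a reverse-complemented piece extending to the right of chr21:36420865 that is joined before `C`.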

Variant Types

Specifically we can identify gene fusions from the following structural variant types:

  • deletions (<DEL>)
  • tandem_duplications (<DUP:TANDEM>)
  • inversions (<INV>)
  • translocation breakpoints (AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[)

Criteria

The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:

  1. After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if enable-bidirectional-fusions is not enabled. They can have the same or different orientations if enable-bidirectional-fusions is set.
  2. Both transcripts must be from the same transcript source (i.e. we won't mix and match between RefSeq and Ensembl transcripts)
  3. Both transcripts must belong to different genes
  4. Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)
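
The four criteria can be read as a simple predicate over a pair of transcripts. The sketch below is one possible reading, not the actual implementation: `FusionPartner`, its fields, and `is_gene_fusion` are hypothetical names, and the orientation is assumed to have been computed already after applying the rearrangement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FusionPartner:
    # Hypothetical record; field names are illustrative only.
    transcript_id: str
    source: str               # e.g. "RefSeq" or "Ensembl"
    gene_id: str
    rearranged_strand: str    # "+" or "-", after accounting for the rearrangement
    coding_region: Optional[Tuple[str, int, int]] = None  # (chromosome, start, end)


def coding_regions_overlap(a: FusionPartner, b: FusionPartner) -> bool:
    """True if both transcripts have coding regions that overlap on the reference."""
    if a.coding_region is None or b.coding_region is None:
        return False
    chrom_a, start_a, end_a = a.coding_region
    chrom_b, start_b, end_b = b.coding_region
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a


def is_gene_fusion(a: FusionPartner, b: FusionPartner,
                   enable_bidirectional_fusions: bool = False) -> bool:
    # 1. Same orientation is required unless bidirectional fusions are enabled.
    if not enable_bidirectional_fusions and a.rearranged_strand != b.rearranged_strand:
        return False
    # 2. Both transcripts must come from the same transcript source.
    if a.source != b.source:
        return False
    # 3. The transcripts must belong to different genes.
    if a.gene_id == b.gene_id:
        return False
    # 4. Coding regions that already overlap naturally are not fusions.
    if coding_regions_overlap(a, b):
        return False
    return True
```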

ETV6/RUNX1 Example

ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment.

VCF

Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr12 12026270 . C [chr21:36420865[C . PASS SVTYPE=BND
chr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND
chr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND
chr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND

When you put these calls together, the resulting genomic rearrangement looks something like this:

JSON Output

The annotation for the first variant in the VCF looks like this:

{
"chromosome": "chr12",
"position": 12026270,
"refAllele": "C",
"altAlleles": [
"[chr21:36420865[C"
],
"filters": [
"PASS"
],
"cytogeneticBand": "12p13.2",
"clingen": [
{
"chromosome": "12",
"begin": 173786,
"end": 34835837,
"variantType": "copy_number_gain",
"id": "nsv995956",
"clinicalInterpretation": "pathogenic",
"phenotypes": [
"Decreased calvarial ossification",
"Delayed gross motor development",
"Feeding difficulties",
"Frontal bossing",
"Morphological abnormality of the central nervous system",
"Patchy alopecia"
],
"phenotypeIds": [
"HP:0002007",
"HP:0002011",
"HP:0002194",
"HP:0002232",
"HP:0005474",
"HP:0011968",
"MedGen:C0232466",
"MedGen:C1862862",
"MedGen:CN001816",
"MedGen:CN001820",
"MedGen:CN001989",
"MedGen:CN004852"
],
"observedGains": 1,
"validated": true
}
],
"variants": [
{
"vid": "12-12026270-C-[chr21:36420865[C",
"chromosome": "chr12",
"begin": 12026270,
"end": 12026270,
"isStructuralVariant": true,
"refAllele": "C",
"altAllele": "[chr21:36420865[C",
"variantType": "translocation_breakend",
"cosmicGeneFusions": [
{
"id": "COSF2245",
"numSamples": 249,
"geneSymbols": [
"ETV6",
"RUNX1"
],
"hgvsr": "ENST00000396373.4(ETV6):r.1_1283::ENST00000300305.3(RUNX1):r.504_6222",
"histologies": [
{
"name": "acute lymphoblastic B cell leukaemia",
"numSamples": 169
},
{
"name": "acute lymphoblastic leukaemia",
"numSamples": 80
}
],
"sites": [
{
"name": "haematopoietic and lymphoid tissue",
"numSamples": 249
}
],
"pubMedIds": [
7761424,
7780150,
8609706,
8751464,
8982044,
9067587,
9207408,
9226156,
9628428,
10463610,
10774753,
11091202,
12621238,
12661004,
12750722,
15104290,
15642392,
24557455,
26925663
]
}
],
"fusionCatcher": [
{
"genes": {
"first": {
"hgnc": "ETV6",
"isOncogene": true
},
"second": {
"hgnc": "RUNX1",
"isOncogene": true
}
},
"somaticSources": [
"DepMap CCLE",
"Cancer Genome Project",
"ChimerKB 4.0",
"ChimerPub 4.0",
"ChimerSeq 4.0",
"Known",
"Mitelman DB",
"OncoKB",
"TICdb"
]
}
],
"transcripts": [
{
"transcript": "ENST00000396373.4",
"source": "Ensembl",
"bioType": "protein_coding",
"introns": "5/7",
"geneId": "ENSG00000139083",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"geneFusions": [
{
"transcript": "ENST00000437180.1",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000300305.3",
"bioType": "protein_coding",
"intron": 1,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000482318.1",
"bioType": "nonsense_mediated_decay",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000482318.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000486278.2",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000486278.2(RUNX1):r.?_-15+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000455571.1",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000455571.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000475045.2",
"bioType": "protein_coding",
"intron": 11,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000475045.2(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
},
{
"transcript": "ENST00000416754.1",
"bioType": "protein_coding",
"intron": 2,
"geneId": "ENSG00000159216",
"hgnc": "RUNX1",
"hgvsr": "ENST00000416754.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],
"isCanonical": true,
"proteinId": "ENSP00000379658.3"
},
{
"transcript": "NM_001987.4",
"source": "RefSeq",
"bioType": "protein_coding",
"introns": "5/7",
"geneId": "2120",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],
"isCanonical": true,
"proteinId": "NP_001978.1"
}
]
}
]
}
| Field | Type | Notes |
| transcript | string | transcript ID |
| bioType | string | descriptions of the biotypes from Ensembl |
| exon | int | exon that contained fusion breakpoint |
| intron | int | intron that contained fusion breakpoint |
| geneId | string | gene ID. e.g. ENSG00000116062 |
| hgnc | string | gene symbol. e.g. MSH6 |
| hgvsr | string | HGVS RNA nomenclature |

Gene Fusion Data Sources

To provide more context to our gene fusions, we provide the following gene fusion data sources:

Consequences

When a gene fusion is identified, we add the following Sequence Ontology consequence:

              "consequence": [
"transcript_variant",
"gene_fusion"
],
  • If both transcripts have the same orientation, we label it as unidirectional_gene_fusion; if they have different orientations, we label it as bidirectional_gene_fusion.
  • If both unidirectional and bidirectional ones are detected, we label it as gene_fusion.
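
The labeling rule above can be sketched as a small function over the `directionality` values of a transcript's `geneFusions` entries. The function name is hypothetical, and the value `"biDirectional"` is an assumption: only `"uniDirectional"` appears in the example output shown earlier.

```python
from typing import Iterable, Optional


def fusion_consequence(directionalities: Iterable[str]) -> Optional[str]:
    """Pick the gene fusion consequence for one originating transcript."""
    kinds = set(directionalities)
    has_unidirectional = "uniDirectional" in kinds
    # Assumption: bidirectional fusions are reported as "biDirectional".
    has_bidirectional = "biDirectional" in kinds
    if has_unidirectional and has_bidirectional:
        return "gene_fusion"
    if has_unidirectional:
        return "unidirectional_gene_fusion"
    if has_bidirectional:
        return "bidirectional_gene_fusion"
    return None  # no gene fusion partners were found
```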

Gene Fusions Section

The geneFusions section is contained within the object of the originating transcript. It contains all the pairwise gene fusions that obey the criteria outlined above. In the case of ENST00000396373.4, there are 7 other Ensembl transcripts that would produce a gene fusion. For NM_001987.4, only one transcript (NM_001754.4) would produce a gene fusion.

For each originating transcript, we report the following for each partner transcript:

  • transcript ID
  • gene ID
  • HGNC gene symbol
  • transcript bio type (e.g. protein_coding)
  • intron or exon number containing the breakpoint
  • HGVS RNA notation
  • gene fusion directionality
tip

Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see HGVS SVD-WG007).

          "geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],

The HGVS RNA notation above indicates that the gene fusion starts with NM_001754.4 (RUNX1) until CDS position 58 and continues with NM_001987.4 (ETV6). 1009+3367 indicates that the fusion occurred 3367 bp within intron 2.
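
For downstream tools that need the two halves of an hgvsr string, a minimal parsing sketch follows. It assumes the `transcript(gene):r.start_end::transcript(gene):r.start_end` layout seen in the examples above; `FusionPart` and `parse_hgvsr` are hypothetical names, and positions are kept as strings because they may contain `?` or intronic offsets such as `58+274`.

```python
import re
from typing import List, NamedTuple


class FusionPart(NamedTuple):
    # Hypothetical container; names are illustrative only.
    transcript: str
    gene: str
    start: str   # e.g. "?", "1", or "1009+3367"
    end: str     # e.g. "58+274", "1283", or "?"


_PART = re.compile(r"^([^()]+)\(([^()]+)\):r\.([^_]+)_(.+)$")


def parse_hgvsr(hgvsr: str) -> List[FusionPart]:
    """Split a fusion hgvsr string on '::' and parse each side."""
    parts = []
    for side in hgvsr.split("::"):
        match = _PART.match(side)
        if match is None:
            raise ValueError(f"unexpected hgvsr segment: {side!r}")
        parts.append(FusionPart(*match.groups()))
    return parts
```

Applied to the example above, this yields RUNX1 (`NM_001754.4`, `?` to `58+274`) followed by ETV6 (`NM_001987.4`, `1009+3367` to `?`).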

- - + + \ No newline at end of file diff --git a/3.23/core-functionality/transcript-consequence-impacts/index.html b/3.23/core-functionality/transcript-consequence-impacts/index.html index 9a1093dc..3878db8f 100644 --- a/3.23/core-functionality/transcript-consequence-impacts/index.html +++ b/3.23/core-functionality/transcript-consequence-impacts/index.html @@ -6,14 +6,14 @@ Transcript Consequence Impact | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings, as obtained from SnpEff.

| Impact | Definition |
| high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. |
| moderate | A non-disruptive variant that might change protein effectiveness. |
| low | Assumed to be mostly harmless or unlikely to change protein behavior. |
| modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. |

Sources

Not all consequences are rated by SnpEff; therefore, Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

| Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment |
| bidirectional_gene_fusion | high | | high | SnpEff |
| coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS |
| copy_number_change | | | modifier | |
| copy_number_decrease | | | modifier | |
| copy_number_increase | | | modifier | |
| downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
| feature_elongation | modifier | high | high | VEP |
| feature_truncation | | high | high | VEP |
| five_prime_duplicated_transcript | | | modifier | |
| five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| frameshift_variant | high | high | high | SnpEff + VEP |
| gene_fusion | high | | high | SnpEff |
| incomplete_terminal_codon_variant | | low | low | VEP |
| inframe_deletion | moderate | moderate | moderate | SnpEff + VEP |
| inframe_insertion | moderate | moderate | moderate | SnpEff + VEP |
| intron_variant | modifier | modifier | modifier | SnpEff + VEP |
| mature_miRNA_variant | | modifier | modifier | VEP |
| missense_variant | moderate | moderate | moderate | SnpEff + VEP |
| NMD_transcript_variant | | modifier | modifier | VEP |
| non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP |
| non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP |
| protein_altering_variant | | moderate | moderate | VEP |
| regulatory_region_ablation | | modifier | modifier | VEP |
| regulatory_region_amplification | | modifier | modifier | VEP |
| regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP |
| short_tandem_repeat_change | | | modifier | |
| short_tandem_repeat_contraction | | | modifier | |
| short_tandem_repeat_expansion | | | modifier | |
| splice_acceptor_variant | high | high | high | SnpEff + VEP |
| splice_donor_variant | high | high | high | SnpEff + VEP |
| splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff |
| start_lost | high | high | high | SnpEff + VEP |
| start_retained_variant | low | low | low | SnpEff + VEP |
| stop_gained | high | high | high | SnpEff + VEP |
| stop_lost | high | high | high | SnpEff + VEP |
| stop_retained_variant | low | low | low | SnpEff + VEP |
| synonymous_variant | low | low | low | SnpEff + VEP |
| three_prime_duplicated_transcript | | | modifier | |
| three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| transcript_ablation | high | high | high | SnpEff + VEP |
| transcript_amplification | | high | high | VEP |
| transcript_variant | modifier | | modifier | SnpEff |
| unidirectional_gene_fusion | high | | high | SnpEff |
| upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. +

Version: 3.23

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings, as obtained from SnpEff.

| Impact | Definition |
| high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. |
| moderate | A non-disruptive variant that might change protein effectiveness. |
| low | Assumed to be mostly harmless or unlikely to change protein behavior. |
| modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. |

Sources

Not all consequences are rated by SnpEff; therefore, Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

| Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment |
| bidirectional_gene_fusion | high | | high | SnpEff |
| coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS |
| copy_number_change | | | modifier | |
| copy_number_decrease | | | modifier | |
| copy_number_increase | | | modifier | |
| downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
| feature_elongation | modifier | high | high | VEP |
| feature_truncation | | high | high | VEP |
| five_prime_duplicated_transcript | | | modifier | |
| five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| frameshift_variant | high | high | high | SnpEff + VEP |
| gene_fusion | high | | high | SnpEff |
| incomplete_terminal_codon_variant | | low | low | VEP |
| inframe_deletion | moderate | moderate | moderate | SnpEff + VEP |
| inframe_insertion | moderate | moderate | moderate | SnpEff + VEP |
| intron_variant | modifier | modifier | modifier | SnpEff + VEP |
| mature_miRNA_variant | | modifier | modifier | VEP |
| missense_variant | moderate | moderate | moderate | SnpEff + VEP |
| NMD_transcript_variant | | modifier | modifier | VEP |
| non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP |
| non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP |
| protein_altering_variant | | moderate | moderate | VEP |
| regulatory_region_ablation | | modifier | modifier | VEP |
| regulatory_region_amplification | | modifier | modifier | VEP |
| regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP |
| short_tandem_repeat_change | | | modifier | |
| short_tandem_repeat_contraction | | | modifier | |
| short_tandem_repeat_expansion | | | modifier | |
| splice_acceptor_variant | high | high | high | SnpEff + VEP |
| splice_donor_variant | high | high | high | SnpEff + VEP |
| splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff |
| start_lost | high | high | high | SnpEff + VEP |
| start_retained_variant | low | low | low | SnpEff + VEP |
| stop_gained | high | high | high | SnpEff + VEP |
| stop_lost | high | high | high | SnpEff + VEP |
| stop_retained_variant | low | low | low | SnpEff + VEP |
| synonymous_variant | low | low | low | SnpEff + VEP |
| three_prime_duplicated_transcript | | | modifier | |
| three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| transcript_ablation | high | high | high | SnpEff + VEP |
| transcript_amplification | | high | high | VEP |
| transcript_variant | modifier | | modifier | SnpEff |
| unidirectional_gene_fusion | high | | high | SnpEff |
| upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.
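
The two notes can be illustrated with a short Python sketch. The `IMPACT` dictionary below holds only a handful of rows from the table above, and the function name is hypothetical; this is a sketch of the stated rule, not the actual implementation.

```python
from typing import Iterable

# Impact ratings ordered from least to most severe.
SEVERITY = ["modifier", "low", "moderate", "high"]

# A small excerpt of the combined ratings from the table above.
IMPACT = {
    "intron_variant": "modifier",
    "non_coding_transcript_variant": "modifier",
    "transcript_variant": "modifier",
    "synonymous_variant": "low",
    "splice_region_variant": "low",
    "missense_variant": "moderate",
    "stop_gained": "high",
    "unidirectional_gene_fusion": "high",
}


def transcript_impact(consequences: Iterable[str]) -> str:
    """Most severe rating among a transcript's consequences."""
    # Note 2: consequences without a rating fall back to "modifier".
    ratings = [IMPACT.get(consequence, "modifier") for consequence in consequences]
    # Note 1: the most severe rating wins.
    return max(ratings, key=SEVERITY.index, default="modifier")
```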

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. However, this consequence is not annotated by Illumina Connected Annotations, so no impact is provided for it.

Example Transcript

The key impact for each transcript gives the impact rating for the consequence.

{
"variants": [
{
"vid": "1-1623412-T-C",
"chromosome": "1",
"begin": 1623412,
"end": 1623412,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.1623412T>C",
"transcripts": [
{
"transcript": "ENST00000479659.5",
"source": "Ensembl",
"bioType": "lncRNA",
"introns": "2/18",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"intron_variant",
"non_coding_transcript_variant"
],
"impact": "modifier",
"hgvsc": "ENST00000479659.5:n.288-19T>C"
},
{
"transcript": "ENST00000489635.5",
"source": "VEP",
"bioType": "mRNA",
"codons": "aTg/aCg",
"aminoAcids": "M/T",
"cdnaPos": "269",
"cdsPos": "134",
"exons": "3/20",
"proteinPos": "45",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"missense_variant"
],
"impact": "moderate",
"hgvsc": "ENST00000489635.5:c.134T>C",
"hgvsp": "ENSP00000426007.1:p.(Met45Thr)",
"proteinId": "ENSP00000426007.1"
}
]
}
]
}
- - + + \ No newline at end of file diff --git a/3.23/core-functionality/variant-ids/index.html b/3.23/core-functionality/variant-ids/index.html index 71736797..2cc45a5f 100644 --- a/3.23/core-functionality/variant-ids/index.html +++ b/3.23/core-functionality/variant-ids/index.html @@ -6,13 +6,13 @@ Variant IDs | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used; neither the reference nor the alternate allele can be empty
  • some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

| chromosome | position | reference allele | alternate allele |

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

| chromosome | position | reference allele | alternate allele |

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

| chromosome | position | end position | reference allele | alternate allele | SVTYPE |

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
- - +
Version: 3.23

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used; neither the reference nor the alternate allele can be empty
  • some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

| chromosome | position | reference allele | alternate allele |

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA
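
The VID examples above can be reproduced with a short Python sketch. The function names are hypothetical. The suffix trimming is an assumption inferred from the third VCF record, where `GTA` to `GTACTATATATTATA` becomes `G` to `GTACTATATATTA`; the actual normalization used by Illumina Connected Annotations may differ.

```python
from typing import List


def normalize_chromosome(chromosome: str) -> str:
    """Ensembl-style chromosome name, e.g. 'chr22' -> '22'."""
    return chromosome[3:] if chromosome.lower().startswith("chr") else chromosome


def small_variant_vids(chromosome: str, position: int, ref: str, alts: str) -> List[str]:
    """One VID per alternate allele of a small-variant VCF record."""
    vids = []
    for alt in alts.split(","):
        trimmed_ref, trimmed_alt = ref, alt
        if trimmed_alt == ".":
            # Reference variant: replace the period with the reference base.
            trimmed_alt = trimmed_ref
        # Assumption: shared trailing bases are trimmed, but each allele
        # keeps at least one (padding) base so neither becomes empty.
        while (len(trimmed_ref) > 1 and len(trimmed_alt) > 1
               and trimmed_ref[-1] == trimmed_alt[-1]):
            trimmed_ref, trimmed_alt = trimmed_ref[:-1], trimmed_alt[:-1]
        vids.append(
            f"{normalize_chromosome(chromosome)}-{position}-{trimmed_ref}-{trimmed_alt}"
        )
    return vids
```

Calling `small_variant_vids("chr1", 66572, "GTA", "G,GTACTATATATTATA")` produces the last two VIDs in the list above.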

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

| chromosome | position | reference allele | alternate allele |

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

| chromosome | position | end position | reference allele | alternate allele | SVTYPE |

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
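
As a rough sketch of the structural variant format, the following Python builds a VID from the VCF columns and the INFO field. The helper names `parse_info` and `structural_variant_vid` are hypothetical. Two cases from the examples are deliberately left out: replacing a lazy `N` with the true reference base (which needs a reference genome lookup) and records such as `<STR2>` that carry no SVTYPE in INFO.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if item:
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def structural_variant_vid(chrom, pos, ref, alt, info):
    """Build chromosome-position-end-ref-alt-SVTYPE from one VCF record."""
    # Ensembl-style chromosome names: drop a leading "chr".
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    fields = parse_info(info)
    # Raises KeyError when END or SVTYPE is missing (not handled in this sketch).
    return f"{chrom}-{pos}-{fields['END']}-{ref}-{alt}-{fields['SVTYPE']}"


print(structural_variant_vid("chr1", 1350082, "G", "<DEL>", "END=1351320;SVTYPE=DEL"))
```

Looking up END and SVTYPE by key rather than by position keeps the result independent of the order in which a caller writes the INFO entries.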
+ + \ No newline at end of file diff --git a/3.23/data-sources/1000Genomes-snv-json/index.html b/3.23/data-sources/1000Genomes-snv-json/index.html index bc0dbc06..5946e84e 100644 --- a/3.23/data-sources/1000Genomes-snv-json/index.html +++ b/3.23/data-sources/1000Genomes-snv-json/index.html @@ -6,13 +6,13 @@ 1000Genomes-snv-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Admixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Admixed American super population. Integer.
amrAn | int | allele number for the Admixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
- - +
Version: 3.23

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Admixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Admixed American super population. Integer.
amrAn | int | allele number for the Admixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
+ + \ No newline at end of file diff --git a/3.23/data-sources/1000Genomes-sv-json/index.html b/3.23/data-sources/1000Genomes-sv-json/index.html index f1b4717f..c5d19d54 100644 --- a/3.23/data-sources/1000Genomes-sv-json/index.html +++ b/3.23/data-sources/1000Genomes-sv-json/index.html @@ -6,13 +6,13 @@ 1000Genomes-sv-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
- - +
Version: 3.23

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
+ + \ No newline at end of file diff --git a/3.23/data-sources/1000Genomes/index.html b/3.23/data-sources/1000Genomes/index.html index 334a5288..aa4bc185 100644 --- a/3.23/data-sources/1000Genomes/index.html +++ b/3.23/data-sources/1000Genomes/index.html @@ -6,15 +6,15 @@ 1000 Genomes | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF) but we recompute them using allele counts and allele numbers in order to get 6 digit precision. The allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not expressed in the INFO field. Instead the genotypes need to be parsed to compute that information. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe separated AA fields (the Indel specific REF, ALT, IndelType fields are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

Chromosome | # of alleles | # of conflicting alleles | percentage
chrX | 834800 | 2733 | 0.33%
Total | 21413098 | 2743 | 0.013%

Currently, we remove the allele frequency of the conflicting allele (i.e., insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 +

Version: 3.23

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF) but we recompute them using allele counts and allele numbers in order to get 6 digit precision. The allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not expressed in the INFO field. Instead the genotypes need to be parsed to compute that information. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe separated AA fields (the Indel specific REF, ALT, IndelType fields are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC
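
A minimal sketch of the recomputation described above, assuming the INFO field already carries the per-population AC and AN values: each allele frequency is the allele count divided by the allele number, rounded to six decimal places. The names `POPULATIONS`, `parse_info` and `recompute_frequencies` are hypothetical and only serve the illustration.

```python
# Output label -> INFO key prefix (the global counts have no prefix).
POPULATIONS = {"all": "", "afr": "AFR_", "amr": "AMR_",
               "eas": "EAS_", "eur": "EUR_", "sas": "SAS_"}


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if item:
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def recompute_frequencies(info):
    """Return {"allAf": [...], "afrAf": [...], ...}, one value per alt allele."""
    fields = parse_info(info)
    frequencies = {}
    for label, prefix in POPULATIONS.items():
        allele_number = int(fields[prefix + "AN"])  # documented as non-zero
        counts = [int(c) for c in fields[prefix + "AC"].split(",")]
        frequencies[label + "Af"] = [round(c / allele_number, 6) for c in counts]
    return frequencies


info = ("AC=1739,3210;AN=5008;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;"
        "AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633")
print(recompute_frequencies(info)["allAf"])
```

For the example record, the recomputed global frequencies agree with the AF values already present in the line (0.347244 and 0.640974), while the per-population values gain two digits over the four-digit source fields.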

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

Chromosome | # of alleles | # of conflicting alleles | percentage
chrX | 834800 | 2733 | 0.33%
Total | 21413098 | 2743 | 0.013%

Currently, we remove the allele frequency of the conflicting allele (i.e., insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.
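
The current behavior can be illustrated with a small Python sketch. It is an approximation under stated assumptions: records are reduced to hypothetical `(chrom, pos, ref, alt_field, ac_field)` tuples, and an allele is treated as conflicting when it appears on more than one line with differing allele counts. The function name is made up for the example.

```python
from collections import defaultdict


def allele_counts_without_conflicts(records):
    """Map (chrom, pos, ref, alt) -> allele count, dropping conflicting alleles.

    records: iterable of (chrom, pos, ref, alt_field, ac_field) tuples, where
    alt_field and ac_field are the comma-separated ALT and AC values of one line.
    """
    seen = defaultdict(list)
    for chrom, pos, ref, alt_field, ac_field in records:
        alts = alt_field.split(",")
        counts = [int(c) for c in ac_field.split(",")]
        for alt, count in zip(alts, counts):
            seen[(chrom, pos, ref, alt)].append(count)
    # Keep an allele only if every line that reports it agrees on the count.
    return {allele: counts[0] for allele, counts in seen.items()
            if len(set(counts)) == 1}


records = [
    ("1", 20505705, "C", "CTCTG,CTG,CTGTG", "46,1513,152"),
    ("1", 20505705, "C", "CTG", "4"),
]
print(allele_counts_without_conflicts(records))
```

In the example, `CTG` is reported with counts 1513 and 4, so it is dropped, while `CTCTG` and `CTGTG` from the same line keep their counts.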

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 GRCh38

JSON Output

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Admixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Admixed American super population. Integer.
amrAn | int | allele number for the Admixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.

Structural Variants

VCF File Parsing

The VCF files contain entries like the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
22 16050654 esv3647175;esv3647176;esv3647177;esv3647178 A <CN0>,<CN2>,<CN3>,<CN4> 100 PASS AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV GT 3|0 0|0 0|0 0|0 0|0 0|0 0|4

Please note that CNVs are allele-specific. For example, HG00096 is effectively copy number 4, which would be a net gain on chr22.

1000 Genomes contains 5 types of structural variants:

  • CNV
  • DEL
  • DUP
  • INS
  • INV

Because the 1000 Genomes data is provided in VCF format, we assume that the coordinates follow VCF conventions, i.e., symbolic alleles have a padding base. Each interval can therefore be interpreted as [BEGIN+1, END]. For all variant types except insertions, END is far larger than BEGIN. The distribution of BEGIN and END for insertions is summarized below.

Insertion issues

  • END = BEGIN for 6/165
  • END = BEGIN+2 for 93/165
  • END = BEGIN+3 for 11/165
  • END = BEGIN+4 for 11/165
  • END – BEGIN ranges from 5 to 1156 for the others.

Converting VCF svTypes to SO sequence alterations

The svType will be captured in our JSON file under the sequenceAlteration key. Here's the translation we'll use according to svType in 1000 Genomes.

svType | Alternative Alleles contain <CN*> | sequenceAlteration
ALU | FALSE | mobile_element_insertion
DUP | TRUE | copy_number_gain
CNV | TRUE | copy_number_gain (observed_gains > 0 and observed_losses = 0); copy_number_loss (observed_gains = 0 and observed_losses > 0); copy_number_variation (otherwise)
DEL | TRUE | copy_number_loss
LINE1 | FALSE | mobile_element_insertion
SVA | FALSE | mobile_element_insertion
INV | FALSE | inversion
INS | FALSE | insertion
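
The translation table can be expressed as a small lookup function. This is a sketch only: the function name `sequence_alteration` is hypothetical, and `observed_gains` / `observed_losses` are passed in as plain integers because the text does not say how they are derived from the genotypes.

```python
MOBILE_ELEMENT_TYPES = {"ALU", "LINE1", "SVA"}

SIMPLE_TYPES = {
    "DUP": "copy_number_gain",
    "DEL": "copy_number_loss",
    "INV": "inversion",
    "INS": "insertion",
}


def sequence_alteration(sv_type, observed_gains=0, observed_losses=0):
    """Translate a 1000 Genomes svType into a sequenceAlteration term."""
    if sv_type in MOBILE_ELEMENT_TYPES:
        return "mobile_element_insertion"
    if sv_type in SIMPLE_TYPES:
        return SIMPLE_TYPES[sv_type]
    if sv_type == "CNV":
        # CNVs depend on which copy number changes were actually observed.
        if observed_gains > 0 and observed_losses == 0:
            return "copy_number_gain"
        if observed_gains == 0 and observed_losses > 0:
            return "copy_number_loss"
        return "copy_number_variation"
    raise ValueError(f"unsupported svType: {sv_type}")


print(sequence_alteration("CNV", observed_gains=3, observed_losses=1))
```

Raising on an unknown svType, rather than returning a default, makes it obvious when a new type shows up in the source data.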

Exceptions

We discard structural variants that lack an END field, such as the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
21 9495848 esv3646347 A <INS:ME:LINE1> 100 PASS AC=1543;AF=0.308107;AN=5008;CS=L1_umary;MEINFO=LINE1,5669,6005,+;NS=2504;SVLEN=336;SVTYPE=LINE1;TSD=null;DP=20015;EAS_AF=0.3125;AMR_AF=0.2911;AFR_AF=0.3026;EUR_AF=0.2922;SAS_AF=0.3395;VT=SV GT 0|0 1|1 1|0 0|1 1|0 1|0 0|0

CNVs in chrY

  • No other types of structural variants exist in chrY
  • Since the copy number is provided in the genotype field, we parse it directly from the "CN" field.
  • For most CNVs in chrY, the reference copy number is 1, but the reference copy number for CNVs in segmental duplication sites is 2 (<CN2> in the 2nd example). All segmental duplication calls have identifiers starting with GS_SD_M2.
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00101 HG00103 HG00105 HG00107 HG00108
Y 2888555 CNV_Y_2888555_3014661 T <CN2> 100 PASS AC=1;AF=0.000817661;AN=1223;END=3014661;NS=1233;SVTYPE=CNV;AMR_AF=0.0000;AFR_AF=0.0000;EUR_AF=0.0000;SAS_AF=0.0019;EAS_AF=0.0000;VT=SV GT:CN:CNL:CNP:CNQ:GP:GQ:PL 0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585 0:1:-296.36,0,-16.6:-300.46,0,-19.7:99:0,-19.7:99:0,166 0:1:-1000,0,-39.44:-1000,0,-42.54:99:0,-42.54:99:0,394
Y 6128381 GS_SD_M2_Y_6128381_6230094_Y_9650284_9752225 C <CN1>,<CN3> 100 PASS AC=4,2;AF=0.00327065,0.00163532;AN=1223;END=6230094;NS=1233;SVTYPE=CNV;AMR_AF=0.0029,0.0029;AFR_AF=0.0016,0.0016;EUR_AF=0.0000,0.0000;SAS_AF=0.0038,0.0000;EAS_AF=0.0000,0.0000;VT=SV;EX_TARGET GT:CN:CNL:CNP:CNQ:GP:GQ 0:2:-1000,-138.78,0,-38.53:-1000,-141.27,0,-41.33:99:0,-141.27,-41.33:99 0:2:-1000,-53.32,0,-17.85:-1000,-55.81,0,-20.64:99:0,-55.81,-20.64:99 0:2:-1000,-71.83,0,-32.5:-1000,-74.32,0,-35.29:99:0,-74.32,-35.29:99 0:2:-1000,-60.96,0,-20.29:-1000,-63.45,0,-23.08:99:0,-63.45,-23.08:99 0:2:-1000,-77.6,0,-31.45:-1000,-80.09,0,-34.24:99:0,-80.09,-34.24:99
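
Reading the copy number from a chrY sample column could look like the following sketch. The function name `copy_number` is hypothetical; the only assumption is standard VCF layout, where the FORMAT column names the colon-separated sub-fields of each sample column.

```python
def copy_number(format_field, sample_field):
    """Return the integer CN value of one sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # The position of "CN" in FORMAT gives its position in the sample column.
    return int(values[keys.index("CN")])


print(copy_number("GT:CN:CNL:CNP:CNQ:GP:GQ:PL",
                  "0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585"))
```

Locating CN through the FORMAT column matters here because the two example records use different FORMAT strings (the second one has no PL entry).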

JSON Output

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
- - + + \ No newline at end of file diff --git a/3.23/data-sources/amino-acid-conservation-json/index.html b/3.23/data-sources/amino-acid-conservation-json/index.html index c68d758f..5f8a76be 100644 --- a/3.23/data-sources/amino-acid-conservation-json/index.html +++ b/3.23/data-sources/amino-acid-conservation-json/index.html @@ -6,13 +6,13 @@ amino-acid-conservation-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00
- - +
Version: 3.23

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00
+ + \ No newline at end of file diff --git a/3.23/data-sources/amino-acid-conservation/index.html b/3.23/data-sources/amino-acid-conservation/index.html index 72b5bf53..623063d9 100644 --- a/3.23/data-sources/amino-acid-conservation/index.html +++ b/3.23/data-sources/amino-acid-conservation/index.html @@ -6,14 +6,14 @@ Amino Acid Conservation | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human exome. The score indicates how frequently the amino acid (AA) found in humans is observed across the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. +

Version: 3.23

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human exome. The score indicates how frequently the amino acid (AA) found in humans is observed across the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. For position 6, we would say that we have 43% conservation (3/7), since three of the seven aligned sequences (human, chimp, and orangutan) carry the human residue.
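
The position 6 calculation can be reproduced with a short Python sketch. It follows the worked example in counting the human sequence itself among the aligned sequences, and it treats gap characters as non-matching; the function name `conservation` is hypothetical, and only the first ten residues of each sequence from the alignment above are used.

```python
def conservation(human, others, position):
    """Fraction of aligned sequences sharing the human residue (1-based position)."""
    index = position - 1
    residue = human[index]
    sequences = [human] + list(others)
    # Gaps ("-") never equal an amino acid letter, so missing data counts as a mismatch.
    matches = sum(1 for seq in sequences if seq[index] == residue)
    return round(matches / len(sequences), 2)


human = "MKKVTAEAIS"
others = [
    "MKKVTAEAIS",  # Chimp
    "----------",  # Gorilla
    "MKKVTAEAIS",  # Orangutan
    "----------",  # Gibbon
    "MKKVTEAAIS",  # Rhesus
    "MKKVTEAAIS",  # Macaque
]
print(conservation(human, others, 6))
```

With these seven sequences, position 6 gives 3/7, which rounds to 0.43, matching the 43% quoted in the text.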

Assigning scores to Illumina Connected Annotations transcripts

The source FASTA file comes with the Ensembl/UCSC transcript ids of the transcripts used for the alignments. The Illumina Connected Annotations cache has RefSeq and Ensembl transcripts, and our first attempt was to map the given Ensembl/UCSC ids to their equivalent RefSeq/Ensembl ids. This attempt was unsuccessful because the UCSC Table Browser provided mappings without version numbers. So we proceeded as follows:

  • Take proteins which have a unique mapping (and hence one set of conservation scores). For ones that mapped to both ChrX and ChrY, we accepted the one from ChrX.
  • An Illumina Connected Annotations transcript having an exact peptide sequence match with a uniquely aligned protein is assigned the corresponding conservation scores.

Unfortunately this left us with a very small number of transcripts having conservation scores.

GRCh37

  • Source FASTA contained 41957 protein alignments.
  • 38165 proteins had unique scores.
  • 88 aligned proteins existed in Illumina Connected Annotations cache.
  • 118 transcripts had conservation scores.

GRCh38

  • Source FASTA contained 110024 protein alignments.
  • 88961 proteins had unique scores.
  • 11688 aligned proteins existed in Illumina Connected Annotations cache.
  • 12098 transcripts had conservation scores.

Download URL

GRCh37: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownGene.exonAA.fa.gz

GRCh38: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/alignments/knownGene.exonAA.fa.gz

JSON Output

Conservation scores are reported in the transcript section. One score is reported for each alt allele.

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00
- - + + \ No newline at end of file diff --git a/3.23/data-sources/cancer-hotspots/index.html b/3.23/data-sources/cancer-hotspots/index.html index 629753dd..fc3d0241 100644 --- a/3.23/data-sources/cancer-hotspots/index.html +++ b/3.23/data-sources/cancer-hotspots/index.html @@ -6,14 +6,14 @@ Cancer Hotspots | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs from the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid   Codon_Change     Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancanIs_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect     ct       Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue
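As a minimal sketch of selecting these columns, assume the relevant tab has been exported to tab-delimited text with the header as its first line. The export step, the `select_columns` helper, and the shortened sample row are illustrative assumptions, not part of the tool:

```python
import csv
import io

# Columns of interest, as listed above.
COLUMNS = [
    "Hugo_Symbol",
    "Amino_Acid_Position",
    "Mutation_Count",
    "Reference_Amino_Acid",
    "Variant_Amino_Acid",
    "qvalue",
]


def select_columns(tsv_text):
    """Yield one dict per data row, keeping only the columns of interest.

    Assumes tab-delimited text whose first line is the header row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        yield {name: row[name] for name in COLUMNS}


# Hypothetical, heavily shortened input: real rows carry many more columns.
sample = (
    "Hugo_Symbol\tAmino_Acid_Position\tlog10_pvalue\tMutation_Count\t"
    "Reference_Amino_Acid\tVariant_Amino_Acid\tqvalue\n"
    "NRAS\t61\t-1237.69143477067\t422\tQ:422\tR:204\t0\n"
)

records = list(select_columns(sample))
print(records[0])
```

Reading by column name rather than by index keeps the sketch independent of the column order, which differs between the SNV and indel tabs.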

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we obtain the position and the reference and alternate amino acids from the source file and generate the substitution notation. For indels, we take the protein change notation from the Reference_Amino_Acid column.
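The notation step can be sketched as follows. This is only an illustration based on the example rows: the helper names are hypothetical, the `value:count` field layout is inferred from entries such as `Q:422`, `R:204`, and `K546del:5`, and the bare output form (`Q61R`, with no prefix) is an assumption about what the substitution notation looks like.

```python
def strip_count(field):
    """Drop the trailing ':<count>' suffix, e.g. 'R:204' -> 'R'."""
    value, _, _count = field.rpartition(":")
    # rpartition returns an empty head when there is no colon at all.
    return value if value else field


def snv_notation(position, ref_field, alt_field):
    """Build a substitution such as 'Q61R' from position, ref and alt amino acid."""
    return f"{strip_count(ref_field)}{position}{strip_count(alt_field)}"


def indel_notation(field):
    """Indel entries already carry the protein change; only the count is removed."""
    return strip_count(field)


print(snv_notation("61", "Q:422", "R:204"))  # Q61R
print(indel_notation("K546del:5"))  # K546del
print(indel_notation("A36_N39delinsD:1"))  # A36_N39delinsD
```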

Version: 3.23

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs from the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid   Codon_Change     Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancan   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect     ct       Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we obtain the position and the reference and alternate amino acids from the source file and generate substitution notation. For indels, we get the protein change notation from the Reference_Amino_Acid column. We then match each entry using these notations.
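As a rough illustration of the notation step, the sketch below builds a substitution label from a position and a pair of amino acids, and splits a count-suffixed field such as K:142 or K546del:5 (as seen in the sample rows) into its label and sample count. The function names and the "Q61K" label form are assumptions made for this example; the source does not spell out the exact notation used for matching.

```python
from typing import Tuple


def parse_counted_field(field: str) -> Tuple[str, int]:
    """Split a 'label:count' field (e.g. 'K:142') into label and count."""
    label, count = field.rsplit(":", 1)
    return label, int(count)


def snv_notation(ref_aa: str, position: int, alt_aa: str) -> str:
    """Build a substitution label such as 'Q61K' (assumed format)."""
    return f"{ref_aa}{position}{alt_aa}"


# Values taken from the NRAS sample row: Q:422 at position 61, variant K:142
ref_aa, _ = parse_counted_field("Q:422")
alt_aa, alt_count = parse_counted_field("K:142")
print(snv_notation(ref_aa, 61, alt_aa), alt_count)
```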

caution

We currently skip all variants labeled as splice in the source.

JSON Output

The data source will be captured under the cancerHotspots key in the transcript section.

{
"transcript":"NM_002524.5",
"source":"RefSeq",
"bioType":"mRNA",
"aminoAcids":"Q/K",
"proteinPos":"61",
"geneId":"4893",
"hgnc":"NRAS",
"hgvsc":"NM_002524.5:c.181C>A",
"hgvsp":"NP_002515.1:p.(Gln61Lys)",
"isCanonical":true,
"proteinId":"NP_002515.1",
"cancerHotspots":{
"residue":"Q61",
"numSamples":422,
"numAltAminoAcidSamples":142,
"qValue":0
}
}
Field                   Type    Notes
residue                 string
numSamples              int     how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples  int     how many samples are associated with a variant with the same position and alternate amino acid
qValue                  double
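A minimal sketch of how a consumer might read this object from a transcript entry. The helper name hotspot_summary is hypothetical, and the input is a trimmed copy of the example above; transcripts without a cancerHotspots key are assumed to simply have no hotspot annotation.

```python
import json
from typing import Optional


def hotspot_summary(transcript: dict) -> Optional[str]:
    """Return a one-line summary, or None if no hotspot is annotated."""
    hotspot = transcript.get("cancerHotspots")
    if hotspot is None:
        return None
    return (f'{transcript.get("hgnc")} {hotspot["residue"]}: '
            f'{hotspot["numAltAminoAcidSamples"]} of {hotspot["numSamples"]} samples, '
            f'qValue={hotspot["qValue"]}')


example = json.loads(
    '{"hgnc":"NRAS","cancerHotspots":{"residue":"Q61","numSamples":422,'
    '"numAltAminoAcidSamples":142,"qValue":0}}'
)
print(hotspot_summary(example))
```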
- - + + \ No newline at end of file diff --git a/3.23/data-sources/clingen-dosage-json/index.html b/3.23/data-sources/clingen-dosage-json/index.html index 10f96481..010ef028 100644 --- a/3.23/data-sources/clingen-dosage-json/index.html +++ b/3.23/data-sources/clingen-dosage-json/index.html @@ -6,13 +6,13 @@ clingen-dosage-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field                        Type            Notes
clingenDosageSensitivityMap  object array
chromosome                   string          Ensembl-style chromosome names
begin                        integer         1-based position
end                          integer         1-based position
haploinsufficiency           string          see possible values below
triplosensitivity            string          (same as haploinsufficiency)
reciprocalOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
- - +
Version: 3.23

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field                        Type            Notes
clingenDosageSensitivityMap  object array
chromosome                   string          Ensembl-style chromosome names
begin                        integer         1-based position
end                          integer         1-based position
haploinsufficiency           string          see possible values below
triplosensitivity            string          (same as haploinsufficiency)
reciprocalOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap            floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).
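The page does not define how the two overlap fractions are computed. The sketch below uses common definitions as an assumption: reciprocal overlap is the overlapping length divided by the longer of the two intervals, and annotation overlap is the overlapping length divided by the annotation length. Coordinates are treated as 1-based and inclusive, and the function name is hypothetical.

```python
from typing import Tuple


def overlap_fractions(var_begin: int, var_end: int,
                      ann_begin: int, ann_end: int) -> Tuple[float, float]:
    """Return (reciprocalOverlap, annotationOverlap) rounded to 5 places."""
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return 0.0, 0.0  # the intervals do not intersect
    var_len = var_end - var_begin + 1
    ann_len = ann_end - ann_begin + 1
    reciprocal = overlap / max(var_len, ann_len)  # assumed definition
    annotation = overlap / ann_len                # assumed definition
    return round(reciprocal, 5), round(annotation, 5)


# A variant that fully contains a much smaller annotation
print(overlap_fractions(1, 1000, 101, 200))
```

Under these assumed definitions, a large variant that fully contains a small annotation yields a low reciprocal overlap together with an annotation overlap of 1.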

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
+ + \ No newline at end of file diff --git a/3.23/data-sources/clingen-gene-validity-json/index.html b/3.23/data-sources/clingen-gene-validity-json/index.html index d238e6e9..768ac3eb 100644 --- a/3.23/data-sources/clingen-gene-validity-json/index.html +++ b/3.23/data-sources/clingen-gene-validity-json/index.html @@ -6,13 +6,13 @@ clingen-gene-validity-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                Type    Notes
clingenGeneValidity  object
diseaseId            string  Monarch Disease Ontology ID (MONDO)
disease              string  disease label
classification       string  see below for possible values
classificationDate   string  yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
- - +
Version: 3.23

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                Type    Notes
clingenGeneValidity  object
diseaseId            string  Monarch Disease Ontology ID (MONDO)
disease              string  disease label
classification       string  see below for possible values
classificationDate   string  yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
+ + \ No newline at end of file diff --git a/3.23/data-sources/clingen-json/index.html b/3.23/data-sources/clingen-json/index.html index 5466f06c..b7fa0ac8 100644 --- a/3.23/data-sources/clingen-json/index.html +++ b/3.23/data-sources/clingen-json/index.html @@ -6,13 +6,13 @@ clingen-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                   Type            Notes
clingen                 object array
chromosome              string          Ensembl-style chromosome names
begin                   integer         1-based position
end                     integer         1-based position
variantType             string          Any of the sequence alterations defined here.
id                      string          Identifier from the data source. Alternatively a VID
clinicalInterpretation  string          see possible values below
observedGains           integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses          integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated               boolean
phenotypes              string array    Description of the phenotype.
phenotypeIds            string array    Description of the phenotype IDs.
reciprocalOverlap       floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
- - +
Version: 3.23

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                   Type            Notes
clingen                 object array
chromosome              string          Ensembl-style chromosome names
begin                   integer         1-based position
end                     integer         1-based position
variantType             string          Any of the sequence alterations defined here.
id                      string          Identifier from the data source. Alternatively a VID
clinicalInterpretation  string          see possible values below
observedGains           integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses          integer         Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated               boolean
phenotypes              string array    Description of the phenotype.
phenotypeIds            string array    Description of the phenotype IDs.
reciprocalOverlap       floating point  Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
+ + \ No newline at end of file diff --git a/3.23/data-sources/clingen/index.html b/3.23/data-sources/clingen/index.html index 4d23c05a..9511aaeb 100644 --- a/3.23/data-sources/clingen/index.html +++ b/3.23/data-sources/clingen/index.html @@ -6,13 +6,13 @@ ClinGen | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), they are adjusted to [BEGIN+1, END].
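The adjustment amounts to shifting the start by one while leaving the end unchanged. A minimal sketch, with a hypothetical function name and coordinates taken from the first sample row below:

```python
from typing import Tuple


def bed_to_one_based(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a BED interval (0-based start, exclusive end) to 1-based inclusive."""
    return chrom_start + 1, chrom_end


# nsv530705 spans chromStart=564405, chromEnd=8597804 in the source file
begin, end = bed_to_one_based(564405, 8597804)
print(begin, end)
```

The interval length is unchanged by the conversion: chromEnd - chromStart in BED terms equals end - begin + 1 in 1-based terms.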

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen TSV file and extract the following:

  • chrom
  • chromStart (note this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma separated lists. attrTags contains the field keys and attrVals contains the field values. We will parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)
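A minimal sketch of pairing the two lists, assuming that individual values contain no embedded commas. Real files may need more careful handling, and multi-valued fields such as phenotype would need an additional split using whatever delimiter the file actually uses, which is not shown here. The function name and the sample strings are hypothetical.

```python
from typing import Dict

WANTED_KEYS = ("parent", "clinical_int", "validated", "phenotype", "phenotype_id")


def parse_attributes(attr_tags: str, attr_vals: str) -> Dict[str, str]:
    """Pair comma-separated keys with comma-separated values, keeping wanted keys."""
    tags = attr_tags.split(",")
    vals = attr_vals.split(",")
    if len(tags) != len(vals):
        raise ValueError("attrTags and attrVals have different lengths")
    return {tag: val for tag, val in zip(tags, vals) if tag in WANTED_KEYS}


# Hypothetical input; 'sample_name' is an invented key that gets dropped
attrs = parse_attributes(
    "parent,clinical_int,validated,sample_name",
    "nsv530705,pathogenic,False,example",
)
print(attrs)
```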

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped.
    • Observed losses and observed gains are calculated for each group.
    • Clinical significance and validation status are collapsed using the priority strategy described below.
  • Variants with the same parent ID can have different coordinates (mapped to hg38).
    • For example, nsv491508 maps to chr14:105583663-106881350 and chr14:105605043-106766076.
    • In such cases, both variants are kept.

Conflict Resolution

Clinical significance priority

When variants belonging to the same parent ID have different clinical significance values, we choose the most pathogenic of the available values. For example, if 3 samples were deemed pathogenic and 2 samples were deemed likely pathogenic, we list the variant as pathogenic.

Priority (high to low)

  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance

Validation Priority

When variants belonging to the same parent ID differ in validation status, we set the validation status to true if any of the variants was validated.
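
The two conflict-resolution rules above can be sketched as follows. This is a minimal illustration, not the SAUtils implementation: the function names are hypothetical, and lower-casing the labels from the priority list is an assumption made for the sketch.

```python
# Priority order copied from the list above (index 0 = highest priority).
# Lower-cased labels are an assumption of this sketch.
CLINICAL_PRIORITY = [
    "pathogenic",
    "likely pathogenic",
    "benign",
    "likely benign",
    "uncertain significance",
]


def resolve_clinical_significance(values):
    """Return the highest-priority clinical significance among `values`."""
    rank = {name: i for i, name in enumerate(CLINICAL_PRIORITY)}
    # Labels that are not in the list sort last, so a ranked label always wins.
    return min(values, key=lambda v: rank.get(v.lower(), len(rank)))


def resolve_validated(flags):
    """A group counts as validated if any of its member variants was validated."""
    return any(flags)


if __name__ == "__main__":
    samples = ["pathogenic"] * 3 + ["likely pathogenic"] * 2
    print(resolve_clinical_significance(samples))
    print(resolve_validated([False, True, False]))
```

Taking the minimum rank rather than a majority vote mirrors the rule that the most pathogenic value wins regardless of how many samples carry it.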

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
| Field | Type | Notes |
| --- | --- | --- |
| clingen | object array | |
| chromosome | string | Ensembl-style chromosome names |
| begin | integer | 1-based position |
| end | integer | 1-based position |
| variantType | string | Any of the sequence alterations defined here. |
| id | string | Identifier from the data source. Alternatively a VID |
| clinicalInterpretation | string | see possible values below |
| observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
| observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
| validated | boolean | |
| phenotypes | string array | Description of the phenotype. |
| phenotypeIds | string array | Description of the phenotype IDs. |
| reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions). |

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

| Rating | Possible Clinical Interpretation |
| --- | --- |
| 0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype |
| 1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype |
| 2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype |
| 3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype |
| 30 | Gene associated with autosomal recessive phenotype |
| 40 | Dosage sensitivity unlikely |

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml
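
As a rough illustration of how a score from the TSV could be translated into the description strings used in the JSON output, here is a minimal lookup sketch. The function name is hypothetical, and returning `None` for empty or unrecognized scores is an assumption, not documented SAUtils behavior.

```python
# Score -> description, transcribed from the rating table above and
# lower-cased to match the strings shown in the JSON output.
DOSAGE_RATINGS = {
    0: "no evidence to suggest that dosage sensitivity is associated with clinical phenotype",
    1: "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
    2: "emerging evidence suggesting dosage sensitivity is associated with clinical phenotype",
    3: "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
    30: "gene associated with autosomal recessive phenotype",
    40: "dosage sensitivity unlikely",
}


def describe_dosage_score(score):
    """Translate a TSV score into its description; None if the score is unknown."""
    try:
        return DOSAGE_RATINGS.get(int(score))
    except (TypeError, ValueError):
        # Empty or non-numeric cells are treated as "no rating" (assumption).
        return None


if __name__ == "__main__":
    print(describe_dosage_score("3"))
    print(describe_dosage_score(40))
```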

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
| Field | Type | Notes |
| --- | --- | --- |
| clingenDosageSensitivityMap | object array | |
| chromosome | string | Ensembl-style chromosome names |
| begin | integer | 1-based position |
| end | integer | 1-based position |
| haploinsufficiency | string | see possible values below |
| triplosensitivity | string | (same as haploinsufficiency) |
| reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions). |
| annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (not reported for insertions). |
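
To make the two overlap fields concrete, here is a minimal sketch of how they could be computed for 1-based, inclusive intervals. The function names are hypothetical, and the definitions are assumptions: annotationOverlap is taken as the fraction of the annotation covered by the variant, and reciprocalOverlap as the shared length divided by the longer of the two intervals. The annotation coordinates are the two regions from the JSON example above; the variant coordinates are made up.

```python
def overlap_length(a_begin, a_end, b_begin, b_end):
    """Number of bases shared by two 1-based, inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_begin, b_begin) + 1)


def annotation_overlap(var_begin, var_end, ann_begin, ann_end):
    """Fraction of the annotation covered by the variant (assumed definition)."""
    shared = overlap_length(var_begin, var_end, ann_begin, ann_end)
    return round(shared / (ann_end - ann_begin + 1), 5)


def reciprocal_overlap(var_begin, var_end, ann_begin, ann_end):
    """Shared bases divided by the longer interval (assumed definition)."""
    shared = overlap_length(var_begin, var_end, ann_begin, ann_end)
    longest = max(var_end - var_begin + 1, ann_end - ann_begin + 1)
    return round(shared / longest, 5)


if __name__ == "__main__":
    # Hypothetical variant on chromosome 15.
    variant = (31727418, 40000000)
    for annotation in [(30900686, 32153204), (31727418, 32153204)]:
        print(annotation,
              annotation_overlap(*variant, *annotation),
              reciprocal_overlap(*variant, *annotation))
```

Rounding to 5 decimal places follows the precision stated in the field table.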

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

You can also use the SAUtils AutoDownloadGenerate subcommand to generate the ClinGen files. See the SAUtils section for details.

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z
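
The CSV-to-TSV conversion mentioned above can be done with any CSV-aware tool. The following is a minimal sketch using Python's standard `csv` module; the function name is hypothetical, and the page does not say which tool is actually used.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to tab-separated text, honoring quoted fields."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Header and "+++" separator lines pass through like any other row.
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL\n"
        "A2ML1,HGNC:23336,Costello syndrome\n"
    )
    print(csv_to_tsv(sample), end="")
```

Using a CSV parser rather than a plain comma-to-tab replacement keeps quoted fields that contain commas (for example, a disease label) intact.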

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we should select the latest classification date.
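
Both rules can be expressed as a single selection over the records of one gene/disease pair. The sketch below is illustrative only: the function name is hypothetical, and the severity ranking is an assumption, since this page states only that the more severe classification wins without defining the order.

```python
# ASSUMED ranking from least to most severe; not specified on this page.
SEVERITY = [
    "no known disease relationship",
    "no reported evidence",
    "refuted",
    "disputed",
    "limited",
    "moderate",
    "strong",
    "definitive",
]


def resolve_gene_validity(records):
    """Pick one (classification, date) pair for a gene/disease combination.

    `records` is an iterable of (classification, classification_date) tuples.
    The most severe classification wins; ties go to the latest date.
    """
    rank = {name: i for i, name in enumerate(SEVERITY)}
    # ISO-8601 timestamps sort correctly as strings once cut to yyyy-MM-dd.
    best = max(records, key=lambda r: (rank[r[0].lower()], r[1][:10]))
    return {"classification": best[0].lower(), "classificationDate": best[1][:10]}


if __name__ == "__main__":
    ednrb = [("Moderate", "2018-05-08T04:00:00.000Z"),
             ("Limited", "2018-05-08T04:00:00.000Z")]
    mutyh = [("No Reported Evidence", "2017-05-24T00:00:00"),
             ("No Reported Evidence", "2017-05-25T00:00:00")]
    print(resolve_gene_validity(ednrb))
    print(resolve_gene_validity(mutyh))
```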

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
| Field | Type | Notes |
| --- | --- | --- |
| clingenGeneValidity | object array | |
| diseaseId | string | Monarch Disease Ontology ID (MONDO) |
| disease | string | disease label |
| classification | string | see below for possible values |
| classificationDate | string | yyyy-MM-dd |

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

Building the supplementary files

The gene disease validity .nga for Illumina Connected Annotations can be built using the SAUtils command's DiseaseValidity subcommand. The only required data file is Clingen-Gene-Disease-Summary-2021-12-01.tsv (url provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory> input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2

You can also use the SAUtils AutoDownloadGenerate subcommand to generate the ClinGen files. See the SAUtils section for details.

Version: 3.23

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), they are adjusted to [BEGIN+1, END].

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482
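
The coordinate adjustment described above amounts to adding 1 to the start and leaving the end unchanged. A minimal sketch (the function name is hypothetical), applied to the first sample row:

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED-style 0-based, half-open interval to 1-based, inclusive."""
    return chrom_start + 1, chrom_end


if __name__ == "__main__":
    # nsv530705 from the sample above: chromStart=564405, chromEnd=8597804
    begin, end = bed_to_one_based(564405, 8597804)
    print(begin, end)
```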

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen tsv file and extract the following:

  • chrom
  • chromStart (note that this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma separated lists. attrTags contains the field keys and attrVals contains the field values. We will parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped.
    • Observed losses and observed gains are calculated for each group.
    • Clinical significance and validation status are collapsed using the priority strategy described below.
  • Variants with the same parent ID can have different coordinates (mapped to hg38).
    • For example, nsv491508 maps to chr14:105583663-106881350 and chr14:105605043-106766076.
    • In such cases, both variants are kept.
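
The grouping step can be sketched as below. This is an illustration under stated assumptions, not the SAUtils code: the record keys (`parent`, `chrom`, `begin`, `end`, `variant_type`) are hypothetical names, and counting one observed gain or loss per entry according to its variant type is an assumed reading of how the counts are derived.

```python
from collections import defaultdict


def group_entries(entries):
    """Group entries by (parent ID, chromosome, begin, end) and count gains/losses."""
    groups = defaultdict(lambda: {"observedGains": 0, "observedLosses": 0})
    for entry in entries:
        key = (entry["parent"], entry["chrom"], entry["begin"], entry["end"])
        if entry["variant_type"] == "copy_number_gain":
            groups[key]["observedGains"] += 1
        elif entry["variant_type"] == "copy_number_loss":
            groups[key]["observedLosses"] += 1
    return dict(groups)


if __name__ == "__main__":
    # Same parent ID with two coordinate sets: both groups are kept.
    entries = [
        {"parent": "nsv491508", "chrom": "14", "begin": 105583663,
         "end": 106881350, "variant_type": "copy_number_loss"},
        {"parent": "nsv491508", "chrom": "14", "begin": 105605043,
         "end": 106766076, "variant_type": "copy_number_loss"},
    ]
    for key, counts in group_entries(entries).items():
        print(key, counts)
```

Including the coordinates in the grouping key is what keeps variants such as nsv491508 as separate records when their coordinates differ.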

Conflict Resolution

Clinical significance priority

When variants belonging to the same parent ID have different clinical significance values, we choose the most pathogenic of the available values. For example, if 3 samples were deemed pathogenic and 2 samples were deemed likely pathogenic, we list the variant as pathogenic.

Priority (high to low)

  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance

Validation Priority

When variants belonging to the same parent ID differ in validation status, we set the validation status to true if any of the variants was validated.

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
| Field | Type | Notes |
| --- | --- | --- |
| clingen | object array | |
| chromosome | string | Ensembl-style chromosome names |
| begin | integer | 1-based position |
| end | integer | 1-based position |
| variantType | string | Any of the sequence alterations defined here. |
| id | string | Identifier from the data source. Alternatively a VID |
| clinicalInterpretation | string | see possible values below |
| observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
| observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
| validated | boolean | |
| phenotypes | string array | Description of the phenotype. |
| phenotypeIds | string array | Description of the phenotype IDs. |
| reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions). |

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

| Rating | Possible Clinical Interpretation |
| --- | --- |
| 0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype |
| 1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype |
| 2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype |
| 3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype |
| 30 | Gene associated with autosomal recessive phenotype |
| 40 | Dosage sensitivity unlikely |

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
| Field | Type | Notes |
| --- | --- | --- |
| clingenDosageSensitivityMap | object array | |
| chromosome | string | Ensembl-style chromosome names |
| begin | integer | 1-based position |
| end | integer | 1-based position |
| haploinsufficiency | string | see possible values below |
| triplosensitivity | string | (same as haploinsufficiency) |
| reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions). |
| annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (not reported for insertions). |

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

You can also use the SAUtils AutoDownloadGenerate subcommand to generate the ClinGen files. See the SAUtils section for details.

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we should select the latest classification date.
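
The two rules above can be sketched in Python as follows. This is only an illustration, not the SAUtils implementation: the `SEVERITY` ranking and the `resolve` helper are hypothetical, and the ordering of classifications by severity is an assumption, since the text does not spell it out.

```python
# Hypothetical severity ranking (an assumption for illustration only);
# a higher number is treated as the more severe classification.
SEVERITY = {
    "no reported evidence": 0,
    "limited": 1,
    "moderate": 2,
    "strong": 3,
    "definitive": 4,
}

def resolve(rows):
    """Keep one row per (gene, disease ID) pair.

    rows: iterable of (gene, disease_id, classification, date) tuples,
    where date is an ISO 8601 string such as 2018-05-08T04:00:00.000Z.
    """
    best = {}
    for gene, disease_id, classification, date in rows:
        # Rank by severity first, then by timestamp, so that equal
        # classifications fall back to the latest classification date.
        # ISO 8601 timestamps sort chronologically as plain strings.
        rank = (SEVERITY[classification.lower()], date[:19])
        key = (gene, disease_id)
        if key not in best or rank > best[key][0]:
            best[key] = (rank, (gene, disease_id, classification, date))
    return [row for _, row in best.values()]
```

Applied to the EDNRB rows shown earlier, this sketch would keep the Moderate entry; applied to the MUTYH rows, it would keep the entry dated 2017-05-25.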

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                Type    Notes
clingenGeneValidity  object
diseaseId            string  Monarch Disease Ontology ID (MONDO)
disease              string  disease label
classification       string  see below for possible values
classificationDate   string  yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

Building the supplementary files

The gene-disease validity .nga file for Illumina Connected Annotations can be built using the DiseaseValidity subcommand of SAUtils. The only required inputs are Clingen-Gene-Disease-Summary-2021-12-01.tsv (URL provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2

You can also use the AutoDownloadGenerate subcommand of SAUtils to generate the ClinGen files. For details on AutoDownloadGenerate, see the SAUtils section.

+ + \ No newline at end of file diff --git a/3.23/data-sources/clinvar-json/index.html b/3.23/data-sources/clinvar-json/index.html index 65c92d0e..05b892a4 100644 --- a/3.23/data-sources/clinvar-json/index.html +++ b/3.23/data-sources/clinvar-json/index.html @@ -6,13 +6,13 @@ clinvar-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field             Type          Notes
id                string        ClinVar ID
variationId       string        ClinVar VCV ID
variantType       string        variant type
reviewStatus      string        see possible values below
alleleOrigins     string array  see possible values below
refAllele         string
altAllele         string
phenotypes        string array
medGenIds         string array  MedGen IDs
omimIds           string array  OMIM IDs
orphanetIds       string array  Orphanet IDs
significance      string array  see possible values below
lastUpdatedDate   string        yyyy-MM-dd
pubMedIds         string array  PubMed IDs
isAlleleSpecific  bool          true when the current variant alternate allele matches the ClinVar alternate allele
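
As a rough illustration of how a consumer might use these fields, the Python sketch below selects the IDs of allele-specific entries that carry a given significance. The `allele_specific_ids` helper is a hypothetical name, and treating a missing `isAlleleSpecific` field as false is an assumption made here (the large-variant example above does not include the field).

```python
import json

def allele_specific_ids(clinvar_entries, wanted):
    """Return the IDs of allele-specific entries with a wanted significance."""
    ids = []
    for entry in clinvar_entries:
        # Assumption: entries without isAlleleSpecific are skipped.
        if not entry.get("isAlleleSpecific", False):
            continue
        if any(s in wanted for s in entry.get("significance", [])):
            ids.append(entry["id"])
    return ids

# Abbreviated copy of the small-variant example above.
sample = json.loads("""
[
  {"id": "VCV000036581.3", "significance": ["benign"],
   "refAllele": "G", "altAllele": "A", "isAlleleSpecific": true},
  {"id": "RCV000030258.4", "variationId": "VCV000036581.3",
   "significance": ["benign"], "isAlleleSpecific": true}
]
""")

benign_ids = allele_specific_ids(sample, wanted={"benign", "likely benign"})
```

Because `significance` is an array, the sketch checks every element instead of comparing against a single string.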

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
- - +
Version: 3.23

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field             Type          Notes
id                string        ClinVar ID
variationId       string        ClinVar VCV ID
variantType       string        variant type
reviewStatus      string        see possible values below
alleleOrigins     string array  see possible values below
refAllele         string
altAllele         string
phenotypes        string array
medGenIds         string array  MedGen IDs
omimIds           string array  OMIM IDs
orphanetIds       string array  Orphanet IDs
significance      string array  see possible values below
lastUpdatedDate   string        yyyy-MM-dd
pubMedIds         string array  PubMed IDs
isAlleleSpecific  bool          true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
+ + \ No newline at end of file diff --git a/3.23/data-sources/clinvar/index.html b/3.23/data-sources/clinvar/index.html index 19df2b31..ed6c8a61 100644 --- a/3.23/data-sources/clinvar/index.html +++ b/3.23/data-sources/clinvar/index.html @@ -6,15 +6,15 @@ ClinVar | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

ClinVar

Overview

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

RCV File

Example

Here's a full RCV entry.

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

ID

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinVarAccession Acc="RCV000000001" Version="2">
</ClinVarSet>

The Acc and Version attributes are merged to form the ID (RCV000000001.2).
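
A minimal Python sketch of this step, using the standard library XML parser. The snippet above is abbreviated and not well-formed on its own, so the example uses a closed stand-in; the `rcv_id` helper is a hypothetical name, not part of SAUtils.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the abbreviated snippet above.
RCV_XML = """
<ClinVarSet>
  <ReferenceClinVarAssertion>
    <ClinVarAccession Acc="RCV000000001" Version="2"/>
  </ReferenceClinVarAssertion>
</ClinVarSet>
"""

def rcv_id(clinvar_set):
    """Join the Acc and Version attributes with a dot."""
    acc = clinvar_set.find("./ReferenceClinVarAssertion/ClinVarAccession")
    return f"{acc.get('Acc')}.{acc.get('Version')}"

print(rcv_id(ET.fromstring(RCV_XML)))  # prints RCV000000001.2
```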

LastUpdatedDate

<ClinVarSet>
<ReferenceClinVarAssertion DateCreated="2012-08-13" DateLastUpdated="2016-02-17" ID="57604" >
</ClinVarSet>

Significance

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

ReviewStatus

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

Phenotypes

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="62">
<Trait Type="Disease">
<Name>
<ElementValue Type="Preferred">Joubert syndrome 9</ElementValue>
</Name>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

We only use the field with Type="Preferred". Multiple phenotypes may be reported.

Location, Variant Type and Variant Id

<ReferenceClinVarAssertion>
<GenotypeSet Type="CompoundHeterozygote" ID="424709">
<MeasureSet Type="Variant" ID="81">
<Measure Type="single nucleotide variant" ID="15120">
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38"
AssemblyStatus="current" Chr="10" Accession="NC_000010.11" start="89222510"
stop="89222510" display_start="89222510" display_stop="89222510" variantLength="1"
positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
AssemblyStatus="previous" Chr="10" Accession="NC_000010.10" start="90982267"
stop="90982267" display_start="90982267" display_stop="90982267" variantLength="1"
positionVCF="90982267" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
</Measure>
</MeasureSet>
</GenotypeSet>
</ReferenceClinVarAssertion>
  • The variant position is extracted from the fields for their respective assemblies.
  • Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant.
  • For older records, since the "start" and "stop" fields are not always available, we use the "display_start" and "display_stop" fields.
  • If a required allele is not available, we extract it from the reference sequence.
  • Only variants having a dbSNP ID are extracted.
  • Note that a ClinVar accession may have multiple variants associated with it (possibly in different locations).
  • VariantId is extracted from the MeasureSet attributes.
  • VariantType is extracted from the Measure attributes.
    unsupported variant types

    We currently don't support the following variant types:

    • Microsatellite
    • protein only
    • fusion
    • Complex
    • Variation
    • Translocation
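
The position rules in the list above can be sketched in Python as follows. This is an illustration under stated assumptions: the `locate` helper is hypothetical, the GRCh37 location has been trimmed here to mimic an older record without the VCF-style attributes, and filling in missing alleles from the reference sequence is left out.

```python
import xml.etree.ElementTree as ET

# Abbreviated Measure element; the GRCh37 entry is deliberately trimmed
# to look like an older record that lacks positionVCF and the alleles.
MEASURE_XML = """
<Measure Type="single nucleotide variant" ID="15120">
  <SequenceLocation Assembly="GRCh38" Chr="10" start="89222510" stop="89222510"
      display_start="89222510" display_stop="89222510"
      positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
  <SequenceLocation Assembly="GRCh37" Chr="10"
      display_start="90982267" display_stop="90982267"/>
</Measure>
"""

def locate(measure, assembly):
    """Return (chromosome, position, ref, alt) for one assembly, or None."""
    for loc in measure.iter("SequenceLocation"):
        if loc.get("Assembly") != assembly:
            continue
        if loc.get("positionVCF") is not None:
            # Updated record: use the VCF-style fields directly.
            return (loc.get("Chr"), int(loc.get("positionVCF")),
                    loc.get("referenceAlleleVCF"), loc.get("alternateAlleleVCF"))
        # Older record: fall back to display_start; the alleles would
        # then have to come from the reference sequence (not shown).
        return (loc.get("Chr"), int(loc.get("display_start")), None, None)
    return None

measure = ET.fromstring(MEASURE_XML)
grch38 = locate(measure, "GRCh38")
grch37 = locate(measure, "GRCh37")
```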

MedGen, OMIM, Orphanet IDs

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="175">
<Trait ID="3036" Type="Disease">
<XRef ID="C0086651" DB="MedGen"/>
<XRef ID="309297" DB="Orphanet"/>
<XRef ID="582" DB="Orphanet"/>
<XRef Type="MIM" ID="253000" DB="OMIM"/>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>
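
One way to collect these cross-references is to group the XRef elements by their DB attribute, as in the Python sketch below. The `FIELD_BY_DB` mapping and the `cross_references` helper are hypothetical names chosen to match the JSON field names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TRAIT_XML = """
<Trait ID="3036" Type="Disease">
  <XRef ID="C0086651" DB="MedGen"/>
  <XRef ID="309297" DB="Orphanet"/>
  <XRef ID="582" DB="Orphanet"/>
  <XRef Type="MIM" ID="253000" DB="OMIM"/>
</Trait>
"""

# Maps the DB attribute to the corresponding JSON field name.
FIELD_BY_DB = {"MedGen": "medGenIds", "OMIM": "omimIds", "Orphanet": "orphanetIds"}

def cross_references(trait):
    """Group XRef IDs by database, ignoring databases not listed above."""
    ids = defaultdict(list)
    for xref in trait.iter("XRef"):
        field = FIELD_BY_DB.get(xref.get("DB"))
        if field is not None:
            ids[field].append(xref.get("ID"))
    return dict(ids)

xrefs = cross_references(ET.fromstring(TRAIT_XML))
```

A trait can carry several IDs from the same database (two Orphanet IDs in this example), which is why each JSON field is an array.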

AlleleOrigins

<ClinVarAssertion>
<Origin>germline</Origin>
</ClinVarAssertion>

We extract allele origins only from submission (SCV) entries, and we keep all of them.

PubMedIds

<ClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<Citation Type="general">
<ID Source="PubMed">12114475</ID>
</Citation>
</ClinicalSignificance>
<AttributeSet>
<Attribute Type="AssertionMethod">LMM Criteria</Attribute>
<Citation>
<ID Source="PubMed">24033266</ID>
</Citation>
</AttributeSet>
<ObservedIn>
<ObservedData ID="9727445">
<Citation Type="general">
<ID Source="PubMed">9113933</ID>
</Citation>
</ObservedData>
</ObservedIn>
<Citation Type="general">
<ID Source="PubMed">23757202</ID>
</Citation>
</ClinVarAssertion>

We extract PubMed IDs only from submission (SCV) entries, and we keep all of them.

Parsing Significance

Extracting significance(s) may involve parsing multiple fields. Take the following snippets into consideration.

<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2016-10-13">
<ReviewStatus>criteria provided, multiple submitters, no conflicts</ReviewStatus>
<Description>Pathogenic/Likely pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2012-06-07">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Conflicting interpretations of pathogenicity</Description>
<Explanation DataSource="ClinVar" Type="public">Pathogenic(1);Uncertain significance(1)</Explanation>
</ClinicalSignificance>

Given this variation, we represent the significance field as an array of strings, which may be parsed out of the Description or Explanation fields.

Varying Delimiters

The delimiters in each field may vary. Currently, the delimiters for Description are , and /. The delimiters for Explanation are ; and /.
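
The splitting described above might look like the following Python sketch. The helper names are hypothetical, and three details are assumptions rather than documented behavior: dropping the trailing submitter count such as "(1)", lowercasing (suggested by the lowercase values in the JSON output), and removing duplicates.

```python
import re

def _clean(parts):
    cleaned = []
    for part in parts:
        # Drop a trailing submitter count such as "(1)", then normalize.
        term = re.sub(r"\(\d+\)\s*$", "", part).strip().lower()
        if term and term not in cleaned:
            cleaned.append(term)
    return cleaned

def split_description(description):
    """Split a Description value on ',' and '/'."""
    return _clean(re.split(r"[,/]", description))

def split_explanation(explanation):
    """Split an Explanation value on ';' and '/'."""
    return _clean(re.split(r"[;/]", explanation))
```

For instance, `split_description("Pathogenic/Likely pathogenic")` yields two entries, and `split_explanation("Pathogenic(1);Uncertain significance(1)")` yields the two terms without their counts.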

VCV File

Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ClinVarVariationRelease xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public/clinvar_variation/variation_archive_1.4.xsd" ReleaseDate="2019-12-31">
<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">
<RecordStatus>current</RecordStatus>
<Species>Homo sapiens</Species>
<IncludedRecord>
<SimpleAllele AlleleID="425239" VariationID="431749">
<GeneList>
<Gene Symbol="KCNAB2" FullName="potassium voltage-gated channel subfamily A regulatory beta subunit 2" GeneID="8514" HGNC_ID="HGNC:6229" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5992639" stop="6101186" display_start="5992639" display_stop="6101186" Strand="+"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6052357" stop="6161252" display_start="6052357" display_stop="6161252" Strand="+"/>
</Location>
<OMIM>601142</OMIM>
</Gene>
<Gene Symbol="NPHP4" FullName="nephrocystin 4" GeneID="261734" HGNC_ID="HGNC:19104" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5862810" stop="5992425" display_start="5862810" display_stop="5992425" Strand="-"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="5922869" stop="6052532" display_start="5922869" display_stop="6052532" Strand="-"/>
</Location>
<OMIM>607215</OMIM>
</Gene>
</GeneList>
<Name>GRCh37/hg19 1p36.31(chr1:6051187-6158763)</Name>
<VariantType>copy number gain</VariantType>
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" forDisplay="true" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6051187" stop="6158763" display_start="6051187" display_stop="6158763"/> </Location>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<XRefList>
<XRef Type="Interpreted" ID="431733" DB="ClinVar"/>
</XRefList>
</SimpleAllele>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<SubmittedInterpretationList>
<SCV Title="SUB1895145" Accession="SCV000296057" Version="1"/>
</SubmittedInterpretationList>
<InterpretedVariationList>
<InterpretedVariation VariationID="431733" Accession="VCV000431733" Version="1"/>
</InterpretedVariationList>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

id

<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">

The Accession and Version attributes are merged to form the ID (VCV000431749.1).

significance

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<SimpleAllele>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
</SimpleAllele>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

May have multiple significances listed.

reviewStatus

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Known Issues

  • The XML file contains ~1k more entries (out of 162K) than the VCF file
  • The XML file does not have a field indicating that a record is associated with the reference base - something that was present in VCF
  • The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H", +
    Version: 3.23

    ClinVar

    Overview

    ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

    Publication

    Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

    RCV File

    Example

    Here's a full RCV entry.

    Parsing

    In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

    ID

    <ClinVarSet>
    <ReferenceClinVarAssertion>
    <ClinVarAccession Acc="RCV000000001" Version="2">
    </ClinVarSet>

    The Acc and Version attributes are merged to form the ID (RCV000000001.2).

    LastUpdatedDate

    <ClinVarSet>
    <ReferenceClinVarAssertion DateCreated="2012-08-13" DateLastUpdated="2016-02-17" ID="57604" >
    </ClinVarSet>

    Significance

    <ClinVarSet>
    <ReferenceClinVarAssertion>
    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Pathogenic</Description>
    </ClinicalSignificance>
    </ClinVarSet>

    ReviewStatus

    <ClinVarSet>
    <ReferenceClinVarAssertion>
    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Pathogenic</Description>
    </ClinicalSignificance>
    </ClinVarSet>

    Phenotypes

    <ReferenceClinVarAssertion>
    <TraitSet Type="Disease" ID="62">
    <Trait Type="Disease">
    <Name>
    <ElementValue Type="Preferred">Joubert syndrome 9</ElementValue>
    </Name>
    </Trait>
    </TraitSet>
    </ReferenceClinVarAssertion>

    We only use the field with Type="Preferred". Multiple phenotypes may be reported.

    Location, Variant Type and Variant Id

    <ReferenceClinVarAssertion>
    <GenotypeSet Type="CompoundHeterozygote" ID="424709">
    <MeasureSet Type="Variant" ID="81">
    <Measure Type="single nucleotide variant" ID="15120">
    <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38"
    AssemblyStatus="current" Chr="10" Accession="NC_000010.11" start="89222510"
    stop="89222510" display_start="89222510" display_stop="89222510" variantLength="1"
    positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
    AssemblyStatus="previous" Chr="10" Accession="NC_000010.10" start="90982267"
    stop="90982267" display_start="90982267" display_stop="90982267" variantLength="1"
    positionVCF="90982267" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
    </Measure>
    </MeasureSet>
    </GenotypeSet>
    </ReferenceClinVarAssertion>
    • The variant position is extracted from the fields for their respective assemblies.
    • Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant.
    • For older records, since the "start" and "stop" fields are not always available, we use the "display_start" and "display_stop" fields.
    • If a required allele is not available, we extract it from the reference sequence.
    • Only variants having a dbSNP ID are extracted.
    • Note that a ClinVar accession may have multiple variants associated with it (possibly in different locations).
    • VariantId is extracted from the MeasureSet attributes.
    • VariantType is extracted from the Measure attributes.
      unsupported variant types

      We currently don't support the following variant types:

      • Microsatellite
      • protein only
      • fusion
      • Complex
      • Variation
      • Translocation

    MedGen, OMIM, Orphanet IDs

    <ReferenceClinVarAssertion>
    <TraitSet Type="Disease" ID="175">
    <Trait ID="3036" Type="Disease">
    <XRef ID="C0086651" DB="MedGen"/>
    <XRef ID="309297" DB="Orphanet"/>
    <XRef ID="582" DB="Orphanet"/>
    <XRef Type="MIM" ID="253000" DB="OMIM"/>
    </Trait>
    </TraitSet>
    </ReferenceClinVarAssertion>

    AlleleOrigins

    <ClinVarAssertion>
    <Origin>germline</Origin>
    </ClinVarAssertion>

    We extract allele origins only from submission (SCV) entries, and we keep all of them.

    PubMedIds

    <ClinVarAssertion>
    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <Citation Type="general">
    <ID Source="PubMed">12114475</ID>
    </Citation>
    </ClinicalSignificance>
    <AttributeSet>
    <Attribute Type="AssertionMethod">LMM Criteria</Attribute>
    <Citation>
    <ID Source="PubMed">24033266</ID>
    </Citation>
    </AttributeSet>
    <ObservedIn>
    <ObservedData ID="9727445">
    <Citation Type="general">
    <ID Source="PubMed">9113933</ID>
    </Citation>
    </ObservedData>
    </ObservedIn>
    <Citation Type="general">
    <ID Source="PubMed">23757202</ID>
    </Citation>
    </ClinVarAssertion>

    We extract PubMed IDs only from submission (SCV) entries, and we keep all of them.

    Parsing Significance

    Extracting significance(s) may involve parsing multiple fields. Take the following snippets into consideration.

    <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Pathogenic</Description>
    </ClinicalSignificance>

    <ClinicalSignificance DateLastEvaluated="2016-10-13">
    <ReviewStatus>criteria provided, multiple submitters, no conflicts</ReviewStatus>
    <Description>Pathogenic/Likely pathogenic</Description>
    </ClinicalSignificance>

    <ClinicalSignificance DateLastEvaluated="2012-06-07">
    <ReviewStatus>no assertion criteria provided</ReviewStatus>
    <Description>Conflicting interpretations of pathogenicity</Description>
    <Explanation DataSource="ClinVar" Type="public">Pathogenic(1);Uncertain significance(1)</Explanation>
    </ClinicalSignificance>

    Given this variation, we represent the significance field as an array of strings, which may be parsed out of the Description or Explanation fields.

    Varying Delimiters

    The delimiters in each field may vary. Currently, the delimiters for Description are , and /. The delimiters for Explanation are ; and /.

    VCV File

    Example

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <ClinVarVariationRelease xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public/clinvar_variation/variation_archive_1.4.xsd" ReleaseDate="2019-12-31">
    <VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">
    <RecordStatus>current</RecordStatus>
    <Species>Homo sapiens</Species>
    <IncludedRecord>
    <SimpleAllele AlleleID="425239" VariationID="431749">
    <GeneList>
    <Gene Symbol="KCNAB2" FullName="potassium voltage-gated channel subfamily A regulatory beta subunit 2" GeneID="8514" HGNC_ID="HGNC:6229" Source="calculated" RelationshipType="genes overlapped by variant">
    <Location>
    <CytogeneticLocation>1p36.31</CytogeneticLocation>
    <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5992639" stop="6101186" display_start="5992639" display_stop="6101186" Strand="+"/>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6052357" stop="6161252" display_start="6052357" display_stop="6161252" Strand="+"/>
    </Location>
    <OMIM>601142</OMIM>
    </Gene>
    <Gene Symbol="NPHP4" FullName="nephrocystin 4" GeneID="261734" HGNC_ID="HGNC:19104" Source="calculated" RelationshipType="genes overlapped by variant">
    <Location>
    <CytogeneticLocation>1p36.31</CytogeneticLocation>
    <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5862810" stop="5992425" display_start="5862810" display_stop="5992425" Strand="-"/>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="5922869" stop="6052532" display_start="5922869" display_stop="6052532" Strand="-"/>
    </Location>
    <OMIM>607215</OMIM>
    </Gene>
    </GeneList>
    <Name>GRCh37/hg19 1p36.31(chr1:6051187-6158763)</Name>
    <VariantType>copy number gain</VariantType>
    <Location>
    <CytogeneticLocation>1p36.31</CytogeneticLocation>
    <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" forDisplay="true" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6051187" stop="6158763" display_start="6051187" display_stop="6158763"/> </Location>
    <Interpretations>
    <Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
    <Description>no interpretation for the single variant</Description>
    </Interpretation>
    </Interpretations>
    <XRefList>
    <XRef Type="Interpreted" ID="431733" DB="ClinVar"/>
    </XRefList>
    </SimpleAllele>
    <ReviewStatus>no interpretation for the single variant</ReviewStatus>
    <Interpretations>
    <Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
    <Description>no interpretation for the single variant</Description>
    </Interpretation>
    </Interpretations>
    <SubmittedInterpretationList>
    <SCV Title="SUB1895145" Accession="SCV000296057" Version="1"/>
    </SubmittedInterpretationList>
    <InterpretedVariationList>
    <InterpretedVariation VariationID="431733" Accession="VCV000431733" Version="1"/>
    </InterpretedVariationList>
    </IncludedRecord>
    </VariationArchive>
    </ClinVarVariationRelease>

    Parsing

    In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

    id

    <VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">

    The Accession and Version attributes are merged to form the ID (e.g. VCV000431749.1).

    significance

    <ClinVarVariationRelease>
    <VariationArchive>
    <IncludedRecord>
    <SimpleAllele>
    <Interpretations>
    <Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
    <Description>no interpretation for the single variant</Description>
    </Interpretation>
    </Interpretations>
    </SimpleAllele>
    </IncludedRecord>
    </VariationArchive>
    </ClinVarVariationRelease>

    May have multiple significances listed.

    reviewStatus

    <ClinVarVariationRelease>
    <VariationArchive>
    <IncludedRecord>
    <ReviewStatus>no interpretation for the single variant</ReviewStatus>
    </IncludedRecord>
    </VariationArchive>
    </ClinVarVariationRelease>

    Known Issues

    • The XML file contains ~1k more entries (out of 162K) than the VCF file
    • The XML file does not have a field indicating that a record is associated with the reference base - something that was present in VCF
    • The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H", etc.) as their alternate allele

    Download URLs

    ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz

    https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz

    JSON Output

    small variants:

    "clinvar":[
    {
    "id":"VCV000036581.3",
    "reviewStatus":"reviewed by expert panel",
    "significance":[
    "benign"
    ],
    "refAllele":"G",
    "altAllele":"A",
    "lastUpdatedDate":"2020-03-01",
    "isAlleleSpecific":true
    },
    {
    "id":"RCV000030258.4",
    "variationId":"VCV000036581.3",
    "reviewStatus":"reviewed by expert panel",
    "alleleOrigins":[
    "germline"
    ],
    "refAllele":"G",
    "altAllele":"A",
    "phenotypes":[
    "Lynch syndrome"
    ],
    "medGenIds":[
    "C1333990"
    ],
    "omimIds":[
    "120435"
    ],
    "significance":[
    "benign"
    ],
    "lastUpdatedDate":"2017-05-01",
    "isAlleleSpecific":true
    }
    ]

    large variants:

    "clinvar":[
    {
    "chromosome":"1",
    "begin":629025,
    "end":8537745,
    "variantType":"copy_number_loss",
    "id":"RCV000051993.4",
    "variationId":"VCV000058242.1",
    "reviewStatus":"criteria provided, single submitter",
    "alleleOrigins":[
    "not provided"
    ],
    "phenotypes":[
    "See cases"
    ],
    "significance":[
    "pathogenic"
    ],
    "lastUpdatedDate":"2022-04-21",
    "pubMedIds":[
    "21844811"
    ]
    },
    {
    "id":"VCV000058242.1",
    "reviewStatus":"criteria provided, single submitter",
    "significance":[
    "pathogenic"
    ],
    "lastUpdatedDate":"2022-04-21"
    },
    ......
    ]
    Field             Type          Notes
    id                string        ClinVar ID
    variationId       string        ClinVar VCV ID
    variantType       string        variant type
    reviewStatus      string        see possible values below
    alleleOrigins     string array  see possible values below
    refAllele         string
    altAllele         string
    phenotypes        string array
    medGenIds         string array  MedGen IDs
    omimIds           string array  OMIM IDs
    orphanetIds       string array  Orphanet IDs
    significance      string array  see possible values below
    lastUpdatedDate   string        yyyy-MM-dd
    pubMedIds         string array  PubMed IDs
    isAlleleSpecific  bool          true when the current variant alternate allele matches the ClinVar alternate allele

    reviewStatus:

    • no assertion provided
    • no assertion criteria provided
    • criteria provided, single submitter
    • practice guideline
    • classified by multiple submitters
    • criteria provided, conflicting interpretations
    • criteria provided, multiple submitters, no conflicts
    • no interpretation for the single variant

    alleleOrigins:

    • unknown
    • other
    • germline
    • somatic
    • inherited
    • paternal
    • maternal
    • de-novo
    • biparental
    • uniparental
    • not-tested
    • tested-inconclusive

    significance:

    • uncertain significance
    • not provided
    • benign
    • likely benign
    • likely pathogenic
    • pathogenic
    • drug response
    • histocompatibility
    • association
    • risk factor
    • protective
    • affects
    • conflicting data from submitters
    • other
    • no interpretation for the single variant
    • conflicting interpretations of pathogenicity
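
As a minimal sketch of how the clinvar array could be consumed downstream, the Python below separates VCV entries from RCV entries by their id prefix and keeps only allele-specific entries that carry a given significance. The helper name allele_specific_hits and the abbreviated sample data are illustrative assumptions, not part of Illumina Connected Annotations.

```python
import json

# Abbreviated copy of the small-variant example; only a subset of fields is kept.
SAMPLE = json.loads("""
[
  {"id": "VCV000036581.3", "reviewStatus": "reviewed by expert panel",
   "significance": ["benign"], "refAllele": "G", "altAllele": "A",
   "lastUpdatedDate": "2020-03-01", "isAlleleSpecific": true},
  {"id": "RCV000030258.4", "variationId": "VCV000036581.3",
   "reviewStatus": "reviewed by expert panel", "alleleOrigins": ["germline"],
   "refAllele": "G", "altAllele": "A", "phenotypes": ["Lynch syndrome"],
   "significance": ["benign"], "lastUpdatedDate": "2017-05-01",
   "isAlleleSpecific": true}
]
""")


def allele_specific_hits(entries, significance):
    """Return (vcv, rcv) lists of allele-specific entries with the given significance."""
    vcv, rcv = [], []
    for entry in entries:
        # The large-variant example does not show isAlleleSpecific, so treat
        # a missing flag as False (an assumption made for this sketch).
        if not entry.get("isAlleleSpecific", False):
            continue
        if significance not in entry.get("significance", []):
            continue
        (vcv if entry["id"].startswith("VCV") else rcv).append(entry)
    return vcv, rcv


vcv_hits, rcv_hits = allele_specific_hits(SAMPLE, "benign")
```

Because significance is an array, the membership test is used rather than string equality.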

    Building the supplementary files

    There are two ways of building your own ClinVar supplementary files using SAUtils.

    The first way is to use the clinvar subcommand of SAUtils, which builds the ClinVar .nsa and .nsi files for Illumina Connected Annotations.

    The second way is to use the AutoDownloadGenerate subcommand of SAUtils. To use AutoDownloadGenerate, read more in the SAUtils section.

    Using the clinvar subcommand and source data files

    Two input .xml files and a .version file are required to build the .nsa and .nsi files. You should have the following files:

    ClinVarFullRelease_00-latest.xml.gz
    ClinVarVariationRelease_00-latest.xml.gz
    ClinVarFullRelease_00-latest.xml.gz.version

    The version file is a json file with the following format.

    {
    "name": "ClinVar",
    "version": "20231230",
    "description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
    "releaseDate": "2024-01-10"
    }

    Adjust the version and release date to match the ClinVar release you downloaded.
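
As a rough sketch, the version file could be generated with a few lines of Python rather than edited by hand. The function name write_version_file is hypothetical; the field names and the .version suffix are taken from the example above.

```python
import json
from pathlib import Path


def write_version_file(xml_path, version, release_date):
    """Write '<xml_path>.version' next to the ClinVar XML and return its path."""
    version_path = Path(str(xml_path) + ".version")
    payload = {
        "name": "ClinVar",
        "version": version,
        "description": "A freely accessible, public archive of reports of the "
                       "relationships among human variations and phenotypes, "
                       "with supporting evidence",
        "releaseDate": release_date,
    }
    # json.dumps guarantees valid JSON (quoting, commas) in the output file.
    version_path.write_text(json.dumps(payload, indent=2) + "\n")
    return version_path
```

For the files listed above, a call such as write_version_file("ClinVarFullRelease_00-latest.xml.gz", "20231230", "2024-01-10") would produce ClinVarFullRelease_00-latest.xml.gz.version.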

    The help menu for the utility is as follows:

    dotnet SAUtils.dll clinvar
    ---------------------------------------------------------------------------
    SAUtils (c) 2022 Illumina, Inc.
    Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
    ---------------------------------------------------------------------------

    USAGE: dotnet SAUtils.dll clinvar [options]
    Creates a supplementary database with ClinVar annotations

    OPTIONS:
    --ref, -r <VALUE> compressed reference sequence file
    --rcv, -i <VALUE> ClinVar Full release XML file
    --vcv, -c <VALUE> ClinVar Variation release XML file
    --out, -o <VALUE> output directory
    --help, -h displays the help menu
    --version, -v displays the version

    dotnet SAUtils.dll clinvar

    Here is a sample execution:

    dotnet SAUtils.dll clinvar \
    --ref ~/development/References/7/Homo_sapiens.GRCh38.Nirvana.dat \
    --rcv ClinVarFullRelease_00-latest.xml.gz \
    --vcv ClinVarVariationRelease_00-latest.xml.gz \
    --out ~/development/SupplementaryDatabase/63/GRCh38
    ---------------------------------------------------------------------------
    SAUtils (c) 2022 Illumina, Inc.
    Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
    ---------------------------------------------------------------------------

    Found 1535677 VCV records
    Unknown vcv id:225946 found in RCV000211201.2
    Unknown vcv id:225946 found in RCV000211253.2
    Unknown vcv id:225946 found in RCV000211375.2
    Unknown vcv id:976117 found in RCV001253316.1
    Unknown vcv id:1321016 found in RCV001776995.2
    3 unknown VCVs found in RCVs.
    225946,976117,1321016
    0 unknown VCVs found in RCVs.
    Chromosome 1 completed in 00:00:15.1
    Chromosome 2 completed in 00:00:20.0
    Chromosome 3 completed in 00:00:09.7
    Chromosome 4 completed in 00:00:05.9
    Chromosome 5 completed in 00:00:09.8
    Chromosome 6 completed in 00:00:08.3
    Chromosome 7 completed in 00:00:08.7
    Chromosome 8 completed in 00:00:06.2
    Chromosome 9 completed in 00:00:08.6
    Chromosome 10 completed in 00:00:07.0
    Chromosome 11 completed in 00:00:11.7
    Chromosome 12 completed in 00:00:08.0
    Chromosome 13 completed in 00:00:06.3
    Chromosome 14 completed in 00:00:06.0
    Chromosome 15 completed in 00:00:06.6
    Chromosome 16 completed in 00:00:10.8
    Chromosome 17 completed in 00:00:13.8
    Chromosome 18 completed in 00:00:02.9
    Chromosome 19 completed in 00:00:08.7
    Chromosome 20 completed in 00:00:03.6
    Chromosome 21 completed in 00:00:02.4
    Chromosome 22 completed in 00:00:03.6
    Chromosome MT completed in 00:00:00.2
    Chromosome X completed in 00:00:07.5
    Chromosome Y completed in 00:00:00.0
    Maximum bp shifted for any variant:2
    Writing 37097 intervals to database...

    Time: 00:13:26.9

- - + + \ No newline at end of file diff --git a/3.23/data-sources/cosmic-cancer-gene-census/index.html b/3.23/data-sources/cosmic-cancer-gene-census/index.html index 03769968..e85bd942 100644 --- a/3.23/data-sources/cosmic-cancer-gene-census/index.html +++ b/3.23/data-sources/cosmic-cancer-gene-census/index.html @@ -6,13 +6,13 @@ cosmic-cancer-gene-census | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]
- - +
Version: 3.23

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]
+ + \ No newline at end of file diff --git a/3.23/data-sources/cosmic-gene-fusion-json/index.html b/3.23/data-sources/cosmic-gene-fusion-json/index.html index 3c65fec1..59133069 100644 --- a/3.23/data-sources/cosmic-gene-fusion-json/index.html +++ b/3.23/data-sources/cosmic-gene-fusion-json/index.html @@ -6,13 +6,13 @@ cosmic-gene-fusion-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
- - +
Version: 3.23

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + \ No newline at end of file diff --git a/3.23/data-sources/cosmic-json/index.html b/3.23/data-sources/cosmic-json/index.html index af471a3e..8249c7f4 100644 --- a/3.23/data-sources/cosmic-json/index.html +++ b/3.23/data-sources/cosmic-json/index.html @@ -6,13 +6,13 @@ cosmic-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type          Notes
id                string        COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array   phenotypic descriptions
sites             count array   tissue types
pubMedIds         int array     PubMed IDs
confirmedSomatic  bool          true when the variant is a confirmed somatic variant
drugResistance    bool          true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
- - +
Version: 3.23

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type          Notes
id                string        COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array   phenotypic descriptions
sites             count array   tissue types
pubMedIds         int array     PubMed IDs
confirmedSomatic  bool          true when the variant is a confirmed somatic variant
drugResistance    bool          true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + \ No newline at end of file diff --git a/3.23/data-sources/cosmic/index.html b/3.23/data-sources/cosmic/index.html index 6aa12f3d..6f8495ef 100644 --- a/3.23/data-sources/cosmic/index.html +++ b/3.23/data-sources/cosmic/index.html @@ -6,12 +6,12 @@ COSMIC | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human +

Version: 3.23

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human cancers.

Publication

John G Tate, Sally Bamford, Harry C Jubb, Zbyslaw Sondka, David M Beare, Nidhi Bindal, Harry Boutselakis, Charlotte G Cole, Celestino Creatore, Elisabeth Dawson, Peter Fish, Bhavana Harsha, Charlie Hathaway, Steve C Jupe, Chai Yin Kok, Kate Noble, Laura Ponting, Christopher C Ramshaw, Claire E Rye, Helen E Speedy, Ray Stefancsik, Sam L Thompson, Shicai Wang, Sari Ward, Peter J Campbell, Simon A Forbes. (2019) COSMIC: the Catalogue Of Somatic Mutations In @@ -22,7 +22,7 @@ pair when it is released in the database. Currently COSMIC includes information on fusions involved in solid tumours and leukaemias.

TSV extraction

Example

SAMPLE_ID SAMPLE_NAME PRIMARY_SITE  SITE_SUBTYPE_1  SITE_SUBTYPE_2  SITE_SUBTYPE_3  PRIMARY_HISTOLOGY HISTOLOGY_SUBTYPE_1 HISTOLOGY_SUBTYPE_2 HISTOLOGY_SUBTYPE_3 FUSION_ID TRANSLOCATION_NAME  5'_CHROMOSOME 5'_STRAND 5'_GENE_ID  5'_GENE_NAME  5'_LAST_OBSERVED_EXON 5'_GENOME_START_FROM  5'_GENOME_START_TO  5'_GENOME_STOP_FROM 5'_GENOME_STOP_TO 3'_CHROMOSOME 3'_STRAND 3'_GENE_ID  3'_GENE_NAME  3'_FIRST_OBSERVED_EXON  3'_GENOME_START_FROM  3'_GENOME_START_TO  3'_GENOME_STOP_FROM 3'_GENOME_STOP_TO FUSION_TYPE PUBMED_PMID
749711 HCC1187 breast NS NS NS carcinoma ductal_carcinoma NS NS 665 ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452 8 - 197199 RGS22 22 99981937 99981937 100106116 100106116 1 + 212470 SYCP1_ENST00000369518 24 114944339 114944339 114995367 114995367 Inferred Breakpoint 20033038

Parsing

From the TSV file, we're mainly interested in the following columns:

  • SAMPLE_ID
  • PRIMARY_SITE
  • PRIMARY_HISTOLOGY
  • HISTOLOGY_SUBTYPE_1
  • FUSION_ID
  • TRANSLOCATION_NAME
  • PUBMED_PMID
info

For all the histologies and sites, we replace underscores with spaces. For example, salivary_gland becomes salivary gland.

Parsing

To create the gene fusion entries in Illumina Connected Annotations, we process the rows of the TSV file as follows:

  • Group all entries by FUSION_ID
  • Using all the entries related to this FUSION_ID:
    • Collect all the PubMed IDs
    • Tally the number of observed sample IDs
    • Grab the HGVS r. notation (should not change throughout the FUSION_ID)
    • Tally the number of samples observed for each histology
    • Tally the number of samples observed for each site
  • Extract the transcript IDs from the HGVS notation and lookup the associated gene symbols
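
The steps above can be sketched in Python as follows. This is a simplified illustration, not the actual implementation: it uses only PRIMARY_SITE and PRIMARY_HISTOLOGY (subtype aggregation is omitted), the COSF prefix is inferred from the COSF881 example in the JSON output, and transcript_to_gene is a hypothetical lookup table supplied by the caller.

```python
import csv
import re
from collections import defaultdict

# Captures the transcript ID (without version) that precedes each "(GENE)" part.
TRANSCRIPT_RE = re.compile(r"(ENST\d+)(?:\.\d+)?\(")


def clean(label):
    """Replace underscores with spaces, e.g. salivary_gland -> salivary gland."""
    return label.replace("_", " ")


def tally(rows, column):
    """Count distinct sample IDs per value of the given column."""
    samples = defaultdict(set)
    for row in rows:
        samples[clean(row[column])].add(row["SAMPLE_ID"])
    return [{"name": name, "numSamples": len(ids)} for name, ids in samples.items()]


def aggregate_fusions(tsv_handle, transcript_to_gene):
    # Group all rows by FUSION_ID.
    groups = defaultdict(list)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        groups[row["FUSION_ID"]].append(row)

    fusions = []
    for fusion_id, rows in groups.items():
        # The HGVS r. notation should be identical for every row of a fusion.
        hgvsr = rows[0]["TRANSLOCATION_NAME"]
        transcripts = TRANSCRIPT_RE.findall(hgvsr)
        fusions.append({
            "id": f"COSF{fusion_id}",  # prefix assumed from the JSON example
            "numSamples": len({row["SAMPLE_ID"] for row in rows}),
            "geneSymbols": [transcript_to_gene[t] for t in transcripts
                            if t in transcript_to_gene],
            "hgvsr": hgvsr,
            "histologies": tally(rows, "PRIMARY_HISTOLOGY"),
            "sites": tally(rows, "PRIMARY_SITE"),
            "pubMedIds": sorted({int(row["PUBMED_PMID"]) for row in rows
                                 if row["PUBMED_PMID"]}),
        })
    return fusions
```

Sample IDs are collected in sets so that a sample reported on several rows is counted once per fusion, histology, or site.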

Aggregating Histologies & Sites

Aggregating Histologies & Sites was previously described in the small variants section.

Known Issues


There are some issues with the HGVS RNA notation:

  • For coding transcripts, HGVS numbering should use CDS coordinates. Right now COSMIC is using cDNA coordinates for all their fusions.

Download URL

GRCh37

GRCh38

JSON Output

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int

Cancer Gene Census

TSV Extraction

Example

GENE_NAME       CELL_TYPE       PUBMED_PMID     HALLMARK        IMPACT  DESCRIPTION     CELL_LINE
PRDM16 18496560 role in cancer oncogene oncogene
PRDM16 16015645 role in cancer fusion fusion

Parsing

To extract information about tumor suppressor genes (TSGs) and oncogenes, the rows are filtered on the "role in cancer" attribute. Rows with the value "TSG" identify tumor suppressor genes, and rows with the value "oncogene" identify oncogenes. Some genes are listed with both roles ("TSG/oncogene"), which indicates that they can act as either.

Columns

Only the following columns are needed to gather the required roles in cancer:

  • GENE_NAME
  • IMPACT
  • HALLMARK
Possible Roles in Cancer

The file contained the following number of instances for each role type:

Role in cancer  Total Instances
fusion          149
TSG             195
oncogene        181
Total           525

CSV Extraction

COSMIC tiers are extracted from the cancer_gene_census.csv file:

Gene Symbol,Name,Entrez GeneId,Genome Location,Tier,Hallmark,Chr Band,Somatic,Germline,Tumour Types(Somatic),Tumour Types(Germline),Cancer Syndrome,Tissue Type,Molecular Genetics,Role in Cancer,Mutation Types,Translocation Partner,Other Germline Mut,Other Syndrome,COSMIC ID,cosmic gene name,Synonyms
"AR","Androgen Receptor ","367","X:67544036-67730619","1","Yes","Xq12","yes","yes","prostate","","","E","Dom","oncogene","Mis","","yes ","Androgen insensitivity, Hypospadias 1, X-linked, Spinal and bulbar muscular atrophy of Kennedy ","COSG292497","AR","367,AIS,AR,DHTR,ENSG00000169083.16,HUMARA,NR3C4,P10275,SBMA,SMAX1"
"FH","fumarate hydratase","2271","1:241497603-241519761","1","","1q43","","yes","","leiomyomatosis, renal","hereditary leiomyomatosis and renal cell cancer","E, M","Rec","TSG","Mis, N, F","","","","COSG255037","FH","2271,ENSG00000091483.6,FH,P07954"
"ALK","anaplastic lymphoma kinase (Ki-1)","238","2:29192774-29921566","1","Yes","2p23.2","yes","yes","ALCL, NSCLC, neuroblastoma, inflammatory myofibroblastic tumour, Spitzoid tumour","neuroblastoma","familial neuroblastoma","L, E, M","Dom","oncogene, fusion","T, Mis, A","NPM1, TPM3, TFG, TPM4, ATIC, CLTC, MSN, RNF213, CARS, EML4, KIF5B, C2orf22, DCTN1, HIP1, TPR, RANBP2, PPFIBP1, SEC31A, STRN, VCL, C2orf44, KLC1","","","COSG383409","ALK","238,ALK,CD246,ENSG00000171094.17,Q9UM73"
"APC","adenomatous polyposis of the colon gene","324","5:112737888-112846239","1","Yes","5q22.2","yes","yes","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","adenomatous polyposis coli; Turcot syndrome","E, M, O","Rec","TSG","D, Mis, N, F, S","","","","COSG208824","APC","324,APC,DP2,DP2.5,DP3,ENSG00000134982.16,P25054,PPP1R46"
Columns

Only the following columns are needed to gather the tiers:

  • Gene Symbol
  • Tier

The tiers are first read from the CSV file; then, while parsing the TSV file, each gene's tier is added based on its gene symbol.
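
A minimal sketch of this two-pass merge is shown below. It assumes, based on the example rows, that the role is carried in the IMPACT column of rows whose HALLMARK is "role in cancer"; the function names load_tiers and roles_with_tiers are hypothetical.

```python
import csv


def load_tiers(csv_handle):
    """First pass: map each gene symbol to its tier from cancer_gene_census.csv."""
    return {row["Gene Symbol"]: int(row["Tier"])
            for row in csv.DictReader(csv_handle)}


def roles_with_tiers(tsv_handle, tiers):
    """Second pass: collect roles per gene from the TSV and attach the tier."""
    genes = {}
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        # Only "role in cancer" rows describe TSG / oncogene / fusion roles.
        if row["HALLMARK"] != "role in cancer":
            continue
        entry = genes.setdefault(row["GENE_NAME"], {"roleInCancer": []})
        if row["IMPACT"] not in entry["roleInCancer"]:
            entry["roleInCancer"].append(row["IMPACT"])
    for gene, entry in genes.items():
        # Genes missing from the CSV are left without a tier in this sketch.
        if gene in tiers:
            entry["tier"] = tiers[gene]
    return genes
```

Reading the CSV first keeps the TSV pass to a single dictionary lookup per gene.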

Known Issues

None

Download URL

JSON output

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]

Building the supplementary files

You can generate COSMIC supplementary annotation files if you have COSMIC account credentials. Please refer to the SAUtils section for more details.

- - + + \ No newline at end of file diff --git a/3.23/data-sources/dann-json/index.html b/3.23/data-sources/dann-json/index.html index 683055d1..50297cf8 100644 --- a/3.23/data-sources/dann-json/index.html +++ b/3.23/data-sources/dann-json/index.html @@ -6,13 +6,13 @@ dann-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
- - +
Version: 3.23

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.23/data-sources/dann/index.html b/3.23/data-sources/dann/index.html index 48960474..8c3cbc93 100644 --- a/3.23/data-sources/dann/index.html +++ b/3.23/data-sources/dann/index.html @@ -6,16 +6,16 @@ DANN | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). +

Version: 3.23

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). CADD is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. DANN improves on CADD (which uses Support Vector Machines (SVMs)) by capturing non-linear relationships by using a deep neural network instead of SVMs. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD’s SVM methodology.

Publication

Quang, Daniel, Yifei Chen, and Xiaohui Xie. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31.5 761-763 (2015). https://doi.org/10.1093/bioinformatics/btu703

TSV File

Example

chr     grch37_pos  ref     alt     DANN
1 10001 T A 0.16461391399220135
1 10001 T C 0.4396994049749739
1 10001 T G 0.38108629377072734
1 10002 A C 0.36182020272810128
1 10002 A G 0.44413258111779291
1 10002 A T 0.16812846819989813

Parsing

From the TSV file, we are interested in all columns:

  • chr
  • grch37_pos
  • ref
  • alt
  • DANN

GRCh38 liftover

The data is not available for GRCh38 on the DANN website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Known Issues

None

Download URL

https://cbcl.ics.uci.edu/public_data/DANN/

JSON Output

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
- - + + \ No newline at end of file diff --git a/3.23/data-sources/dbsnp-json/index.html b/3.23/data-sources/dbsnp-json/index.html index 3c5b59c4..b2771b66 100644 --- a/3.23/data-sources/dbsnp-json/index.html +++ b/3.23/data-sources/dbsnp-json/index.html @@ -6,13 +6,13 @@ dbsnp-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
- - +
Version: 3.23

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
+ + \ No newline at end of file diff --git a/3.23/data-sources/dbsnp/index.html b/3.23/data-sources/dbsnp/index.html index 275c0e87..d11ec35b 100644 --- a/3.23/data-sources/dbsnp/index.html +++ b/3.23/data-sources/dbsnp/index.html @@ -6,13 +6,13 @@ dbSNP | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the frequency of the alleles provided in the CAF field. The global minor allele frequency is the second highest value of the CAF comma delimited field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele and either C or T is chosen (arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T will be arbitrarily assigned as the global major and global minor alleles.

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.

Known Issues


If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs

Building the supplementary files

You can generate dbSNP supplementary annotation files yourself. Please refer to the SAUtils section for more details.

- - +
Version: 3.23

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the frequency of the alleles provided in the CAF field. The global minor allele frequency is the second highest value of the CAF comma delimited field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele and either C or T is chosen (arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T will be arbitrarily assigned as the global major and global minor alleles.

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.
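
The selection and tie-breaking rules above can be sketched in Python as follows. This is an illustrative reading of the rules, not the actual implementation, and the function name global_alleles is hypothetical. Where the text says a choice is arbitrary, this sketch happens to pick the first candidate in VCF order.

```python
def global_alleles(ref, alts, caf):
    """Return (major_allele, minor_allele, minor_frequency).

    Returns None when fewer than two usable frequencies are present.
    """
    alleles = [ref] + list(alts)
    # CAF lists the reference frequency first, then one value per ALT allele;
    # '.' values are ignored.
    freqs = [(allele, float(value))
             for allele, value in zip(alleles, caf.split(","))
             if value != "."]
    if len(freqs) < 2:
        return None
    # Major allele: highest frequency; on a tie, prefer the reference allele.
    major = max(freqs, key=lambda af: (af[1], af[0] == ref))
    remaining = [af for af in freqs if af[0] != major[0]]
    # Minor allele: next highest frequency; on a tie, prefer a non-reference allele.
    minor = max(remaining, key=lambda af: (af[1], af[0] != ref))
    return major[0], minor[0], minor[1]
```

For instance, global_alleles("A", ["C", "T"], "0.2,0.2,0.6") picks T as the major allele and C as the minor allele, matching the last example above.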

Known Issues


If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs

Building the supplementary files

You can generate the dbSNP supplementary annotation files yourself. Please refer to the SAUtils section for more details.

+ + \ No newline at end of file diff --git a/3.23/data-sources/decipher-json/index.html b/3.23/data-sources/decipher-json/index.html index 6bbd7ab5..ffd1cc0c 100644 --- a/3.23/data-sources/decipher-json/index.html +++ b/3.23/data-sources/decipher-json/index.html @@ -6,13 +6,13 @@ decipher-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
- - +
Version: 3.23

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
+ + \ No newline at end of file diff --git a/3.23/data-sources/decipher/index.html b/3.23/data-sources/decipher/index.html index 478ad94c..a151e958 100644 --- a/3.23/data-sources/decipher/index.html +++ b/3.23/data-sources/decipher/index.html @@ -6,14 +6,14 @@ DECIPHER | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER tsv file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz +

Version: 3.23

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER tsv file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size
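
As a rough sketch of the column extraction listed above, the following Python reads the tab-separated file with the standard `csv` module and keeps only those eight columns. The output key names are borrowed from the JSON section of this page; the function name `parse_decipher` and the exact column-to-field mapping are assumptions made for illustration, not the actual implementation.

```python
import csv
import io

HEADER = [
    "#population_cnv_id", "chr", "start", "end",
    "deletion_observations", "deletion_frequency", "deletion_standard_error",
    "duplication_observations", "duplication_frequency", "duplication_standard_error",
    "observations", "frequency", "standard_error", "type", "sample_size", "study",
]
# Second data row from the TSV excerpt above.
ROW = ["2", "1", "13516", "91073", "0", "0", "1", "27", "0.675", "0.109713431",
       "27", "0.675", "0.109713431", "1", "40", "42M calls"]
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"


def parse_decipher(handle):
    """Yield one dict per TSV row, keeping only the extracted columns.
    The mapping to JSON-style keys is an assumption for illustration."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "chromosome": row["chr"],
            "begin": int(row["start"]),
            "end": int(row["end"]),
            "numDeletions": int(row["deletion_observations"]),
            "deletionFrequency": float(row["deletion_frequency"]),
            "numDuplications": int(row["duplication_observations"]),
            "duplicationFrequency": float(row["duplication_frequency"]),
            "sampleSize": int(row["sample_size"]),
        }


records = list(parse_decipher(io.StringIO(SAMPLE)))
```

For the real download, the same function could be given a handle from `gzip.open(path, "rt")` instead of the in-memory sample.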

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz
https://www.deciphergenomics.org/files/downloads/population_cnv_grch37.txt.gz

JSON output

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
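
This page does not define how reciprocalOverlap and annotationOverlap are computed. The sketch below uses the conventional definitions, which are an assumption here: reciprocal overlap is the overlapping length divided by the longer of the two intervals, and annotation overlap is the overlapping length divided by the annotation's length. Coordinates are treated as 1-based and inclusive, as in the table above; the function name `overlap_fractions` is hypothetical.

```python
def overlap_fractions(var_begin, var_end, ann_begin, ann_end):
    """Return overlap fractions for a variant and an annotation interval
    (1-based, inclusive), or None when they do not overlap.
    Conventional definitions, assumed for illustration only."""
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return None
    var_len = var_end - var_begin + 1
    ann_len = ann_end - ann_begin + 1
    return {
        # Fraction of the longer interval that is shared.
        "reciprocalOverlap": overlap / max(var_len, ann_len),
        # Fraction of the annotation that the variant covers.
        "annotationOverlap": overlap / ann_len,
    }


partial = overlap_fractions(1, 100, 51, 150)      # 50 shared bases, both 100 long
contained = overlap_fractions(1, 200, 101, 150)   # annotation fully inside variant
disjoint = overlap_fractions(1, 10, 20, 30)
```

Under these definitions a long variant that fully contains a short annotation has an annotation overlap of 1 but a small reciprocal overlap.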
- - + + \ No newline at end of file diff --git a/3.23/data-sources/fusioncatcher-json/index.html b/3.23/data-sources/fusioncatcher-json/index.html index c0446612..27f3c46e 100644 --- a/3.23/data-sources/fusioncatcher-json/index.html +++ b/3.23/data-sources/fusioncatcher-json/index.html @@ -6,13 +6,13 @@ fusioncatcher-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field | Type | Notes
genes | genes object | 5' gene & 3' gene
germlineSources | string array | matches in known germline data sources
somaticSources | string array | matches in known somatic data sources

genes

Field | Type | Notes
first | gene object | 5' gene
second | gene object | 3' gene
isParalogPair | bool | true when both genes are paralogs for each other
isPseudogenePair | bool | true when both genes are pseudogenes for each other
isReadthrough | bool | true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field | Type | Notes
hgnc | string | gene symbol. e.g. MSH6
isOncogene | bool | true when this gene is an oncogene
- - +
Version: 3.23

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field | Type | Notes
genes | genes object | 5' gene & 3' gene
germlineSources | string array | matches in known germline data sources
somaticSources | string array | matches in known somatic data sources

genes

Field | Type | Notes
first | gene object | 5' gene
second | gene object | 3' gene
isParalogPair | bool | true when both genes are paralogs for each other
isPseudogenePair | bool | true when both genes are pseudogenes for each other
isReadthrough | bool | true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field | Type | Notes
hgnc | string | gene symbol. e.g. MSH6
isOncogene | bool | true when this gene is an oncogene
+ + \ No newline at end of file diff --git a/3.23/data-sources/fusioncatcher/index.html b/3.23/data-sources/fusioncatcher/index.html index 473d76ff..138257cb 100644 --- a/3.23/data-sources/fusioncatcher/index.html +++ b/3.23/data-sources/fusioncatcher/index.html @@ -6,13 +6,13 @@ FusionCatcher | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of its genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

Description | Reference | Data | FusionCatcher filename
Bushman | | bushmanlab.org | cancer_genes.txt
ONGENE | JGG | bioinfo-minzhao.org | oncogenes_more.txt
UniProt tumor genes | NAR | uniprot.org | tumor_genes.txt

Germline

Illumina Connected Annotations label | Reference | Data | FusionCatcher filename
1000 Genomes Project | PLOS ONE | | 1000genomes.txt
Healthy (strong support) | | | banned.txt
Illumina Body Map 2.0 | | EBI | bodymap2.txt
CACG | Genomics | | cacg.txt
ConjoinG | PLOS ONE | | conjoing.txt
Healthy prefrontal cortex | BMC Medical Genomics | NCBI GEO | cortex.txt
Duplicated Genes Database | PLOS ONE | genouest.org | dgd.txt
GTEx healthy tissues | | gtexportal.org | gtex.txt
Healthy | | | healthy.txt
Human Protein Atlas | MCP | EBI | hpa.txt
Babiceanu non-cancer tissues | NAR | NAR | non-cancer_tissues.txt
non-tumor cell lines | | | non-tumor_cells.txt
TumorFusions normal | NAR | NAR | tcga-normal.txt

Somatic

Illumina Connected Annotations label | Reference | Data | FusionCatcher filename
Alaei-Mahabadi 18 cancers | PNAS | | 18cancers.txt
DepMap CCLE | | depmap.org | ccle.txt
CCLE Klijn | Nature Biotechnology | Nature Biotechnology | ccle2.txt
CCLE Vellichirammal | Molecular Therapy Nucleic Acids | | ccle3.txt
Cancer Genome Project | | COSMIC | cgp.txt
ChimerKB 4.0 | NAR | kobic.re.kr | chimerdb4kb.txt
ChimerPub 4.0 | NAR | kobic.re.kr | chimerdb4pub.txt
ChimerSeq 4.0 | NAR | kobic.re.kr | chimerdb4seq.txt
COSMIC | NAR | COSMIC | cosmic.txt
Bao gliomas | Genome Research | | gliomas.txt
Known | | | known.txt
Mitelman DB | ISB-CGC | Google Cloud | mitelman.txt
TCGA oesophageal carcinomas | Nature | | oesophagus.txt
Bailey pancreatic cancers | Nature | Nature | pancreases.txt
PCAWG | Cell | ICGC | pcawg.txt
Robinson prostate cancers | Cell | Cell | prostate_cancer.txt
TCGA | | cancer.gov | tcga.txt
TumorFusions tumor | NAR | NAR | tcga-cancer.txt
TCGA Gao | Cell | Cell | tcga2.txt
TCGA Vellichirammal | Molecular Therapy Nucleic Acids | | tcga3.txt
TICdb | BMC Genomics | unav.edu | ticdb.txt

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This is commonly used in the data files representing oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl gene IDs (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations will ignore these entries since we only include the gene IDs that are currently recognized by Illumina Connected Annotations.

I suspect that these were originally RefSeq genes and if so, we can support those directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field | Type | Notes
genes | genes object | 5' gene & 3' gene
germlineSources | string array | matches in known germline data sources
somaticSources | string array | matches in known somatic data sources

genes

Field | Type | Notes
first | gene object | 5' gene
second | gene object | 3' gene
isParalogPair | bool | true when both genes are paralogs for each other
isPseudogenePair | bool | true when both genes are pseudogenes for each other
isReadthrough | bool | true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field | Type | Notes
hgnc | string | gene symbol. e.g. MSH6
isOncogene | bool | true when this gene is an oncogene
- - +
Version: 3.23

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of its genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

Description | Reference | Data | FusionCatcher filename
Bushman | | bushmanlab.org | cancer_genes.txt
ONGENE | JGG | bioinfo-minzhao.org | oncogenes_more.txt
UniProt tumor genes | NAR | uniprot.org | tumor_genes.txt

Germline

Illumina Connected Annotations label | Reference | Data | FusionCatcher filename
1000 Genomes Project | PLOS ONE | | 1000genomes.txt
Healthy (strong support) | | | banned.txt
Illumina Body Map 2.0 | | EBI | bodymap2.txt
CACG | Genomics | | cacg.txt
ConjoinG | PLOS ONE | | conjoing.txt
Healthy prefrontal cortex | BMC Medical Genomics | NCBI GEO | cortex.txt
Duplicated Genes Database | PLOS ONE | genouest.org | dgd.txt
GTEx healthy tissues | | gtexportal.org | gtex.txt
Healthy | | | healthy.txt
Human Protein Atlas | MCP | EBI | hpa.txt
Babiceanu non-cancer tissues | NAR | NAR | non-cancer_tissues.txt
non-tumor cell lines | | | non-tumor_cells.txt
TumorFusions normal | NAR | NAR | tcga-normal.txt

Somatic

Illumina Connected Annotations label | Reference | Data | FusionCatcher filename
Alaei-Mahabadi 18 cancers | PNAS | | 18cancers.txt
DepMap CCLE | | depmap.org | ccle.txt
CCLE Klijn | Nature Biotechnology | Nature Biotechnology | ccle2.txt
CCLE Vellichirammal | Molecular Therapy Nucleic Acids | | ccle3.txt
Cancer Genome Project | | COSMIC | cgp.txt
ChimerKB 4.0 | NAR | kobic.re.kr | chimerdb4kb.txt
ChimerPub 4.0 | NAR | kobic.re.kr | chimerdb4pub.txt
ChimerSeq 4.0 | NAR | kobic.re.kr | chimerdb4seq.txt
COSMIC | NAR | COSMIC | cosmic.txt
Bao gliomas | Genome Research | | gliomas.txt
Known | | | known.txt
Mitelman DB | ISB-CGC | Google Cloud | mitelman.txt
TCGA oesophageal carcinomas | Nature | | oesophagus.txt
Bailey pancreatic cancers | Nature | Nature | pancreases.txt
PCAWG | Cell | ICGC | pcawg.txt
Robinson prostate cancers | Cell | Cell | prostate_cancer.txt
TCGA | | cancer.gov | tcga.txt
TumorFusions tumor | NAR | NAR | tcga-cancer.txt
TCGA Gao | Cell | Cell | tcga2.txt
TCGA Vellichirammal | Molecular Therapy Nucleic Acids | | tcga3.txt
TICdb | BMC Genomics | unav.edu | ticdb.txt

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.
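
A minimal sketch of that import rule, assuming the recognized Ensembl gene IDs are available as a Python set. The name `known_gene_ids` is a hypothetical stand-in for the union of IDs found in the GRCh37 and GRCh38 cache files, and `load_gene_pairs` is not an actual function of the tool.

```python
def load_gene_pairs(lines, known_gene_ids):
    """Keep only the gene pairs where both Ensembl gene IDs are recognized.
    `known_gene_ids` is a hypothetical set built from the cache files."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        first, second = fields
        if first in known_gene_ids and second in known_gene_ids:
            pairs.append((first, second))
    return pairs


known_gene_ids = {"ENSG00000006210", "ENSG00000102962", "ENSG00000006652"}
sample_lines = [
    "ENSG00000006210\tENSG00000102962",  # both recognized: kept
    "ENSG00000006652\tENSG00000181016",  # second ID unknown here: dropped
    "ENSG09000000002\tENSG00000102962",  # custom FusionCatcher ID: dropped
]
imported = load_gene_pairs(sample_lines, known_gene_ids)
```

The same membership check is what causes FusionCatcher's custom gene IDs to be skipped.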

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This is commonly used in the data files representing oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl gene IDs (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations will ignore these entries since we only include the gene IDs that are currently recognized by Illumina Connected Annotations.

I suspect that these were originally RefSeq genes and if so, we can support those directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field | Type | Notes
genes | genes object | 5' gene & 3' gene
germlineSources | string array | matches in known germline data sources
somaticSources | string array | matches in known somatic data sources

genes

Field | Type | Notes
first | gene object | 5' gene
second | gene object | 3' gene
isParalogPair | bool | true when both genes are paralogs for each other
isPseudogenePair | bool | true when both genes are pseudogenes for each other
isReadthrough | bool | true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field | Type | Notes
hgnc | string | gene symbol. e.g. MSH6
isOncogene | bool | true when this gene is an oncogene
+ + \ No newline at end of file diff --git a/3.23/data-sources/gerp-json/index.html b/3.23/data-sources/gerp-json/index.html index 7ec85008..6198d0f8 100644 --- a/3.23/data-sources/gerp-json/index.html +++ b/3.23/data-sources/gerp-json/index.html @@ -6,13 +6,13 @@ gerp-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gerp-json

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞
- - +
Version: 3.23

gerp-json

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞
+ + \ No newline at end of file diff --git a/3.23/data-sources/gerp/index.html b/3.23/data-sources/gerp/index.html index e3d94f86..50790c16 100644 --- a/3.23/data-sources/gerp/index.html +++ b/3.23/data-sources/gerp/index.html @@ -6,15 +6,15 @@ GERP | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. +

Version: 3.23

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint (Rejected Substitutions). Illumina Connected Annotations uses GERP++ which is based on a significantly faster and more statistically robust maximum likelihood estimation procedure to compute expected rates of evolution.

Publication

Davydov, Eugene V., et al. "Identifying a high fraction of the human genome to be under selective constraint using GERP++." PLoS computational biology 6.12 e1001025 (2010). https://doi.org/10.1371/journal.pcbi.1001025

Source Files

Example GRCh37

The GRCh37 file is in TSV format.

chr     position    GERP
1 12177 0.83
1 12178 -0.206
1 12179 -0.492
1 12180 -1.66
1 12181 0.83
1 12182 0.83
1 12183 -0.417
1 12184 0.83

Example GRCh38

The GRCh38 file is in BED format and was produced by a liftover.

chr     pos_start   pos_end     GERP
1 12646 12647 0.298
1 12647 12648 2.63
1 12648 12649 1.87
1 12649 12650 0.252
1 12650 12651 -2.06
1 12651 12652 2.61
1 12652 12653 3.97

Parsing

From the source file, we are interested in the following columns:

  • chr
  • position
  • GERP
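
The two source layouts shown above can be read with a pair of small helpers. This is a sketch only: the function names are hypothetical, and the GRCh38 helper assumes standard BED semantics (0-based start, exclusive end), under which the 1-based position of a single-base record equals pos_end. That coordinate convention is an assumption and is not stated on this page.

```python
def parse_grch37_line(line):
    """Parse a 'chr position GERP' record from the GRCh37 TSV file."""
    chrom, position, score = line.split()
    return chrom, int(position), float(score)


def parse_grch38_line(line):
    """Parse a 'chr pos_start pos_end GERP' record from the GRCh38 BED file.
    Assumes a 0-based, half-open interval, so the 1-based position is pos_end."""
    chrom, _start, end, score = line.split()
    return chrom, int(end), float(score)


grch37_record = parse_grch37_line("1 12177 0.83")
grch38_record = parse_grch38_line("1 12646 12647 0.298")
```

Both helpers return the same `(chromosome, position, score)` shape, so downstream code does not need to know which assembly the record came from.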

Known Issues

None

Download URL

GRCh37

http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

GRCh38

The data is not available for GRCh38 on the GERP++ website and was obtained from https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/

JSON Output

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞
- - + + \ No newline at end of file diff --git a/3.23/data-sources/gme-json/index.html b/3.23/data-sources/gme-json/index.html index 9fcfa616..10d2b989 100644 --- a/3.23/data-sources/gme-json/index.html +++ b/3.23/data-sources/gme-json/index.html @@ -6,13 +6,13 @@ gme-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gme-json

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.23

gme-json

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.23/data-sources/gme/index.html b/3.23/data-sources/gme/index.html index 1ea964df..ee4c114f 100644 --- a/3.23/data-sources/gme/index.html +++ b/3.23/data-sources/gme/index.html @@ -6,13 +6,13 @@ GME Variome | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME tsv file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF

GRCh37 liftover

The data is not available for GRCh38 on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.23

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME tsv file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF
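
The sketch below shows one way the extracted columns could map to the JSON fields on this page. It rests on two assumptions: that GME_AC holds "alternate count,reference count" (inferred from the sample rows, where GME_AF equals the first number divided by the sum of both), and that any filter value other than PASS counts as a failed filter. The function name `gme_annotation` is hypothetical.

```python
def gme_annotation(row):
    """Build a gmeVariome-style dict from the extracted TSV columns.
    Assumes GME_AC is 'alt_count,ref_count' and that only PASS passes."""
    alt_count, ref_count = (int(x) for x in row["GME_AC"].split(","))
    return {
        "allAc": alt_count,
        "allAn": alt_count + ref_count,
        "allAf": float(row["GME_AF"]),
        "failedFilter": row["filter"] != "PASS",
    }


# First data row from the TSV excerpt above, trimmed to the extracted columns.
columns = ["chrom", "pos", "ref", "alt", "filter", "GME_AC", "GME_AF"]
values = ["1", "69134", "A", "G", "VQSRTrancheSNP99.90to100.00",
          "10,192", "0.04950495049504951"]
annotation = gme_annotation(dict(zip(columns, values)))
```

With these assumptions the first sample row yields an allele count of 10 and an allele number of 202, in line with the JSON example on this page.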

GRCh37 liftover

The data is not available for GRCh38 on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.23/data-sources/gnomad-lof-json/index.html b/3.23/data-sources/gnomad-lof-json/index.html index dad954ca..0afcdb55 100644 --- a/3.23/data-sources/gnomad-lof-json/index.html +++ b/3.23/data-sources/gnomad-lof-json/index.html @@ -6,13 +6,13 @@ gnomad-lof-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gnomad-lof-json

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)
- - +
Version: 3.23

gnomad-lof-json

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)
+ + \ No newline at end of file diff --git a/3.23/data-sources/gnomad-small-variants-json/index.html b/3.23/data-sources/gnomad-small-variants-json/index.html index b37afdc4..328a0f94 100644 --- a/3.23/data-sources/gnomad-small-variants-json/index.html +++ b/3.23/data-sources/gnomad-small-variants-json/index.html @@ -6,13 +6,13 @@ gnomad-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gnomad-small-variants-json

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.
- - +
Version: 3.23

gnomad-small-variants-json

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.
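
The field table implies a simple relationship between the three per-population values: the allele frequency should equal the allele count divided by the allele number. The sketch below checks that relationship on a parsed `gnomad` object. The function name `inconsistent_populations` and the tolerance of `1e-6` (chosen because the example frequencies carry six decimal places) are assumptions for illustration and are not part of Illumina Connected Annotations.

```python
def inconsistent_populations(gnomad, tol=1e-6):
    """Return population prefixes whose Af differs from Ac / An by more than tol.

    `gnomad` is the dict parsed from the "gnomad" JSON object. Populations
    missing a count or having a zero/absent allele number are skipped.
    """
    flagged = []
    for key, af in gnomad.items():
        if not key.endswith("Af"):
            continue
        prefix = key[:-2]  # e.g. "afrAf" -> "afr"
        ac = gnomad.get(prefix + "Ac")
        an = gnomad.get(prefix + "An")
        if ac is None or not an:
            continue
        if abs(af - ac / an) > tol:
            flagged.append(prefix)
    return flagged


if __name__ == "__main__":
    record = {
        "allAf": 0.190317, "allAc": 5861, "allAn": 30796,
        "afrAf": 0.222876, "afrAc": 1931, "afrAn": 8664,
    }
    print(inconsistent_populations(record))
```

A check like this is only a sanity test on parsed output; it says nothing about how the annotator itself computes or rounds the frequencies.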
+ + \ No newline at end of file diff --git a/3.23/data-sources/gnomad-structural-variants-data_description/index.html b/3.23/data-sources/gnomad-structural-variants-data_description/index.html index f4625cac..de1ea5ce 100644 --- a/3.23/data-sources/gnomad-structural-variants-data_description/index.html +++ b/3.23/data-sources/gnomad-structural-variants-data_description/index.html @@ -6,14 +6,14 @@ gnomad-structural-variants-data_description | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    END EVIDENCE    HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HET    AFR_FREQ_HOMALT AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file was obtained from the lifted-over dataset created by dbVar for GRCh38.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chrcontig   outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. +

Version: 3.23

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    END EVIDENCE    HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HET    AFR_FREQ_HOMALT AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file was obtained from the lifted-over dataset created by dbVar for GRCh38.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chrcontig   outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0
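
In the TSV row, the `allele_count`, `allele_frequency`, and `allele_number` columns each pack the overall and per-population values into one comma-separated list of `KEY=VALUE` pairs. A minimal sketch of unpacking one such column follows; the function name `parse_packed_column` is hypothetical, and the code assumes the column has already been split out of the row.

```python
def parse_packed_column(value, cast=float):
    """Split a packed column such as "AC=21,AFR_AC=10" into a dict.

    `cast` converts each value; pass int for the allele_count and
    allele_number columns and float for allele_frequency.
    """
    result = {}
    for pair in value.split(","):
        key, _, raw = pair.partition("=")
        result[key] = cast(raw)
    return result


if __name__ == "__main__":
    counts = parse_packed_column(
        "AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0", cast=int
    )
    print(counts["AFR_AC"])
```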

Structural Variant Type Mapping

The source files represent structural variant types with keys that follow different naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows.

Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key | GRCh38 Source SV Type Key
copy_number_variation | | copy number variation
deletion | DEL, CN=0 | deletion
duplication | DUP | duplication
insertion | INS | insertion
inversion | INV | inversion
mobile_element_insertion | INS:ME | mobile element insertion
mobile_element_insertion | INS:ME:ALU | alu insertion
mobile_element_insertion | INS:ME:LINE1 | line1 insertion
mobile_element_insertion | INS:ME:SVA | sva insertion
structural alteration | | sequence alteration
complex_structural_alteration | CPX |
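
The mapping above can be expressed as two lookup tables, one per assembly. This is only an illustrative sketch: the names `SV_TYPE_TABLES` and `json_sv_type` are hypothetical, `DEL` and `CN=0` are assumed to be two separate GRCh37 keys, and rows that list a single source key are assigned to an assembly by naming style (upper-case codes to GRCh37, lower-case phrases to GRCh38).

```python
MEI = "mobile_element_insertion"

SV_TYPE_TABLES = {
    "GRCh37": {
        "DEL": "deletion",
        "CN=0": "deletion",  # assumption: listed as "DEL, CN=0" in the table
        "DUP": "duplication",
        "INS": "insertion",
        "INV": "inversion",
        "INS:ME": MEI,
        "INS:ME:ALU": MEI,
        "INS:ME:LINE1": MEI,
        "INS:ME:SVA": MEI,
        "CPX": "complex_structural_alteration",
    },
    "GRCh38": {
        "copy number variation": "copy_number_variation",
        "deletion": "deletion",
        "duplication": "duplication",
        "insertion": "insertion",
        "inversion": "inversion",
        "mobile element insertion": MEI,
        "alu insertion": MEI,
        "line1 insertion": MEI,
        "sva insertion": MEI,
        # Spelled with a space, exactly as the table shows it.
        "sequence alteration": "structural alteration",
    },
}


def json_sv_type(source_key, assembly):
    """Return the JSON SV type key for a source key, or None if unmapped."""
    return SV_TYPE_TABLES[assembly].get(source_key)


if __name__ == "__main__":
    print(json_sv_type("INS:ME:ALU", "GRCh37"))
```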
- - + + \ No newline at end of file diff --git a/3.23/data-sources/gnomad-structural-variants-json/index.html b/3.23/data-sources/gnomad-structural-variants-json/index.html index 41a43bb0..11129fbb 100644 --- a/3.23/data-sources/gnomad-structural-variants-json/index.html +++ b/3.23/data-sources/gnomad-structural-variants-json/index.html @@ -6,13 +6,13 @@ gnomad-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Admixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Admixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available for GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
- - +
Version: 3.23

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Admixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Admixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available for GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
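
Because these fields are absent for GRCh38, code that reads `gnomAD-preview` entries should treat them as optional rather than index them directly. The following is a minimal sketch under that assumption; `GRCH38_ABSENT_FIELDS` and `read_optional_fields` are hypothetical names, and `eurHc` is spelled as it appears in the JSON example.

```python
# Fields documented as unavailable for GRCh38.
GRCH38_ABSENT_FIELDS = (
    "femaleAf", "maleAf", "maleAc", "femaleAc", "femaleAn", "maleAn",
    "allHc", "afrHc", "amrHc", "easHc", "eurHc", "othHc",
    "maleHc", "femaleHc", "failedFilter",
)


def read_optional_fields(entry):
    """Return the optional fields of one entry, using None where absent.

    `entry` is a dict parsed from one element of the "gnomAD-preview" array.
    """
    return {name: entry.get(name) for name in GRCH38_ABSENT_FIELDS}


if __name__ == "__main__":
    grch38_like = {"chromosome": "1", "allAf": 0.068963, "allAc": 943}
    optional = read_optional_fields(grch38_like)
    print(optional["allHc"])
```

Returning `None` keeps "not provided by the source" distinct from a real value such as `0` or `false`.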
+ + \ No newline at end of file diff --git a/3.23/data-sources/gnomad/index.html b/3.23/data-sources/gnomad/index.html index c788810f..55455a77 100644 --- a/3.23/data-sources/gnomad/index.html +++ b/3.23/data-sources/gnomad/index.html @@ -6,17 +6,17 @@ gnomAD | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following INFO fields from the gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following additional fields from the gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number, and allele frequency for:
      • Global population
      • African/African Americans
      • Admixed Americans
      • Ashkenazi Jews
      • East Asians
      • Finnish
      • Non-Finnish Europeans
      • South Asian
      • Others (population not assigned)
      • Male
      • Female
      • Controls
Note
  • Coverage = DP / AN. Allele frequencies are computed as AC / AN for each population.
  • gnomAD currently has no genome sequencing data for the South Asian (SAS) population.
  • Allele count, homozygous count, allele number, and allele frequency for the controls subset are also provided for the global population.
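
The two formulas in the note can be sketched in Python as follows. This is a minimal illustration, not the Illumina Connected Annotations implementation: the function names, the handling of AN = 0, and the rounding of coverage to an integer are assumptions, and the DP value in the usage line is hypothetical.

```python
from typing import Optional


def allele_frequency(ac: int, an: int) -> Optional[float]:
    """Allele frequency as AC / AN for one population.

    Returning None when AN is zero is an assumption made for this sketch.
    """
    if an == 0:
        return None
    return ac / an


def coverage(dp: int, an: int) -> Optional[int]:
    """Coverage as DP / AN, rounded to an integer.

    The rounding step is an assumption; the note only states DP / AN.
    """
    if an == 0:
        return None
    return round(dp / an)


# AC and AN for the global population; the DP value is hypothetical.
print(allele_frequency(5861, 30796))
print(coverage(615920, 30796))
```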

Merging genomes and exomes

When genomes and exomes are merged, the allele counts and allele numbers are summed across both data sets.

info
  • For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
  • For GRCh38, Illumina Connected Annotations currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.
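
As a rough sketch of the merge described above, the snippet below sums allele counts and allele numbers from the two data sets and recomputes the frequency from the sums. The function name and the tuple layout are hypothetical, and recomputing AF as merged AC / merged AN is an assumption based on the AC / AN rule stated for frequencies.

```python
from typing import Optional, Tuple


def merge_counts(
    genome: Tuple[int, int], exome: Tuple[int, int]
) -> Tuple[int, int, Optional[float]]:
    """Merge (AC, AN) pairs from genomes and exomes.

    Returns the summed AC, the summed AN, and AF computed from the sums.
    AF is None when the merged AN is zero (an assumption of this sketch).
    """
    ac = genome[0] + exome[0]
    an = genome[1] + exome[1]
    af = ac / an if an > 0 else None
    return ac, an, af


# Hypothetical counts for one population.
print(merge_counts((10, 100), (30, 300)))
```

Summing counts before dividing weights each data set by its allele number, which differs from averaging the two per-data-set frequencies.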

Filters

The following strategy will be used when there's a conflict in filter status:

                 Genomes PASS          Genomes Filtered
Exomes PASS      PASS                  Only use exome data
Exomes Filtered  Only use genome data  Filtered
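
The filter strategy can be written as a small decision function. This is a sketch under stated assumptions: it assumes the variant is present in both data sets, the returned labels ("merged", "genome", "exome") are hypothetical names, and reporting the merged data with a failed-filter flag when both sides are filtered is an interpretation of the "Filtered" cell.

```python
from typing import Tuple


def resolve_filter_conflict(genome_pass: bool, exome_pass: bool) -> Tuple[str, bool]:
    """Return (data to report, failed_filter) for a variant seen in both sets."""
    if genome_pass and exome_pass:
        return "merged", False   # both PASS: report merged data
    if exome_pass:
        return "exome", False    # genomes filtered: only use exome data
    if genome_pass:
        return "genome", False   # exomes filtered: only use genome data
    return "merged", True        # both filtered: mark the variant as filtered


for genome_pass in (True, False):
    for exome_pass in (True, False):
        print(genome_pass, exome_pass, resolve_filter_conflict(genome_pass, exome_pass))
```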

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field                Type   Notes
coverage             int    average coverage (non-negative integer values)
allAf                float  allele frequency for all populations. Range: 0 - 1.0
maleAf               float  allele frequency for male population. Range: 0 - 1.0
femaleAf             float  allele frequency for female population. Range: 0 - 1.0
controlsAllAf        float  allele frequency for the controls subset. Range: 0 - 1.0
allAc                int    allele count for all populations. Integer.
maleAc               int    allele count for male population. Integer.
femaleAc             int    allele count for female population. Integer.
controlsAllAc        int    allele count for the controls subset. Integer.
allAn                int    allele number for all populations. Non-zero integer.
maleAn               int    allele number for male population. Non-zero integer.
femaleAn             int    allele number for female population. Non-zero integer.
controlsAllAn        int    allele number for the controls subset. Non-zero integer.
allHc                int    count of homozygous individuals for all populations. Non-negative integer.
maleHc               int    count of homozygous individuals for male population. Non-negative integer.
femaleHc             int    count of homozygous individuals for female population. Non-negative integer.
afrAf                float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                int    allele count for the African / African American population. Integer.
afrAn                int    allele number for the African / African American population. Non-zero integer.
afrHc                int    count of homozygous individuals for African / African American population. Non-negative integer.
amrAf                float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc                int    allele count for the Latino population. Integer.
amrAn                int    allele number for the Latino population. Non-zero integer.
amrHc                int    count of homozygous individuals for Latino population. Non-negative integer.
easAf                float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc                int    allele count for the East Asian population. Integer.
easAn                int    allele number for the East Asian population. Non-zero integer.
easHc                int    count of homozygous individuals for East Asian population. Non-negative integer.
finAf                float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc                int    allele count for the Finnish population. Integer.
finAn                int    allele number for the Finnish population. Non-zero integer.
finHc                int    count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf                float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                int    allele count for the Non-Finnish European population. Integer.
nfeAn                int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                int    count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf                float  allele frequency for the Other population. Range: 0 - 1.0
othAc                int    allele count for the Other population. Integer.
othAn                int    allele number for the Other population. Non-zero integer.
othHc                int    count of homozygous individuals for Other population. Non-negative integer.
asjAf                float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                int    allele count for the Ashkenazi Jewish population. Integer.
asjAn                int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                int    allele count for the South Asian population. Integer.
sasAn                int    allele number for the South Asian population. Non-zero integer.
sasHc                int    count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter         bool   True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion  bool   True if this variant is located in a low complexity region.
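
To illustrate how a downstream consumer might read these fields, here is a small sketch that flags common variants from a parsed gnomad object. The function name and the 1% threshold are hypothetical, and treating a missing failedFilter key as false is an assumption about the output rather than documented behaviour.

```python
import json
from typing import Optional


def is_common(gnomad: Optional[dict], min_af: float = 0.01) -> bool:
    """True when allAf is at least min_af and the variant did not fail filters."""
    if gnomad is None:
        return False  # no gnomAD annotation for this variant
    if gnomad.get("failedFilter", False):
        return False  # this sketch chooses not to trust filtered sites
    af = gnomad.get("allAf")
    return af is not None and af >= min_af


# A subset of the fields from the JSON example, wrapped in an object.
variant = json.loads('{"gnomad": {"allAf": 0.190317, "allAn": 30796, "allAc": 5861}}')
print(is_common(variant.get("gnomad")))
```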

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Input VCF files (one per chromosome) and a .version file are required in a folder to build the .nsa file. For example, my directory contains:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz chr2.vcf.bgz
chr12.vcf.bgz chr3.vcf.bgz
chr13.vcf.bgz chr4.vcf.bgz
chr14.vcf.bgz chr5.vcf.bgz
chr15.vcf.bgz chr6.vcf.bgz
chr16.vcf.bgz chr7.vcf.bgz
chr17.vcf.bgz chr8.vcf.bgz
chr18.vcf.bgz chr9.vcf.bgz
chr19.vcf.bgz chrM.vcf.bgz
chr1.vcf.bgz chrX.vcf.bgz
chr20.vcf.bgz chrY.vcf.bgz
chr21.vcf.bgz gnomad.r3.1.version

The version file is a text file with the following content.

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
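
Before running the build, it can be useful to sanity-check the version file. The sketch below parses the KEY=VALUE layout shown above; the function name and the set of required keys are assumptions drawn from this example, not from the SAUtils source.

```python
from typing import Dict

REQUIRED_KEYS = ("NAME", "VERSION", "DATE", "DESCRIPTION")  # keys seen in the example


def parse_version_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; raise ValueError if an expected key is missing."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in fields]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return fields


example = """NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
"""
print(parse_version_file(example))
```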

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
      --ref, -r <VALUE>      compressed reference sequence file
      --genome, -g <VALUE>   input directory containing VCF (and .version)
                             files with genomic frequencies
      --exome, -e <VALUE>    input directory containing VCF (and .version)
                             files with exomic frequencies
      --temp, -t <VALUE>     output temp directory for intermediate (per chrom)
                             NSA files
      --out, -o <VALUE>      output directory for NSA file
      --help, -h             displays the help menu
      --version, -v          displays the version

Here is a sample execution:

dotnet SAUtils.dll Gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key  TSV column    Description
pLi       pLI           probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull     pNull         probability of being completely tolerant of loss of function variation (observed = expected)
pRec      pRec          probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ      syn_z         corrected synonymous Z score
misZ      mis_z         corrected missense Z score
loeuf     oe_lof_upper  loss of function observed/expected upper bound fraction (LOEUF)
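
The mapping can be applied to a tab-delimited file with Python's csv module, as in the sketch below. The function name is hypothetical, the two data rows are made-up illustrative values rather than gnomAD data, and omitting a JSON key when the TSV value is NA is an assumption.

```python
import csv
import io
from typing import Dict

# JSON key -> TSV column, as listed in the mapping table.
JSON_KEY_TO_TSV_COLUMN = {
    "pLi": "pLI",
    "pNull": "pNull",
    "pRec": "pRec",
    "synZ": "syn_z",
    "misZ": "mis_z",
    "loeuf": "oe_lof_upper",
}


def row_to_json_fields(row: Dict[str, str]) -> Dict[str, float]:
    """Convert one TSV row into the JSON fields, skipping NA or empty values."""
    fields: Dict[str, float] = {}
    for json_key, column in JSON_KEY_TO_TSV_COLUMN.items():
        raw = row.get(column, "NA")
        if raw in ("", "NA"):
            continue  # assumption: missing values are left out of the JSON
        fields[json_key] = float(raw)
    return fields


# Illustrative rows only; a real file has many more columns.
tsv_text = (
    "gene\tpLI\tpNull\tpRec\tsyn_z\tmis_z\toe_lof_upper\n"
    "GENE1\t9.0e-01\t1.0e-03\t9.9e-02\t5.0e-01\t1.5e+00\t2.5e-01\n"
    "GENE2\tNA\tNA\tNA\t-1.0e+00\t2.0e-01\tNA\n"
)
rows = [row_to_json_fields(row) for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t")]
print(rows)
```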

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.
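
The lookup amounts to a dictionary access keyed by Ensembl gene ID. In this sketch the cache is a plain dict standing in for the transcript cache, the function name is hypothetical, and keeping the input symbol when the ID is absent from the cache is an assumption.

```python
from typing import Dict


def update_gene_symbol(ensembl_id: str, input_symbol: str, cache: Dict[str, str]) -> str:
    """Prefer the symbol the cache assigns to this Ensembl gene ID."""
    return cache.get(ensembl_id, input_symbol)


# Placeholder IDs and symbols mirroring the example in the text.
cache = {"ENSG0001": "GENE2"}
print(update_gene_symbol("ENSG0001", "GENE1", cache))
```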

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in a conflict, as in the following two entries for MDGA2:

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

In such cases, Illumina Connected Annotations chooses the entry with the smallest "LOEUF" value. The reason for choosing this value can be highlighted by the following table:

LOEUF decile  Haplo-insufficient  Autosomal Dominant  Autosomal Recessive  Olfactory Genes
0-10%         104                 140                 36                   0
10-20%        47                  128                 72                   1
20-30%        17                  86                  112                  0
30-40%        8                   80                  173                  4
40-50%        7                   65                  206                  8
50-60%        4                   54                  207                  6
60-70%        0                   46                  154                  18
70-80%        2                   49                  120                  49
80-90%        0                   34                  58                   96
90-100%       0                   26                  40                   174
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score.
  • If the LOEUF scores are the same, pick the entry with the lowest pLI.
  • Otherwise, pick the entry with the maximum absolute value of synZ + misZ.
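
One way to express these rules is a sort key, as in the sketch below. It is an interpretation rather than the actual implementation: the function name is hypothetical, entries missing loeuf or pLI are ranked after entries that have them, and "absolute value of synZ + misZ" is read as |synZ + misZ|. The keys follow the spelling used in the list of conflicting entries.

```python
import math
from typing import Dict, List


def pick_entry(entries: List[Dict[str, float]]) -> Dict[str, float]:
    """Choose one entry among conflicting entries for the same gene symbol."""

    def sort_key(entry: Dict[str, float]):
        loeuf = entry.get("loeuf", math.inf)  # lowest LOEUF first
        pli = entry.get("pLI", math.inf)      # then lowest pLI
        z_sum = abs(entry.get("synZ", 0.0) + entry.get("misZ", 0.0))
        return (loeuf, pli, -z_sum)           # then largest |synZ + misZ|

    return min(entries, key=sort_key)


# Entries copied from the list of conflicting genes.
nbpf20 = [
    {"pLI": 1.42e-7, "pRec": 3.40e-2, "pNull": 9.66e-1, "synZ": -1.86e0, "misZ": -2.88e0, "loeuf": 1.97e0},
    {"pLI": 1.92e-22, "pRec": 7.96e-6, "pNull": 1.00e0, "synZ": -9.73e0, "misZ": -7.67e0, "loeuf": 1.97e0},
]
fam231d = [
    {"synZ": -1.98e0, "misZ": -1.44e0},
    {"synZ": 1.07e0, "misZ": 3.13e-1},
]
print(pick_entry(nbpf20))
print(pick_entry(fam231d))
```

For NBPF20 the LOEUF values tie, so the pLI rule decides; for FAM231D neither LOEUF nor pLI is present, so the Z-score rule decides.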

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field  Type   Notes
pLi    float  probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull  float  probability of being completely tolerant of loss of function variation (observed = expected)
pRec   float  probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ   float  corrected synonymous Z score
misZ   float  corrected missense Z score
loeuf  float  loss of function observed/expected upper bound fraction (LOEUF)

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note +

Version: 3.23

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following INFO fields from the gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following additional fields from the gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number, and allele frequency for:
      • Global population
      • African/African Americans
      • Admixed Americans
      • Ashkenazi Jews
      • East Asians
      • Finnish
      • Non-Finnish Europeans
      • South Asian
      • Others (population not assigned)
      • Male
      • Female
      • Controls
Note
  • Coverage = DP / AN. Allele frequencies are computed as AC / AN for each population.
  • gnomAD currently has no genome sequencing data for the South Asian (SAS) population.
  • Allele count, homozygous count, allele number, and allele frequency for the controls subset are also provided for the global population.

Merging genomes and exomes

When genomes and exomes are merged, the allele counts and allele numbers are summed across both data sets.

info
  • For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
  • For GRCh38, Illumina Connected Annotations currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.

Filters

The following strategy will be used when there's a conflict in filter status:

                 Genomes PASS          Genomes Filtered
Exomes PASS      PASS                  Only use exome data
Exomes Filtered  Only use genome data  Filtered

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field                Type   Notes
coverage             int    average coverage (non-negative integer values)
allAf                float  allele frequency for all populations. Range: 0 - 1.0
maleAf               float  allele frequency for male population. Range: 0 - 1.0
femaleAf             float  allele frequency for female population. Range: 0 - 1.0
controlsAllAf        float  allele frequency for the controls subset. Range: 0 - 1.0
allAc                int    allele count for all populations. Integer.
maleAc               int    allele count for male population. Integer.
femaleAc             int    allele count for female population. Integer.
controlsAllAc        int    allele count for the controls subset. Integer.
allAn                int    allele number for all populations. Non-zero integer.
maleAn               int    allele number for male population. Non-zero integer.
femaleAn             int    allele number for female population. Non-zero integer.
controlsAllAn        int    allele number for the controls subset. Non-zero integer.
allHc                int    count of homozygous individuals for all populations. Non-negative integer.
maleHc               int    count of homozygous individuals for male population. Non-negative integer.
femaleHc             int    count of homozygous individuals for female population. Non-negative integer.
afrAf                float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                int    allele count for the African / African American population. Integer.
afrAn                int    allele number for the African / African American population. Non-zero integer.
afrHc                int    count of homozygous individuals for African / African American population. Non-negative integer.
amrAf                float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc                int    allele count for the Latino population. Integer.
amrAn                int    allele number for the Latino population. Non-zero integer.
amrHc                int    count of homozygous individuals for Latino population. Non-negative integer.
easAf                float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc                int    allele count for the East Asian population. Integer.
easAn                int    allele number for the East Asian population. Non-zero integer.
easHc                int    count of homozygous individuals for East Asian population. Non-negative integer.
finAf                float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc                int    allele count for the Finnish population. Integer.
finAn                int    allele number for the Finnish population. Non-zero integer.
finHc                int    count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf                float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                int    allele count for the Non-Finnish European population. Integer.
nfeAn                int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                int    count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf                float  allele frequency for the Other population. Range: 0 - 1.0
othAc                int    allele count for the Other population. Integer.
othAn                int    allele number for the Other population. Non-zero integer.
othHc                int    count of homozygous individuals for Other population. Non-negative integer.
asjAf                float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                int    allele count for the Ashkenazi Jewish population. Integer.
asjAn                int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                int    allele count for the South Asian population. Integer.
sasAn                int    allele number for the South Asian population. Non-zero integer.
sasHc                int    count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter         bool   True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion  bool   True if this variant is located in a low complexity region.

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Input VCF files (one per chromosome) and a .version file are required in a folder to build the .nsa file. For example, my directory contains:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz chr2.vcf.bgz
chr12.vcf.bgz chr3.vcf.bgz
chr13.vcf.bgz chr4.vcf.bgz
chr14.vcf.bgz chr5.vcf.bgz
chr15.vcf.bgz chr6.vcf.bgz
chr16.vcf.bgz chr7.vcf.bgz
chr17.vcf.bgz chr8.vcf.bgz
chr18.vcf.bgz chr9.vcf.bgz
chr19.vcf.bgz chrM.vcf.bgz
chr1.vcf.bgz chrX.vcf.bgz
chr20.vcf.bgz chrY.vcf.bgz
chr21.vcf.bgz gnomad.r3.1.version

The version file is a text file with the following content:

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
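
A version file like the one above can also be generated from a script. The following is a minimal sketch and not part of SAUtils; the helper name `write_version_file` and its parameters are assumptions made for illustration.

```python
from pathlib import Path


def write_version_file(path, name, version, date, description):
    """Write a KEY=VALUE .version file (hypothetical helper, not part of SAUtils)."""
    lines = [
        f"NAME={name}",
        f"VERSION={version}",
        f"DATE={date}",
        f"DESCRIPTION={description}",
    ]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Example call (writes the file shown above into the current directory):
# write_version_file(
#     "gnomad.r3.1.version", "gnomAD", "3.1", "2020-10-29",
#     "Allele frequencies from Genome Aggregation Database (gnomAD)",
# )
```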

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--genome, -g <VALUE> input directory containing VCF (and .version)
files with genomic frequencies
--exome, -e <VALUE> input directory containing VCF (and .version)
files with exomic frequencies
--temp, -t <VALUE> output temp directory for intermediate (per chrom)
NSA files
--out, -o <VALUE> output directory for NSA file
--help, -h displays the help menu
--version, -v displays the version

Here is a sample execution:

dotnet SAUtils.dll gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key | TSV column | Description
pLi | pLI | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | pNull | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | pRec | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | syn_z | corrected synonymous Z score
misZ | mis_z | corrected missense Z score
loeuf | oe_lof_upper | loss of function observed/expected upper bound fraction (LOEUF)
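
The mapping above can be applied to a row of the tab-delimited file as in the following minimal sketch. The function name `extract_metrics` and the choice to skip missing or "NA" values are assumptions for illustration; the sample input contains only the relevant columns of the MED13 row.

```python
import csv
import io

# JSON key -> TSV column, as listed in the mapping table above.
JSON_TO_TSV = {
    "pLi": "pLI",
    "pNull": "pNull",
    "pRec": "pRec",
    "synZ": "syn_z",
    "misZ": "mis_z",
    "loeuf": "oe_lof_upper",
}


def extract_metrics(row):
    """Return JSON-keyed metrics for one TSV row (a dict keyed by column name).

    Missing or "NA" values are skipped; that behavior is an assumption of this sketch.
    """
    metrics = {}
    for json_key, column in JSON_TO_TSV.items():
        value = row.get(column)
        if value not in (None, "", "NA"):
            metrics[json_key] = float(value)
    return metrics


# Reduced example input: only the columns used here, values from the MED13 row.
header = ["gene", "pLI", "pNull", "pRec", "syn_z", "mis_z", "oe_lof_upper"]
values = ["MED13", "1.0000e+00", "8.9436e-40", "1.8383e-16",
          "-1.3765e+00", "2.6232e+00", "3.0000e-02"]
sample = "\t".join(header) + "\n" + "\t".join(values) + "\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
med13 = extract_metrics(rows[0])
```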

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.
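
The lookup described above can be sketched as follows. This is only an illustration: `update_gene_symbol` is a hypothetical helper, a plain dictionary stands in for the transcript cache, and keeping the input symbol when the ID is absent from the cache is an assumption.

```python
def update_gene_symbol(entry, cache_symbols):
    """Return a copy of entry whose gene symbol is taken from the cache.

    entry: dict with "gene_id" (Ensembl gene ID) and "gene" (symbol from the input).
    cache_symbols: dict mapping Ensembl gene ID -> gene symbol (stand-in for the
    transcript cache). IDs missing from the cache keep their input symbol, which
    is an assumption of this sketch.
    """
    updated = dict(entry)
    updated["gene"] = cache_symbols.get(entry["gene_id"], entry["gene"])
    return updated


cache = {"ENSG0001": "GENE2"}
entry = {"gene_id": "ENSG0001", "gene": "GENE1"}
result = update_gene_symbol(entry, cache)
```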

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in a conflict, as in the following two MDGA2 entries:

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

In such cases, Illumina Connected Annotations chooses the entry with the smallest "LOEUF" value. The reason for choosing this value can be highlighted by the following table:

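LOEUF decile | Haplo-insufficient | Autosomal Dominant | Autosomal Recessive | Olfactory Genes
0-10% | 104 | 140 | 36 | 0
10-20% | 47 | 128 | 72 | 1
20-30% | 17 | 86 | 112 | 0
30-40% | 8 | 80 | 173 | 4
40-50% | 7 | 65 | 206 | 8
50-60% | 4 | 54 | 207 | 6
60-70% | 0 | 46 | 154 | 18
70-80% | 2 | 49 | 120 | 49
80-90% | 0 | 34 | 58 | 96
90-100% | 0 | 26 | 40 | 174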
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score.
  • If the LOEUF scores are the same, pick the entry with the lowest pLI.
  • Otherwise, pick the entry with the maximum absolute value of synZ + misZ.
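
The rules above can be sketched as a small selection function. This is one possible reading, not the actual implementation: it assumes that entries without a LOEUF value are ignored whenever another entry has one, that "Otherwise" covers the case where no entry has a LOEUF value, and that the last rule means abs(synZ + misZ). The name `resolve_conflict` is hypothetical.

```python
def resolve_conflict(entries):
    """Pick one entry among several that share a gene symbol.

    Each entry is a dict that may contain "loeuf", "pLI", "synZ" and "misZ".
    """
    with_loeuf = [e for e in entries if "loeuf" in e]
    if with_loeuf:
        lowest = min(e["loeuf"] for e in with_loeuf)
        tied = [e for e in with_loeuf if e["loeuf"] == lowest]
        if len(tied) == 1:
            return tied[0]
        # LOEUF tie: fall back to the lowest pLI.
        return min(tied, key=lambda e: e["pLI"])
    # No LOEUF available (assumed meaning of "Otherwise"): largest |synZ + misZ|.
    return max(entries, key=lambda e: abs(e["synZ"] + e["misZ"]))


# Entries taken from the list of conflicting genes above.
mdga2 = [
    {"pLI": 9.99e-1, "pRec": 7.81e-4, "pNull": 8.65e-12, "synZ": 8.30e-1, "misZ": 1.68e0, "loeuf": 2.39e-1},
    {"pLI": 6.65e-1, "pRec": 3.35e-1, "pNull": 5.58e-10, "synZ": 8.39e-1, "misZ": 1.74e0, "loeuf": 3.51e-1},
]
nbpf20 = [
    {"pLI": 1.42e-7, "pRec": 3.40e-2, "pNull": 9.66e-1, "synZ": -1.86e0, "misZ": -2.88e0, "loeuf": 1.97e0},
    {"pLI": 1.92e-22, "pRec": 7.96e-6, "pNull": 1.00e0, "synZ": -9.73e0, "misZ": -7.67e0, "loeuf": 1.97e0},
]
fam231d = [
    {"synZ": -1.98e0, "misZ": -1.44e0},
    {"synZ": 1.07e0, "misZ": 3.13e-1},
]
```

Under these assumptions, MDGA2 resolves to the entry with LOEUF 0.239, NBPF20 (a LOEUF tie) to the entry with the lower pLI, and FAM231D to the entry with the larger absolute Z-score sum.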

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note: The gnomAD structural variant annotations are currently in a preview stage and do not yet include translocation breakends. Future updates will include a better way of annotating the structural variants.

Source Files

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom start end name svtype ALGORITHMS BOTHSIDES_SUPPORT CHR2 CPX_INTERVALS CPX_TYPE END2 END EVIDENCE HIGH_SR_BACKGROUND PCRPLUS_DEPLETED PESR_GT_OVERDISPERSION POS2 PROTEIN_CODING__COPY_GAIN PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC PROTEIN_CODING__INTRONIC PROTEIN_CODING__INV_SPAN PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER PROTEIN_CODING__UTR SOURCE STRANDS SVLEN SVTYPE UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN AC AF N_BI_GENOS N_HOMREF N_HET N_HOMALT FREQ_HOMREF FREQ_HET FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF MALE_N_HET MALE_N_HOMALT MALE_FREQ_HOMREF MALE_FREQ_HET MALE_FREQ_HOMALT MALE_N_HEMIREF MALE_N_HEMIALT MALE_FREQ_HEMIREF MALE_FREQ_HEMIALT PAR FEMALE_AN FEMALE_AC FEMALE_AF FEMALE_N_BI_GENOS FEMALE_N_HOMREF FEMALE_N_HET FEMALE_N_HOMALT FEMALE_FREQ_HOMREF FEMALE_FREQ_HET FEMALE_FREQ_HOMALT POPMAX_AF AFR_AN AFR_AC AFR_AF AFR_N_BI_GENOS AFR_N_HOMREF AFR_N_HET AFR_N_HOMALT AFR_FREQ_HOMREF AFR_FREQ_HET AFR_FREQ_HOMALT AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF AFR_MALE_N_HET AFR_MALE_N_HOMALT AFR_MALE_FREQ_HOMREF AFR_MALE_FREQ_HET AFR_MALE_FREQ_HOMALT AFR_MALE_N_HEMIREF AFR_MALE_N_HEMIALT AFR_MALE_FREQ_HEMIREF AFR_MALE_FREQ_HEMIALT AFR_FEMALE_AN AFR_FEMALE_AC AFR_FEMALE_AF AFR_FEMALE_N_BI_GENOS AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT AMR_AN AMR_AC AMR_AF AMR_N_BI_GENOS AMR_N_HOMREF AMR_N_HET AMR_N_HOMALT AMR_FREQ_HOMREF AMR_FREQ_HET AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF AMR_MALE_N_HET AMR_MALE_N_HOMALT AMR_MALE_FREQ_HOMREF AMR_MALE_FREQ_HET AMR_MALE_FREQ_HOMALT AMR_MALE_N_HEMIREF AMR_MALE_N_HEMIALT AMR_MALE_FREQ_HEMIREF AMR_MALE_FREQ_HEMIALT AMR_FEMALE_AN AMR_FEMALE_AC AMR_FEMALE_AF AMR_FEMALE_N_BI_GENOS AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT EAS_AN EAS_AC EAS_AF EAS_N_BI_GENOS EAS_N_HOMREF EAS_N_HET EAS_N_HOMALT EAS_FREQ_HOMREF EAS_FREQ_HET EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF EAS_MALE_N_HET EAS_MALE_N_HOMALT EAS_MALE_FREQ_HOMREF EAS_MALE_FREQ_HET EAS_MALE_FREQ_HOMALT EAS_MALE_N_HEMIREF EAS_MALE_N_HEMIALT EAS_MALE_FREQ_HEMIREF EAS_MALE_FREQ_HEMIALT EAS_FEMALE_AN EAS_FEMALE_AC EAS_FEMALE_AF EAS_FEMALE_N_BI_GENOS EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT EUR_AN EUR_AC EUR_AF EUR_N_BI_GENOS EUR_N_HOMREF EUR_N_HET EUR_N_HOMALT EUR_FREQ_HOMREF EUR_FREQ_HET EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF EUR_MALE_N_HET EUR_MALE_N_HOMALT EUR_MALE_FREQ_HOMREF EUR_MALE_FREQ_HET EUR_MALE_FREQ_HOMALT EUR_MALE_N_HEMIREF EUR_MALE_N_HEMIALT EUR_MALE_FREQ_HEMIREF EUR_MALE_FREQ_HEMIALT EUR_FEMALE_AN EUR_FEMALE_AC EUR_FEMALE_AF EUR_FEMALE_N_BI_GENOS EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT OTH_AN OTH_AC OTH_AF OTH_N_BI_GENOS OTH_N_HOMREF OTH_N_HET OTH_N_HOMALT OTH_FREQ_HOMREF OTH_FREQ_HET OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF OTH_MALE_N_HET OTH_MALE_N_HOMALT OTH_MALE_FREQ_HOMREF OTH_MALE_FREQ_HET OTH_MALE_FREQ_HOMALT OTH_MALE_N_HEMIREF OTH_MALE_N_HEMIALT OTH_MALE_FREQ_HEMIREF OTH_MALE_FREQ_HEMIALT OTH_FEMALE_AN OTH_FEMALE_AC OTH_FEMALE_AF OTH_FEMALE_N_BI_GENOS OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET OTH_FEMALE_N_HOMALT OTH_FEMALE_FREQ_HOMREF OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file was obtained from the lifted-over dataset created by dbVar for GRCh38.

#variant_call_accession variant_call_id variant_call_type experiment_id sample_id sampleset_id assembly chr contig outer_start start inner_start inner_stop stop outer_stop insertion_length variant_region_acc variant_region_id copy_number description validation zygosity origin phenotype hgvs_name placement_method placement_rank placements_per_assembly remap_alignment remap_best_within_cluster remap_coverage remap_diff_chr remap_failure_code allele_count allele_frequency allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0
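
The last three columns of the TSV example (allele_count, allele_frequency and allele_number) each pack per-population values into one comma-separated KEY=VALUE string. A minimal sketch of unpacking them is shown below; `parse_population_field` is a hypothetical helper, not part of SAUtils.

```python
def parse_population_field(field, cast):
    """Parse a comma-separated KEY=VALUE string into a dict, converting values with cast."""
    values = {}
    for item in field.split(","):
        key, _, raw = item.partition("=")
        values[key] = cast(raw)
    return values


# Field values copied from the TSV example above.
allele_count = parse_population_field(
    "AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0", int)
allele_frequency = parse_population_field(
    "AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0", float)
allele_number = parse_population_field(
    "AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0", int)
```

As a sanity check on the example row, AC / AN (21 / 540) agrees with the reported AF of 0.038889 to six decimal places.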

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. In the Illumina Connected Annotations JSON output, these keys will be mapped according to the following.

Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key | GRCh38 Source SV Type Key
copy_number_variation | | copy number variation
deletion | DEL, CN=0 | deletion
duplication | DUP | duplication
insertion | INS | insertion
inversion | INV | inversion
mobile_element_insertion | INS:ME | mobile element insertion
mobile_element_insertion | INS:ME:ALU | alu insertion
mobile_element_insertion | INS:ME:LINE1 | line1 insertion
mobile_element_insertion | INS:ME:SVA | sva insertion
structural alteration | | sequence alteration
complex_structural_alteration | CPX |
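
The mapping in the table above can be expressed as two lookup tables, one per assembly. The sketch below is illustrative only: the column assignment for rows with a single source key (copy number variation, sequence alteration, CPX) is my reading of the flattened table, "DEL, CN=0" is treated as two separate keys, and returning None for an unmapped key is an assumption. The name `to_json_sv_type` is hypothetical.

```python
# Source SV type key -> JSON SV type key, transcribed from the table above.
SV_TYPE_MAPS = {
    "GRCh37": {
        "DEL": "deletion",
        "CN=0": "deletion",
        "DUP": "duplication",
        "INS": "insertion",
        "INV": "inversion",
        "INS:ME": "mobile_element_insertion",
        "INS:ME:ALU": "mobile_element_insertion",
        "INS:ME:LINE1": "mobile_element_insertion",
        "INS:ME:SVA": "mobile_element_insertion",
        "CPX": "complex_structural_alteration",
    },
    "GRCh38": {
        "copy number variation": "copy_number_variation",
        "deletion": "deletion",
        "duplication": "duplication",
        "insertion": "insertion",
        "inversion": "inversion",
        "mobile element insertion": "mobile_element_insertion",
        "alu insertion": "mobile_element_insertion",
        "line1 insertion": "mobile_element_insertion",
        "sva insertion": "mobile_element_insertion",
        "sequence alteration": "structural alteration",
    },
}


def to_json_sv_type(source_key, assembly):
    """Return the JSON SV type key for a source key, or None if it is not mapped."""
    if assembly not in SV_TYPE_MAPS:
        raise ValueError(f"unknown assembly: {assembly}")
    return SV_TYPE_MAPS[assembly].get(source_key)
```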

Download URLs

GRCh37

The GRCh37 file was downloaded from the original source. The following table gives some essential data metrics:

https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz

GRCh38

Note: The GRCh38 data was not available from the original gnomAD 2.1 source. Instead, we use the lifted-over structural variant dataset created by dbVar, obtained from https://www.ncbi.nlm.nih.gov/sites/dbvarapp/studies/nstd166/.

Download URL

https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/tsv/nstd166.GRCh38.variant_call.tsv.gz

JSON output

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Admixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Admixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
- - + + \ No newline at end of file diff --git a/3.23/data-sources/mito-heteroplasmy/index.html b/3.23/data-sources/mito-heteroplasmy/index.html index b3852c65..ccd3194b 100644 --- a/3.23/data-sources/mito-heteroplasmy/index.html +++ b/3.23/data-sources/mito-heteroplasmy/index.html @@ -6,13 +6,13 @@ Mitochondrial Heteroplasmy | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (e.g. T:C)
  • ad
  • vrf
  • nobs (number of observations)
Adjusting for null observations

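The nobs value indicates how many observations were made. Ideally, every observation would have been represented in the ad and vrf arrays, but only some of them are listed. In the example above, nobs is 246 while the arrays hold just 6 entries, so the remaining 240 observations have to be accounted for separately. The vrf_stats mean in the example (about 5.405e-05) equals the sum of the six listed VRFs divided by 246, which suggests that the unlisted observations count as a VRF of 0.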
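
One way to adjust for the null observations is to pad the vrf array with zeros up to nobs. This is a sketch under the assumption that unlisted observations have a VRF of 0; that reading is consistent with the vrf_stats mean in the example above, but it is not confirmed by the source. The function name is hypothetical.

```python
def pad_with_null_observations(vrf, nobs):
    """Return vrf extended with zeros so that its length equals nobs.

    Assumes every observation missing from vrf has a VRF of 0.
    """
    if nobs < len(vrf):
        raise ValueError("nobs is smaller than the number of listed observations")
    return list(vrf) + [0.0] * (nobs - len(vrf))


# Values copied from the JSON example above.
vrf = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
padded = pad_with_null_observations(vrf, 246)
mean = sum(padded) / len(padded)  # compare with vrf_stats["mean"] in the example
```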

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object are stored at full floating-point precision, so almost every observation is a distinct value and the array grows as the number of samples increases. To make this more future-proof, Illumina Connected Annotations bins the VRFs in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.
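
The binning step can be sketched as follows. The source only states that bins are 0.1% wide; rounding to the nearest bin (rather than flooring) and the function name `bin_vrfs` are assumptions of this sketch.

```python
from collections import Counter


def bin_vrfs(vrfs, bins_per_unit=1000):
    """Count VRFs per 0.1% bin and return {bin value: count}, sorted by bin.

    Each VRF is rounded to the nearest bin; whether the real implementation
    rounds or truncates is not stated in the source.
    """
    counts = Counter(round(v * bins_per_unit) for v in vrfs)
    return {index / bins_per_unit: n for index, n in sorted(counts.items())}


# VRFs copied from the JSON example above.
vrf = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
binned = bin_vrfs(vrf)
```

With nearest-bin rounding, the six distinct values in the example collapse into two bins (0.002 and 0.003), which is the kind of reduction described above.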

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).
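
The percentile definition above can be sketched against one row of the embedded TSV file (VRF_BINS and VRF_COUNTS). This is an illustration of the stated definition, not the actual implementation; the function name and the use of a strict "below" comparison against the bin values are assumptions.

```python
from bisect import bisect_left


def heteroplasmy_percentile(sample_vrf, vrf_bins, vrf_counts):
    """Fraction of observations whose VRF bin is strictly below sample_vrf.

    vrf_bins must be sorted in ascending order and aligned with vrf_counts.
    The result is 1.0 only when sample_vrf is above every bin.
    """
    total = sum(vrf_counts)
    if total == 0:
        raise ValueError("empty distribution")
    below = sum(vrf_counts[:bisect_left(vrf_bins, sample_vrf)])
    return below / total


# Bins and counts copied from the chrM position 1 row of the TSV example.
bins = [0.981, 0.987, 0.988, 0.989, 0.99, 0.991, 0.992,
        0.993, 0.994, 0.995, 0.996, 0.997, 0.998, 0.999]
counts = [1, 2, 2, 4, 7, 8, 11, 19, 43, 60, 48, 64, 499, 1736]

percentile = heteroplasmy_percentile(0.9985, bins, counts)
```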

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | Notes
heteroplasmyPercentile | float array | one percentile for each variant frequency (each alternate allele)
- - +
Version: 3.23

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (e.g. T:C)
  • ad
  • vrf
  • nobs (number of observations)
Adjusting for null observations

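The nobs value indicates how many observations were made. Ideally, every observation would have been represented in the ad and vrf arrays, but only some of them are listed. In the example above, nobs is 246 while the arrays hold just 6 entries, so the remaining 240 observations have to be accounted for separately. The vrf_stats mean in the example (about 5.405e-05) equals the sum of the six listed VRFs divided by 246, which suggests that the unlisted observations count as a VRF of 0.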

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object are stored at full floating-point precision, so almost every observation is a distinct value and the array grows as the number of samples increases. To make this more future-proof, Illumina Connected Annotations bins the VRFs in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | Notes
heteroplasmyPercentile | float array | one percentile for each variant frequency (each alternate allele)
+ + \ No newline at end of file diff --git a/3.23/data-sources/mitomap-small-variants-json/index.html b/3.23/data-sources/mitomap-small-variants-json/index.html index ecb3b6d9..8bbaff69 100644 --- a/3.23/data-sources/mitomap-small-variants-json/index.html +++ b/3.23/data-sources/mitomap-small-variants-json/index.html @@ -6,13 +6,13 @@ mitomap-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele
- - +
Version: 3.23

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele
+ + \ No newline at end of file diff --git a/3.23/data-sources/mitomap-structural-variants-json/index.html b/3.23/data-sources/mitomap-structural-variants-json/index.html index 3493d1fb..76ad9df5 100644 --- a/3.23/data-sources/mitomap-structural-variants-json/index.html +++ b/3.23/data-sources/mitomap-structural-variants-json/index.html @@ -6,13 +6,13 @@ mitomap-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
- - +
Version: 3.23

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
+ + \ No newline at end of file diff --git a/3.23/data-sources/mitomap/index.html b/3.23/data-sources/mitomap/index.html index da4b4196..0601669a 100644 --- a/3.23/data-sources/mitomap/index.html +++ b/3.23/data-sources/mitomap/index.html @@ -6,13 +6,13 @@ MITOMAP | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (numbers refer to the MITOMAP pages listed above):

  • Position (pages 1, 2, 3, 4)
  • Disease (pages 3, 4)
  • Nucleotide Change (pages 1, 2)
  • Allele (pages 3, 4)
  • Homoplasmy (pages 3, 4)
  • Heteroplasmy (pages 3, 4)
  • Status (pages 3, 4)
  • MitoTIP (pages 3, 4)
  • GB Seqs FL(CR) (pages 1, 2, 3, 4)
  • Deletion Junction (page 5)
  • Insert (nt) (page 6)
  • Insert Point (nt) (page 6)
  • References/Curated References (pages 1, 2, 3, 4)
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full length GenBank sequences, we take the largest number among the duplicated records since it provides the strongest evidence for this variant.
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (e.g., T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
- - +
Version: 3.23

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (numbers refer to the MITOMAP pages listed above):

  • Position (pages 1, 2, 3, 4)
  • Disease (pages 3, 4)
  • Nucleotide Change (pages 1, 2)
  • Allele (pages 3, 4)
  • Homoplasmy (pages 3, 4)
  • Heteroplasmy (pages 3, 4)
  • Status (pages 3, 4)
  • MitoTIP (pages 3, 4)
  • GB Seqs FL(CR) (pages 1, 2, 3, 4)
  • Deletion Junction (page 5)
  • Insert (nt) (page 6)
  • Insert Point (nt) (page 6)
  • References/Curated References (pages 1, 2, 3, 4)
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.
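
As a rough illustration of the score extraction, the sketch below pulls the percentage out of a MitoTIP cell like the ones in the sample rows above. The function name and the regular expression are hypothetical; how clinicalSignificance is derived is not described here, so it is left out.

```python
import re

# Hypothetical pattern: the percentile appears as <u>72.90%</u> in the sample rows.
_MITOTIP_SCORE = re.compile(r"<u>(\d+(?:\.\d+)?)%</u>")


def mitotip_percentile(cell_html):
    """Return the MitoTIP percentile as a float, or None if the cell has no score."""
    match = _MITOTIP_SCORE.search(cell_html)
    return float(match.group(1)) if match else None


sample_cell = (
    "<span style='display:inline-block;white-space:nowrap;'>"
    "<a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> "
    "<i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>"
)
score = mitotip_percentile(sample_cell)
```

A cell without a score (for example a plain "-") yields None, which lets the caller omit the scorePercentile key.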

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.
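
The left-alignment step can be sketched as follows. This is a generic normalization routine written for illustration, not the importer's actual code; the function name, the 1-based position convention, and the anchored (VCF-style) allele representation are assumptions. It also treats the reference as linear and does not wrap around the circular mitochondrial genome.

```python
def left_align(reference, pos, ref, alt):
    """Shift an indel as far left as possible.

    reference: full reference sequence (string)
    pos:       1-based position of the first base of `ref`
    ref, alt:  anchored alleles, e.g. ref="ACA", alt="A" for a CA deletion
    """
    changed = True
    while changed:
        changed = False
        # Trim a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat: T G C A C A C A G
toy_reference = "TGCACACAG"
aligned = left_align(toy_reference, 6, "ACA", "A")
```

In the toy example, a CA deletion reported at the right end of the repeat is moved to the leftmost equivalent position, anchored on the G at position 2.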

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.
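
A minimal sketch of the enumeration, under stated assumptions: C(2-8) is read here as "the unit C repeated 2 to 8 times", and "or" separates alternative alleles. Both readings, and all function names, are illustrative guesses rather than the importer's documented behavior. The IUPAC table itself is the standard nucleotide ambiguity code set.

```python
import itertools
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

_REPEAT_RANGE = re.compile(r"([ACGT]+)\((\d+)-(\d+)\)")


def expand_iupac(allele):
    """Return every concrete sequence encoded by an allele with ambiguity codes."""
    choices = [IUPAC[base] for base in allele]
    return ["".join(combo) for combo in itertools.product(*choices)]


def enumerate_alt_alleles(alt_field):
    """Expand the alternate-allele part of entries such as 'AC or ACC' or 'C(2-8)'."""
    alt_field = alt_field.strip()
    repeat = _REPEAT_RANGE.fullmatch(alt_field)
    if repeat:
        # Assumption: unit(lo-hi) means the unit repeated lo..hi times.
        unit, low, high = repeat.group(1), int(repeat.group(2)), int(repeat.group(3))
        return [unit * count for count in range(low, high + 1)]
    alleles = []
    for part in re.split(r"\s+or\s+", alt_field):
        alleles.extend(expand_iupac(part))
    return alleles
```

Each enumerated allele would then be emitted as its own variant, sharing the annotation of the original MITOMAP record.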

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT
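
The conventions listed above could be recognized with a small set of regular expressions. The sketch below only classifies an allele string and returns its captured parts; the category names and patterns are hypothetical and were written to match the eight examples, not taken from the importer.

```python
import re

BASES = "ACGTRYSWKMBDHVN"  # nucleotides plus IUPAC ambiguity codes

# Order matters only for readability; the patterns are mutually exclusive.
PATTERNS = [
    ("substitution", re.compile(rf"^([{BASES}])(\d+)([{BASES}])$")),        # C123T
    ("range_deletion", re.compile(r"^(\d+)_(\d+)del$")),                    # 16021_16022del
    ("counted_deletion", re.compile(r"^(\d+)del(\d+)$")),                   # 8042del2
    ("sequence_deletion", re.compile(rf"^(\d+)del([{BASES}]+)$")),          # 8042delAT
    ("insertion", re.compile(rf"^([{BASES}])(\d+)ins([{BASES}]+)$")),       # C9537insC
    ("inversion", re.compile(rf"^(\d+)_(\d+)inv([{BASES}]+)$")),            # 3902_3908invACCTTGC
    ("ref_alt_pair", re.compile(rf"^([{BASES}]+)-(.+)$")),                  # A-AC or ACC, C-C(2-8)
]


def classify_allele(text):
    """Return (category, captured groups), or None for unsupported notation."""
    text = text.strip()
    for name, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return name, match.groups()
    return None
```

Returning None for anything unrecognized gives the caller a single place to decide whether to skip the record or fail the import.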

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
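
A minimal sketch of extracting that mapping, assuming the dump uses PostgreSQL's default text COPY format (tab-separated fields, \N for NULL, a line containing only \. to end the block). The function name is hypothetical, and treating a lone "." as a missing value is a guess based on the sample rows above.

```python
import re

_COPY_HEADER = re.compile(r"COPY mitomap\.reference \((.*?)\) FROM stdin;")
_MISSING = {"", "\\N", "."}  # "." as a placeholder is an assumption


def parse_reference_ids(lines):
    """Map MITOMAP reference id -> PubMed ID (the nlmid column)."""
    mapping = {}
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if columns is None:
            header = _COPY_HEADER.match(line)
            if header:
                columns = [name.strip() for name in header.group(1).split(",")]
            continue
        if line == "\\.":  # end of the COPY data block
            break
        fields = line.split("\t")
        if len(fields) != len(columns):
            continue  # malformed row; skipped in this sketch
        record = dict(zip(columns, fields))
        if record["nlmid"] not in _MISSING:
            mapping[record["id"]] = record["nlmid"]
    return mapping
```

With this mapping, the reference IDs found in the print_ref_list links of the HTML pages (for example refs=90165,91590) can be translated into the pubMedIds array.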
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full length GenBank sequences, we take the largest number among the duplicated records since it provides the strongest evidence for this variant.
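
The merge rules above can be sketched as follows. The record layout reuses the JSON output keys for convenience; the function name, the strict equality check for conflicts, and the sorted output order are assumptions made for this illustration.

```python
# Fields that must agree between duplicated records.
CONFLICT_KEYS = (
    "hasHomoplasmy", "hasHeteroplasmy", "status",
    "clinicalSignificance", "scorePercentile", "end", "variantType",
)


def merge_records(first, second):
    """Merge two records that describe the same nucleotide change."""
    for key in CONFLICT_KEYS:
        if first.get(key) != second.get(key):
            raise ValueError(f"conflicting values for {key!r}: "
                             f"{first.get(key)!r} vs {second.get(key)!r}")
    merged = dict(first)
    # Union of diseases and PubMed IDs (sorted only to keep output deterministic).
    for key in ("diseases", "pubMedIds"):
        merged[key] = sorted(set(first.get(key, [])) | set(second.get(key, [])))
    # Keep the largest full-length GenBank sequence count.
    merged["numGenBankFullLengthSeqs"] = max(
        first.get("numGenBankFullLengthSeqs", 0),
        second.get("numGenBankFullLengthSeqs", 0),
    )
    return merged
```

Raising on conflicts rather than picking a winner keeps inconsistent source data from silently reaching the output.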
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (e.g., T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
+ + \ No newline at end of file diff --git a/3.23/data-sources/omim-json/index.html b/3.23/data-sources/omim-json/index.html index b5c8a449..ec044ad5 100644 --- a/3.23/data-sources/omim-json/index.html +++ b/3.23/data-sources/omim-json/index.html @@ -6,13 +6,13 @@ omim-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

omim-json

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
- - +
Version: 3.23

omim-json

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
+ + \ No newline at end of file diff --git a/3.23/data-sources/omim/index.html b/3.23/data-sources/omim/index.html index 44d98818..840e121d 100644 --- a/3.23/data-sources/omim/index.html +++ b/3.23/data-sources/omim/index.html @@ -6,18 +6,18 @@ OMIM | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

OMIM

Overview

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.

Publications

Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.

Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.

Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

Parse OMIM data

Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.

mim2gene.txt

The mim2gene.txt file (http://omim.org/static/omim/data/mim2gene.txt) provides the mapping between MIM numbers and gene symbols. An example of this file is given below:

# MIM Number    MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq)   Entrez Gene ID (NCBI)   Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437

The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.

OMIM API

Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below.

{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false,

}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}

Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON output.

Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:

Illumina Connected Annotations JSON key chain | OMIM API JSON key chain
omim:mimNumber | omim:entryList:entry:mimNumber
omim:geneName | omim:entryList:entry:geneMap:geneName
omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber
omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype
omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below)
omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance
omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below)
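
To make the key chains concrete, here is a partial sketch that walks one "entry" object from the API response and builds the gene-level fields plus the phenotype name and mapping text. It covers only a subset of the rows in the table; the function name is hypothetical, selecting the text section named "description" is an assumption based on the sample response, and the handling of phenotype MIM numbers, inheritances, and comments is deliberately left out.

```python
# Phenotype mapping key -> text, as listed for the "mapping" field.
MAPPING_KEY_TEXT = {
    1: "disorder was positioned by mapping of the wild type gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "disorder is a chromosome deletion or duplication syndrome",
}


def convert_entry(entry):
    """Build a partial 'omim' output object from one OMIM API entry dict."""
    gene_map = entry.get("geneMap", {})
    output = {
        "mimNumber": entry["mimNumber"],
        "geneName": gene_map.get("geneName"),
    }
    # entry:textSectionList:textSection:textSectionContent
    for wrapper in entry.get("textSectionList", []):
        section = wrapper["textSection"]
        if section.get("textSectionName") == "description":
            output["description"] = section.get("textSectionContent")
    phenotypes = []
    # entry:geneMap:phenotypeMapList:phenotypeMap:*
    for wrapper in gene_map.get("phenotypeMapList", []):
        phenotype_map = wrapper["phenotypeMap"]
        phenotype = {"phenotype": phenotype_map.get("phenotype")}
        mapping_text = MAPPING_KEY_TEXT.get(phenotype_map.get("phenotypeMappingKey"))
        if mapping_text:
            phenotype["mapping"] = mapping_text
        phenotypes.append(phenotype)
    if phenotypes:
        output["phenotypes"] = phenotypes
    return output
```

Given a parsed response, each item of response["omim"]["entryList"] holds one such object under its "entry" key.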

Mapping key to content

1 to disorder was positioned by mapping of the wild type gene
+

Version: 3.23

OMIM

Overview

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.

Publications

Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.

Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.

Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

Parse OMIM data

Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.

mim2gene.txt

The mim2gene.txt file (http://omim.org/static/omim/data/mim2gene.txt) provides the mapping between MIM numbers and gene symbols. An example of this file is given below:

# MIM Number    MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq)   Entrez Gene ID (NCBI)   Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437

The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.
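As a rough illustration of reading this file, the sketch below parses tab-separated mim2gene.txt lines and keeps the entries that carry at least one gene identifier. The function names (`parse_mim2gene`, `gene_entries`) and the filtering rule are assumptions made for this example; they are not the SAUtils implementation, and the actual symbol resolution against the cache is not shown.

```python
import csv
import io


def parse_mim2gene(lines):
    """Yield (mim_number, entry_type, entrez_id, hgnc_symbol, ensembl_id) tuples."""
    # Skip blank lines and the '#' comment/header lines.
    data_lines = (l for l in lines if l.strip() and not l.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        # Pad short rows: trailing empty columns may be missing.
        row = (row + [""] * 5)[:5]
        yield tuple(col.strip() for col in row)


def gene_entries(lines):
    """Keep entries with an Entrez ID, HGNC symbol, or Ensembl ID (assumed rule)."""
    return [r for r in parse_mim2gene(lines) if any(r[2:5])]


SAMPLE = "\n".join([
    "# MIM Number\tMIM Entry Type\tEntrez Gene ID (NCBI)\tApproved Gene Symbol (HGNC)\tEnsembl Gene ID (Ensembl)",
    "100050\tpredominantly phenotypes\t\t\t",
    "100070\tphenotype\t100329167\t\t",
    "100640\tgene\t216\tALDH1A1\tENSG00000165092",
    "100650\tgene/phenotype\t217\tALDH2\tENSG00000111275",
]) + "\n"

entries = gene_entries(io.StringIO(SAMPLE))
for mim, entry_type, entrez, symbol, ensembl in entries:
    print(mim, entry_type, entrez or "-", symbol or "-", ensembl or "-")
```

Rows such as 100050, which have no identifier in any of the three columns, are dropped before any symbol lookup would take place.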

OMIM API

Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below.

{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false,

}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}

Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON output section.

Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:

Illumina Connected Annotations JSON key chain | OMIM API JSON key chain
omim:mimNumber | omim:entryList:entry:mimNumber
omim:geneName | omim:entryList:entry:geneMap:geneName
omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber
omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype
omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below)
omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance
omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below)

Mapping key to content

1 to disorder was positioned by mapping of the wild type gene
2 to disease phenotype itself was mapped
3 to molecular basis of the disorder is known
4 to disorder is a chromosome deletion or duplication syndrome

Phenotype character to comment

? to unconfirmed or possibly spurious mapping
[/] to nondiseases
{/} to contribute to susceptibility to multifactorial disorders or to susceptibility to infection
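The two lookups above can be sketched as follows. This is only an illustration of the mapping tables, not the Illumina Connected Annotations code: the function name `to_output_phenotype` is hypothetical, splitting `phenotypeInheritance` on semicolons is an assumption, and the sketch leaves the phenotype name untouched because the text does not say whether the marker characters are stripped.

```python
MAPPING_KEY_TO_CONTENT = {
    1: "disorder was positioned by mapping of the wild type gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "disorder is a chromosome deletion or duplication syndrome",
}

# (marker characters to look for in the phenotype string, resulting comment)
PHENOTYPE_CHAR_TO_COMMENT = [
    ("?", "unconfirmed or possibly spurious mapping"),
    ("[]", "nondiseases"),
    ("{}", "contribute to susceptibility to multifactorial disorders "
           "or to susceptibility to infection"),
]


def to_output_phenotype(phenotype_map):
    """Convert one OMIM API phenotypeMap object into an output phenotype entry."""
    name = phenotype_map["phenotype"]
    out = {
        "mimNumber": phenotype_map["mimNumber"],
        "phenotype": name,
        "mapping": MAPPING_KEY_TO_CONTENT[phenotype_map["phenotypeMappingKey"]],
    }
    inheritance = phenotype_map.get("phenotypeInheritance")
    if inheritance:
        # Assumption: multiple inheritance modes are separated by ';'.
        out["inheritances"] = [s.strip() for s in inheritance.split(";") if s.strip()]
    comments = [comment for chars, comment in PHENOTYPE_CHAR_TO_COMMENT
                if any(ch in name for ch in chars)]
    if comments:
        out["comments"] = comments
    return out


# phenotypeMap taken from the sample API response above
sample = {
    "mimNumber": 102560,
    "phenotype": "Baraitser-Winter syndrome 2",
    "phenotypeMappingKey": 3,
    "phenotypeInheritance": "Autosomal dominant",
}
converted = to_output_phenotype(sample)

# Hypothetical phenotype name, used only to exercise the comment lookup
flagged = to_output_phenotype({
    "mimNumber": 1,
    "phenotype": "{?Example susceptibility}",
    "phenotypeMappingKey": 2,
})
print(converted)
print(flagged)
```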

There are different types of links in the OMIM description section. For example, the JSON response above contains the following description of MIM entry 100640:

The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985}).

As the descriptions will be shown as plain text, we remove the curly brackets surrounding links and try to keep the text readable with minimal modifications. Briefly:

  • Links referring to another MIM entry (e.g. {100650}) are removed. Any word(s) specifically associated with the removed link are also removed. For example, "(ADH, see {103700})" becomes "(ADH)".
  • Links referring to a literature reference are processed to remove the internal index and the curly brackets. For example, "{4:Hsu et al., 1985}" becomes "Hsu et al., 1985".
  • All other links simply have their curly brackets removed. For example, "{EC 1.2.1.3}" becomes "EC 1.2.1.3".
  • If the content within a pair of parentheses becomes empty after processing, the parentheses are removed as well and the surrounding white space is cleaned up. For example, "ALDH2 ({100650})," becomes "ALDH2,".

Here is a list of examples showing how the description section is supposed to be processed (an empty second column means that nothing remains after processing):

Original text | Processed text
({516030}, {516040}, and {516050}) |
(e.g., D1, {168461}; D2, {123833}; D3, {123834}) | (e.g., D1; D2; D3)
(desmocollins; see DSC2, {125645}) | (desmocollins; see DSC2)
(e.g., see {102700}, {300755}) |
(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}) | (ADH). See also liver mitochondrial ALDH2
(see, e.g., CACNA1A; {601011}) | (see, e.g., CACNA1A)
(e.g., GSTA1; {138359}), mu (e.g., {138350}) | (e.g., GSTA1), mu
(NFKB; see {164011}) | (NFKB)
(see ISGF3G, {147574}) | (see ISGF3G)
(DCK; {EC 2.7.1.74}; {125450}) | (DCK; EC 2.7.1.74)
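One possible way to express these rules is with a few regular expressions, as in the sketch below. It is an approximation written for this page, not the code used by Illumina Connected Annotations: the function name `clean_description` and the list of filler words treated as "associated with" a MIM link (`and`, `see`, `e.g.`) are assumptions chosen so that the examples in the table are reproduced.

```python
import re

# {4:Hsu et al., 1985} -> Hsu et al., 1985
REFERENCE_LINK = re.compile(r"\{\d+:([^{}]*)\}")
# {100650}, together with a leading separator and filler words such as "see"
MIM_LINK = re.compile(r"[,;]?\s*(?:\b(?:and|see|e\.g\.),?\s*)*\{\d+\}")
# {EC 1.2.1.3} -> EC 1.2.1.3
OTHER_LINK = re.compile(r"\{([^{}]*)\}")
# "()" left behind after link removal, plus the white space before it
EMPTY_PARENS = re.compile(r"\s*\(\s*\)")


def clean_description(text):
    """Turn OMIM description markup into plain text."""
    text = REFERENCE_LINK.sub(r"\1", text)   # keep the citation, drop the index
    text = MIM_LINK.sub("", text)            # drop links to other MIM entries
    text = OTHER_LINK.sub(r"\1", text)       # unwrap every remaining link
    return EMPTY_PARENS.sub("", text)        # remove emptied parentheses


examples = [
    "({516030}, {516040}, and {516050})",
    "(e.g., D1, {168461}; D2, {123833}; D3, {123834})",
    "(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650})",
    "(DCK; {EC 2.7.1.74}; {125450})",
    "inhibition by disulfiram ({4:Hsu et al., 1985}).",
]
for original in examples:
    print(repr(original), "->", repr(clean_description(original)))
```

The order of the substitutions matters: literature references are unwrapped first so that their numeric index is not mistaken for a MIM number, and the empty-parentheses cleanup runs last.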

JSON output

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

Building the supplementary files

There are 2 ways of building your own OMIM supplementary files using SAUtils.

The first way is to use SAUtils command's subcommands downloadOMIM and omim.

The second way is to use the SAUtils subcommand AutoDownloadGenerate. To use AutoDownloadGenerate, read more in the SAUtils section.

Using subcommands downloadOMIM and omim

The first step in building the OMIM .nga files is to use the SAUtils subcommand downloadOMIM to download the necessary data. To download the data, the user must possess an API key obtained from OMIM. This key has to be set as the environment variable OmimApiKey.

export OmimApiKey=<users-omim-api-key>
dotnet SAUtils.dll downloadOMIM
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll downloadomim [options]
Download the OMIM gene annotation data

OPTIONS:
--cache, -c <directory> input cache directory
--ref, -r <filename> input reference filename
--in, -i <path> input configuration JSON path (optional)
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll downloadOMIM --ref References/7/Homo_sapiens.GRCh38.Nirvana.dat --cache Cache/ --out ExternalDataSources/OMIM/2021-06-14

---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

Gene Symbol Update Statistics
============================================
{
"NumGeneSymbolsUpToDate": 16978,
"NumGeneSymbolsUpdated": 60,
"NumGenesWhereBothIdsAreNull": 0,
"NumGeneSymbolsNotInCache": 105,
"NumUnresolvedGeneSymbolConflicts": 0
}

Once the download has succeeded, the nga files can be produced using the SAUtils command's subcommand omim.

dotnet SAUtils.dll omim
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll omim [options]
Creates a gene annotation database from OMIM data

OPTIONS:
--m2g, -m <VALUE> MimToGeneSymbol tsv file
--json, -j <VALUE> OMIM entry json file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version


dotnet SAUtils.dll omim --m2g ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv --json ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz --out SupplementaryDatabase/63/
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:04.5
- - + + \ No newline at end of file diff --git a/3.23/data-sources/phylop-json/index.html b/3.23/data-sources/phylop-json/index.html index 9d2f8466..1900d9d0 100644 --- a/3.23/data-sources/phylop-json/index.html +++ b/3.23/data-sources/phylop-json/index.html @@ -6,13 +6,13 @@ phylop-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
- - +
Version: 3.23

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
+ + \ No newline at end of file diff --git a/3.23/data-sources/phylop/index.html b/3.23/data-sources/phylop/index.html index 6b3e94f3..87e13c6a 100644 --- a/3.23/data-sources/phylop/index.html +++ b/3.23/data-sources/phylop/index.html @@ -6,16 +6,16 @@ PhyloP | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

PhyloP

Overview

Publication

Kuderna, L.F.K., Ulirsch, J.C., Rashid, S. et al. Identification of constrained sequence elements across 239 primate genomes. Nature 2023. (https://doi.org/10.1038/s41586-023-06798-8)

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

PhyloP Primate

PhyloP primate analyzes 239 primate species and identifies 111,318 hypersensitivity sites and 267,410 binding sites constrained specifically in primates. +

Version: 3.23

PhyloP

Overview

Publication

Kuderna, L.F.K., Ulirsch, J.C., Rashid, S. et al. Identification of constrained sequence elements across 239 primate genomes. Nature 2023. (https://doi.org/10.1038/s41586-023-06798-8)

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

PhyloP Primate

PhyloP primate analyzes 239 primate species and identifies 111,318 hypersensitivity sites and 267,410 binding sites constrained specifically in primates. These elements are enriched for human genetic variants that influence gene expression and affect complex traits and diseases.

PhyloP Primate is only available for GRCh38 assembly.

BigWig File

The original file, primates_msa.phylop.conacc.lrt.bw, is a bigWig file. It was converted to a wig file using the UCSC utilities documented at https://genome.ucsc.edu/goldenPath/help/bigWig.html. After conversion, the wig file provides the scores in the following format:

0.14
0.074
-2.487
0.073
0.052
0.073
fixedStep chrom=chr1 start=10558 step=1 span=1
-1.991
0.052
-2.047
0.052
0.052
0.074
-1.992
0.074
0.052
0.073
0.074
0.052
0.074
-2.05
-2.059
0.074
0.074
0.074

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]
Field | Type | Notes
phyloPPrimateScore | float | range: -20 to 1.951

PhyloP

PhyloP (phylogenetic p-values) conservation scores are obtained from the PHAST package (http://compgen.bscb.cornell.edu/phast/) for multiple alignments of vertebrate genomes to the human genome. For GRCh38, the multiple alignment covers 19 mammals; for GRCh37, it covers 45 vertebrate genomes.

WigFix File

The data is provided in WigFix files, which are text files that provide conservation scores for contiguous intervals in the following format:

fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
0.064
0.058
0.064
0.064
fixedStep chrom=chr1 start=34045 step=1
0.111
0.100
0.111
0.111
0.100
0.111
0.111
0.111
0.100
0.111
-1.636

We convert them to binary files with indexes for fast query. Note that these are scores for genomic positions and are reported only for SNVs.
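To make the file layout concrete, the sketch below walks a fixedStep stream and assigns each score to its genomic position: every `fixedStep` declaration resets the chromosome and start position, and each following line advances the position by `step`. The function name `parse_wigfix` is made up for this example, and the sketch only shows how positions are derived; it does not reproduce the binary format or index that SAUtils writes.

```python
import io


def parse_wigfix(lines):
    """Yield (chrom, position, score) for every score in a fixedStep stream."""
    chrom, position, step = None, None, 1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chr1 start=10918 step=1"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            position = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        if chrom is None:
            # Scores before the first declaration line cannot be placed.
            continue
        yield chrom, position, float(line)
        position += step


SAMPLE = """fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
0.064
fixedStep chrom=chr1 start=34045 step=1
0.111
-1.636
"""

scores = list(parse_wigfix(io.StringIO(SAMPLE)))
for chrom, position, score in scores:
    print(chrom, position, score)
```

Because the declaration line is split into generic `key=value` pairs, the same loop also accepts the `span=1` field that appears in the converted bigWig output.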

Download URL

GRCh37: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/

GRCh38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
- - + + \ No newline at end of file diff --git a/3.23/data-sources/phylopprimate-json/index.html b/3.23/data-sources/phylopprimate-json/index.html index de279a28..cf0a7eef 100644 --- a/3.23/data-sources/phylopprimate-json/index.html +++ b/3.23/data-sources/phylopprimate-json/index.html @@ -6,13 +6,13 @@ phylopprimate-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

phylopprimate-json

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]
Field | Type | Notes
phyloPPrimateScore | float | range: -20 to 1.951
- - +
Version: 3.23

phylopprimate-json

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]
Field | Type | Notes
phyloPPrimateScore | float | range: -20 to 1.951
+ + \ No newline at end of file diff --git a/3.23/data-sources/primate-ai-json/index.html b/3.23/data-sources/primate-ai-json/index.html index 6c81a3f2..d7250f51 100644 --- a/3.23/data-sources/primate-ai-json/index.html +++ b/3.23/data-sources/primate-ai-json/index.html @@ -6,13 +6,13 @@ primate-ai-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

primate-ai-json

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification
- - +
Version: 3.23

primate-ai-json

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification
+ + \ No newline at end of file diff --git a/3.23/data-sources/primate-ai/index.html b/3.23/data-sources/primate-ai/index.html index 006f7e61..8b255ea2 100644 --- a/3.23/data-sources/primate-ai/index.html +++ b/3.23/data-sources/primate-ai/index.html @@ -6,19 +6,19 @@ Primate AI-3D | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Primate AI-3D

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information. +

Version: 3.23

Primate AI-3D

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information. The model's use of primate sequencing and structural data offers promising insights into variant interpretation and disease gene identification. The predictive score ranges between 0 and 1, with 0 being benign and 1 being most pathogenic.

For more details, refer to these publications:

Publication
  1. Hong Gao et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023). https://doi.org/10.1126/science.abn8197
  2. Sundaram, L., Gao, H., Padigepati, S.R. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50, 1161–1170 (2018). https://doi.org/10.1038/s41588-018-0167-z
Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

Parsing

TSV File

chr pos non_flipped_ref non_flipped_alt gene_name   change_position_1based  ref_aa  alt_aa  score_PAI3D percentile_PAI3D    refseq  prediction
chr1 69094 G A ENST00000335137.4 2 V M 0.6169436463713646 0.5200308441794135 NM_001005484.1 pathogenic
chr1 69094 G C ENST00000335137.4 2 V L 0.5557043975591658 0.4271457250214688 NM_001005484.1 benign
chr1 69094 G T ENST00000335137.4 2 V L 0.5557043975591658 0.4271457391722522 NM_001005484.1 benign
chr1 69095 T A ENST00000335137.4 2 V E 0.8063537482917307 0.8032228720356267 NM_001005484.1 pathogenic
chr1 69095 T C ENST00000335137.4 2 V A 0.5795628190040587 0.4631329075815453 NM_001005484.1 benign
chr1 69095 T G ENST00000335137.4 2 V G 0.7922330142557621 0.7834049546930125 NM_001005484.1 pathogenic

From the TSV file, all columns are parsed:

  • chr
  • pos
  • non_flipped_ref
  • non_flipped_alt
  • gene_name
  • change_position_1based
  • ref_aa
  • alt_aa
  • score_PAI3D
  • percentile_PAI3D
  • refseq
  • prediction

The fields gene_name and refseq define the Ensembl and RefSeq transcript IDs respectively. These transcripts are passed as-is and some of them might be unrecognized/deprecated by RefSeq/Ensembl.
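The relationship between the TSV columns and the JSON fields shown in the JSON Output section can be sketched as below. The column-to-field mapping follows the examples on this page; the rounding (6 decimal places for the score, 2 for the percentile) is inferred from the sample output and is an assumption, as is the function name `to_json_entry`.

```python
import csv
import io

HEADER = ["chr", "pos", "non_flipped_ref", "non_flipped_alt", "gene_name",
          "change_position_1based", "ref_aa", "alt_aa", "score_PAI3D",
          "percentile_PAI3D", "refseq", "prediction"]


def to_json_entry(row):
    """Map one parsed TSV row (a dict keyed by column name) to a JSON-style entry."""
    return {
        "aminoAcidPosition": int(row["change_position_1based"]),
        "refAminoAcid": row["ref_aa"],
        "altAminoAcid": row["alt_aa"],
        # Rounding is inferred from the sample output, not from a specification.
        "score": round(float(row["score_PAI3D"]), 6),
        "scorePercentile": round(float(row["percentile_PAI3D"]), 2),
        "classification": row["prediction"],
        # Transcript IDs are passed through as-is.
        "ensemblTranscriptId": row["gene_name"],
        "refSeqTranscriptId": row["refseq"],
    }


# First data row of the example file above
ROW = ["chr1", "69094", "G", "A", "ENST00000335137.4", "2", "V", "M",
       "0.6169436463713646", "0.5200308441794135", "NM_001005484.1", "pathogenic"]
tsv = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

entries = [to_json_entry(r) for r in csv.DictReader(io.StringIO(tsv), delimiter="\t")]
print(entries[0])
```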

GRCh37

Note that for GRCh37, a lifted over file is provided. The file is not sorted, therefore it must first be sorted. Also note that certain RefSeq transcripts appear not to have been mapped during the lift-over process.

Pre-processing

Sorting

gzcat PrimateAI-3D.hg19.txt.gz | sort -t $'\t'  -k1,1 -k2,2n | gzip > PrimateAI-3D.hg19_sorted.tsv.gz

SA Generation

dotnet SAUtils.dll \
PrimateAi \
--r "${References}/Homo_sapiens.GRCh38.Nirvana.dat" \
--i "${ExternalDataSources}/PrimateAI/3D/PrimateAI-3D.hg38.txt.gz" \
--o "${SaUtilsOutput}"

Known Issues


Some transcript IDs defined in the data file are obsolete, retired, or updated. They are not removed or modified by Illumina Connected Annotations, and are passed as-is from the PrimateAI-3D data source.

Example:

ENST00000643905.1 transcript is retired according to Ensembl

NM_182838.2 transcript is removed because it is a pseudo-gene according to RefSeq

Download URL

https://primad.basespace.illumina.com/

JSON Output

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification
- - + + \ No newline at end of file diff --git a/3.23/data-sources/revel-json/index.html b/3.23/data-sources/revel-json/index.html index e245281f..53b35a93 100644 --- a/3.23/data-sources/revel-json/index.html +++ b/3.23/data-sources/revel-json/index.html @@ -6,13 +6,13 @@ revel-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

revel-json

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
- - +
Version: 3.23

revel-json

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.23/data-sources/revel/index.html b/3.23/data-sources/revel/index.html index ba672cd3..095b7deb 100644 --- a/3.23/data-sources/revel/index.html +++ b/3.23/data-sources/revel/index.html @@ -6,13 +6,13 @@ REVEL | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files (for the sake of better readability) with identical format. The positions for GRCh37 were sorted but not for GRCh38. So we re-sort the variants by position in the GRCh38 file.

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
- - +
Version: 3.23

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files (for the sake of better readability) with identical format. The positions for GRCh37 were sorted but not for GRCh38. So we re-sort the variants by position in the GRCh38 file.

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.
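The two known issues above (per-assembly sorting and conflicting scores) can be illustrated with a short sketch. It reads the CSV once for a chosen position column, keeps the highest REVEL score per variant, and returns the variants sorted by position. The function names are hypothetical, skipping rows whose position is empty or "." is an assumption about how unmapped positions are encoded, and the last sample row is invented purely to create a conflict.

```python
import csv
import io


def highest_scores(csv_lines, position_column):
    """Return {(chrom, position, ref, alt): score}, keeping the highest score."""
    best = {}
    for row in csv.DictReader(csv_lines):
        if row[position_column] in ("", "."):
            continue  # assumed encoding of a position missing in this assembly
        key = (row["chr"], int(row[position_column]), row["ref"], row["alt"])
        score = float(row["REVEL"])
        if key not in best or score > best[key]:
            best[key] = score
    return best


def sorted_variants(best):
    """Sort by chromosome name (lexically), then by numeric position."""
    return sorted(best.items(), key=lambda item: (item[0][0], item[0][1]))


SAMPLE = """chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35143,35143,T,A,T,S,0.018
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,A,T,M,0.100
"""

grch38 = sorted_variants(highest_scores(io.StringIO(SAMPLE), "grch38_pos"))
for (chrom, pos, ref, alt), score in grch38:
    print(chrom, pos, ref, alt, score)
```

Calling `highest_scores` a second time with `"hg19_pos"` would produce the GRCh37 view from the same input.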

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/3.23/data-sources/splice-ai-json/index.html b/3.23/data-sources/splice-ai-json/index.html index 53d84fc7..759af5d3 100644 --- a/3.23/data-sources/splice-ai-json/index.html +++ b/3.23/data-sources/splice-ai-json/index.html @@ -6,13 +6,13 @@ splice-ai-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

splice-ai-json

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
- - +
Version: 3.23

splice-ai-json

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
+ + \ No newline at end of file diff --git a/3.23/data-sources/splice-ai/index.html b/3.23/data-sources/splice-ai/index.html index 3b2a9d51..d6084509 100644 --- a/3.23/data-sources/splice-ai/index.html +++ b/3.23/data-sources/splice-ai/index.html @@ -6,13 +6,13 @@ Splice AI | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

Range | Confidence | Pathogenicity
0 ≤ x < 0.1 | low | likely benign
0.1 ≤ x ≤ 0.5 | medium | likely pathogenic
x > 0.5 | high | pathogenic

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of those entries have little value, such as low splice scores in intergenic regions. Not only do these extra entries require more storage, but the unused content also slows down annotation.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
- - +
Version: 3.23

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

Range | Confidence | Pathogenicity
0 ≤ x < 0.1 | low | likely benign
0.1 ≤ x ≤ 0.5 | medium | likely pathogenic
x > 0.5 | high | pathogenic
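
The interpretation table above maps directly onto a small lookup function. The sketch below is illustrative only; the function name `interpret_splice_score` is hypothetical and not part of Illumina Connected Annotations.

```python
from typing import Tuple


def interpret_splice_score(score: float) -> Tuple[str, str]:
    """Return (confidence, pathogenicity) for a Splice AI delta score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"delta score out of range: {score}")
    if score < 0.1:
        return ("low", "likely benign")
    if score <= 0.5:
        return ("medium", "likely pathogenic")
    return ("high", "pathogenic")


# The donor loss score from the BLCAP JSON example falls in the top tier.
tier = interpret_splice_score(0.9)
```

Note that both boundaries of the medium tier (0.1 and 0.5) are inclusive, as in the table.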

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of those entries have little value, such as low splice scores in intergenic regions. Not only do these extra entries require more storage, but the unused content also slows down annotation.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.
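
One way to picture the filtering rule is the sketch below. It rests on assumptions that the text does not confirm: that "low confidence" means every delta score is below 0.1, and that the distance to a splice site is taken from the DIST INFO field. The function names are hypothetical.

```python
from typing import Dict

SCORE_KEYS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'A=1;B=2' into a dictionary."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def keep_entry(info: str, low_threshold: float = 0.1, window: int = 15) -> bool:
    """Keep an entry unless it is low confidence and far from a splice site."""
    fields = parse_info(info)
    max_score = max(float(fields[key]) for key in SCORE_KEYS)
    near_splice_site = abs(int(fields["DIST"])) <= window  # assumption: DIST is used
    return max_score >= low_threshold or near_splice_site


# First record from the VCF example: all scores are 0 and DIST is -53.
first = ("SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;"
         "DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35")
kept = keep_entry(first)
```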

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
+ + \ No newline at end of file diff --git a/3.23/data-sources/topmed-json/index.html b/3.23/data-sources/topmed-json/index.html index 49ea93cf..b09baaa6 100644 --- a/3.23/data-sources/topmed-json/index.html +++ b/3.23/data-sources/topmed-json/index.html @@ -6,13 +6,13 @@ topmed-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

topmed-json

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.23

topmed-json

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.23/data-sources/topmed/index.html b/3.23/data-sources/topmed/index.html index cab06170..696813c0 100644 --- a/3.23/data-sources/topmed/index.html +++ b/3.23/data-sources/topmed/index.html @@ -6,13 +6,13 @@ TOPMed | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.23

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842
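
To illustrate how the extracted fields relate to the JSON keys, here is a minimal sketch that turns a record like the one above into a "topmed"-style dictionary. It assumes a single alternate allele per record, that the allele frequency is recomputed as AC/AN and rounded to six decimal places (inferred from the JSON example), and that any FILTER value other than PASS counts as a failed filter. The function name is hypothetical.

```python
def topmed_entry(vcf_line: str) -> dict:
    """Build a "topmed"-style dictionary from one single-allele VCF record."""
    columns = vcf_line.split()  # VCF is tab-delimited; split() also tolerates spaces
    vcf_filter, info_text = columns[6], columns[7]
    info = dict(item.split("=", 1) for item in info_text.split(";") if "=" in item)

    allele_count = int(info["AC"])
    allele_number = int(info["AN"])
    entry = {
        "allAc": allele_count,
        "allAn": allele_number,
        "allAf": round(allele_count / allele_number, 6),  # recomputed, not copied from AF
        "allHc": int(info["Hom"]),
    }
    # Assumption: anything other than PASS (or a missing value) is a failed filter.
    if vcf_filter not in ("PASS", "."):
        entry["failedFilter"] = True
    return entry


EXAMPLE = ("chr1\t10132\tTOPMed_freeze_5?chr1:10,132\tT\tC\t255\tSVM\t"
           "VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0\t"
           "NA:FRQ\t125568:0.000254842")
entry = topmed_entry(EXAMPLE)
```

Following the boolean convention used elsewhere in the JSON output, the `failedFilter` key is only added when it is true.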

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/3.23/file-formats/custom-annotations/index.html b/3.23/file-formats/custom-annotations/index.html index 1515086f..55cd1333 100644 --- a/3.23/file-formats/custom-annotations/index.html +++ b/3.23/file-formats/custom-annotations/index.html @@ -6,12 +6,12 @@ Custom Annotations | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporate new annotations ahead of our update cycle. Another +

Version: 3.23

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporating new annotations ahead of our update cycle. Another common use case involves protected health information (PHI). Custom annotations are a mechanism that enables both use cases.

Here are some examples of how our collaborators use custom annotations:

  • associating context from both a sample-level and a sample cohort level with the variant annotations
  • adding content that is licensed (e.g. HGMD) to the variant annotations

At the moment, we have two different custom annotation file formats. One provides additional annotations to variants (both small variants and SVs) while the other caters to gene annotations.

In both cases, the custom annotation file format is a tab-delimited file that is separated into two parts: the header & the data.

The header is where you can customize how you want the data to appear in the JSON file and provide context about the genome assembly and how Illumina Connected Annotations should match the variants.

At Illumina, there are usually many components downstream of Illumina Connected Annotations that have to parse our annotations. If a customer provides a custom @@ -34,7 +34,7 @@ chromosome, svLength, cytogeneticBand, etc. The title should also not conflict with other data source keys like clingen or dgv.

caution

Care should be taken not to annotate using multiple custom annotations that all use the same title.

Genome Assemblies

The following genome assemblies can be specified:

  • GRCh37
  • GRCh38

Matching Criteria

The matching criteria tell Illumina Connected Annotations how to match a VCF variant to the custom annotation.

The following matching criteria can be specified:

  • allele - use this when you only want allele-specific matches. This is commonly the case when using allele frequency data sources like gnomAD
  • position - use this when you want positional matches. This is commonly used with disease phenotype data sources like ClinVar
  • sv - use this when you want to match to all other overlapping SVs. This use case arose when we were adding custom annotations for baseline copy number intervals along the genome.
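
The three criteria can be pictured with the sketch below. It is only an illustration of the matching semantics described above, not the tool's actual logic; the `Record` class and `is_match` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal view of a variant or a custom annotation entry."""
    chromosome: str
    begin: int
    end: int
    ref: str = ""
    alt: str = ""


def is_match(criteria: str, variant: Record, annotation: Record) -> bool:
    if variant.chromosome != annotation.chromosome:
        return False
    if criteria == "allele":
        # Same position and the same alleles.
        return (variant.begin, variant.ref, variant.alt) == (
            annotation.begin, annotation.ref, annotation.alt)
    if criteria == "position":
        # Same position, regardless of allele.
        return variant.begin == annotation.begin
    if criteria == "sv":
        # Any overlap between the two intervals.
        return variant.begin <= annotation.end and annotation.begin <= variant.end
    raise ValueError(f"unknown matching criteria: {criteria}")


snv = Record("chr1", 100, 100, "A", "G")
other_allele = Record("chr1", 100, 100, "A", "T")
```

With these two records, `is_match("position", snv, other_allele)` is true while `is_match("allele", snv, other_allele)` is false, which is the distinction between the first two bullets.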

Categories

Categories are not used by Illumina Connected Annotations, but are often used by downstream tools. Categories provide hints for how those tools should filter or display the annotation data.

When a category is specified, Illumina Connected Annotations will provide additional validation for those fields. The following table describes each category:

Category | Description | Validation
AlleleCount | allele counts for a specific population | See the supported populations below
AlleleNumber | allele numbers for a specific population | See the supported populations below
AlleleFrequency | allele frequencies for a specific population | See the supported populations below
Prediction | ACMG-style pathogenicity classifications | benign (B), likely benign (LB), VUS, likely pathogenic (LP), pathogenic (P)
Filter | free text that signals downstream tools to add the column to the filter | Max 20 characters
Description | free-text description | Max 100 characters
Identifier | any ID | Max 50 characters
HomozygousCount | count of homozygous individuals for a specific population | See the supported populations below
Score | any score value | Any double-precision floating point number
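
As a rough illustration of the validation column, the sketch below checks a few of the categories. It is not the validator used by Illumina Connected Annotations: the function name is hypothetical, and accepting both the full names and the abbreviations case-insensitively for Prediction is an assumption.

```python
MAX_LENGTHS = {"Filter": 20, "Description": 100, "Identifier": 50}

# Assumption: full names and abbreviations are both accepted, ignoring case.
PREDICTIONS = {
    "benign", "b", "likely benign", "lb", "vus",
    "likely pathogenic", "lp", "pathogenic", "p",
}


def is_valid(category: str, value: str) -> bool:
    """Check one custom annotation value against its category."""
    if category in MAX_LENGTHS:
        return len(value) <= MAX_LENGTHS[category]
    if category == "Prediction":
        return value.strip().lower() in PREDICTIONS
    if category == "Score":
        try:
            float(value)
        except ValueError:
            return False
        return True
    return True  # population-based categories are not checked in this sketch


too_long = is_valid("Filter", "x" * 21)
```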

Descriptions

Descriptions are used to add more context to the categories. For now, descriptions are mainly used to associate allele counts, numbers, and frequencies with their respective populations.

Populations

The following populations were specified in the HapMap project, 1000 Genomes Project, ExAC, and gnomAD.

Population Code | Super-population Code | Description
ACB | AFR | African Caribbeans in Barbados
AFR | AFR | African
ALL | ALL | All populations
AMR | AMR | Ad Mixed American
ASJ | | Ashkenazi Jewish
ASW | AFR | Americans of African Ancestry in SW USA
BEB | SAS | Bengali from Bangladesh
CDX | EAS | Chinese Dai in Xishuangbanna, China
CEU | EUR | Utah Residents (CEPH) with Northern and Western European Ancestry
CHB | EAS | Han Chinese in Beijing, China
CHS | EAS | Southern Han Chinese
CLM | AMR | Colombians from Medellin, Colombia
EAS | EAS | East Asian
ESN | AFR | Esan in Nigeria
EUR | EUR | European
FIN | EUR | Finnish in Finland
GBR | EUR | British in England and Scotland
GIH | SAS | Gujarati Indian from Houston, Texas
GWD | AFR | Gambian in Western Divisions in the Gambia
IBS | EUR | Iberian population in Spain
ITU | SAS | Indian Telugu from the UK
JPT | EAS | Japanese in Tokyo, Japan
KHV | EAS | Kinh in Ho Chi Minh City, Vietnam
LWK | AFR | Luhya in Webuye, Kenya
MAG | AFR | Mandinka in the Gambia
MKK | AFR | Maasai in Kinyawa, Kenya
MSL | AFR | Mende in Sierra Leone
MXL | AMR | Mexican Ancestry from Los Angeles, USA
NFE | EUR | European (Non-Finnish)
OTH | OTH | Other
PEL | AMR | Peruvians from Lima, Peru
PJL | SAS | Punjabi from Lahore, Pakistan
PUR | AMR | Puerto Ricans from Puerto Rico
SAS | SAS | South Asian
STU | SAS | Sri Lankan Tamil from the UK
TSI | EUR | Toscani in Italia
YRI | AFR | Yoruba in Ibadan, Nigeria

Data Types

Each custom annotation can be one of the following data types:

  • bool - true or false
  • number - any integer or floating-point number
  • string - text
tip

For boolean variables, only keys with a true value will be output to the JSON object.

Using SAUtils

Illumina Connected Annotations includes a tool called SAUtils that converts various data sources into Illumina Connected Annotations' native binary format. The sub-commands customvar and customgene are used to specify a variant file or a gene file, respectively.

Convert Variant File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory

Convert Gene File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customgene \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-c Data/Cache \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -c argument specifies the Illumina Connected Annotations cache path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory
- - + + \ No newline at end of file diff --git a/3.23/file-formats/illumina-annotator-json-file-format/index.html b/3.23/file-formats/illumina-annotator-json-file-format/index.html index 777f50e6..c1e277ca 100644 --- a/3.23/file-formats/illumina-annotator-json-file-format/index.html +++ b/3.23/file-formats/illumina-annotator-json-file-format/index.html @@ -6,13 +6,13 @@ Illumina Connected Annotations JSON File Format | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
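
The two conventions above can be sketched as a small pruning step applied before serialization. This only illustrates the behavior described in the bullets; the function name `prune_entry` is hypothetical.

```python
import json


def prune_entry(entry: dict) -> dict:
    """Drop false booleans and VCF-style missing values ('.', '', None)."""
    pruned = {}
    for key, value in entry.items():
        if value is False:
            continue  # only true booleans are written
        if value is None or value in (".", ""):
            continue  # '.' placeholders are treated like empty strings
        pruned[key] = value
    return pruned


raw = {
    "chromosome": "chr2",
    "position": 48010488,
    "id": ".",
    "isStructuralVariant": False,
    "validated": True,
    "quality": 0,
}
compact = json.dumps(prune_entry(raw))
```

Numeric zeros are kept on purpose: only the boolean `False` and the missing-value placeholders are removed.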

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
Field | Type | Notes
annotator | string | the name of the annotator and the current version
creationTime | string | yyyy-MM-dd hh:mm:ss
genomeAssembly | string | see possible values below
schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes
dataVersion | string |
dataSources | object array | see Data Source entry below
samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

Field | Type | Notes
name | string |
version | string |
description | string | optional description of the data source
releaseDate | string | yyyy-MM-dd

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"id":"4",
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes
chromosome | string | all | exactly as displayed in the VCF
position | integer | all | exactly as displayed in the VCF (1-based notation). Range: 1 - 250 million
id | string | all | taken from the ID column in the VCF file; omitted if the value is empty or "."
repeatUnit | string | STR | provided by ExpansionHunter
refRepeatCount | integer | STR | provided by ExpansionHunter
svEnd | integer | SV |
refAllele | string | all | exactly as displayed in the VCF
altAlleles | string array | all | exactly as displayed in the VCF
quality | float | all | exactly as displayed in the VCF (normally an integer, but some variant callers use floating point; values as high as 500k have been observed)
filters | string array | all | exactly as displayed in the VCF
ciPos | integer array | SV |
ciEnd | integer array | SV |
svLength | integer | SV |
strandBias | float | small variant | provided by GATK (from SB)
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand | string | all | e.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
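
For readers unfamiliar with the two overlap measures, the sketch below uses the common definitions: reciprocal overlap as the shared length divided by the longer of the two intervals, and annotation overlap as the shared length divided by the annotation's length. These definitions are an assumption; the text above does not spell out how Illumina Connected Annotations computes the values, and the function name is hypothetical.

```python
from typing import Tuple


def overlap_fractions(variant_begin: int, variant_end: int,
                      annotation_begin: int, annotation_end: int) -> Tuple[float, float]:
    """Return (reciprocalOverlap, annotationOverlap) for 1-based closed intervals."""
    shared = min(variant_end, annotation_end) - max(variant_begin, annotation_begin) + 1
    shared = max(0, shared)
    variant_length = variant_end - variant_begin + 1
    annotation_length = annotation_end - annotation_begin + 1

    reciprocal = shared / max(variant_length, annotation_length)
    annotation = shared / annotation_length
    return round(reciprocal, 5), round(annotation, 5)  # up to 5 decimal places


# A 1000 bp variant that fully contains a 100 bp annotation.
fractions = overlap_fractions(1, 1000, 101, 200)
```

Under these definitions a small annotation fully covered by a large variant has an annotation overlap of 1 but a low reciprocal overlap.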

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | range: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

| Field | Type | Notes |
|---|---|---|
| chromosome | string | chromosome number |
| begin | integer | position interval start |
| end | integer | position interval end |
| variantType | string | structural variant type |
| variantId | string | gnomAD ID |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
| femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
| maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
| allAc | integer | allele count for all populations. |
| afrAc | integer | allele count for the African super population. |
| amrAc | integer | allele count for the Ad Mixed American super population. |
| easAc | integer | allele count for the East Asian super population. |
| eurAc | integer | allele count for the European super population. |
| othAc | integer | allele count for all other populations. |
| maleAc | integer | allele count for male population. |
| femaleAc | integer | allele count for female population. |
| allAn | integer | allele number for all populations. |
| afrAn | integer | allele number for the African super population. |
| amrAn | integer | allele number for the Ad Mixed American super population. |
| easAn | integer | allele number for the East Asian super population. |
| eurAn | integer | allele number for the European super population. |
| othAn | integer | allele number for all other populations. |
| femaleAn | integer | allele number for female population. |
| maleAn | integer | allele number for male population. |
| allHc | integer | count of homozygous individuals for all populations. |
| afrHc | integer | count of homozygous individuals for the African / African American population. |
| amrHc | integer | count of homozygous individuals for the Latino population. |
| easHc | integer | count of homozygous individuals for the East Asian population. |
| eurHc | integer | count of homozygous individuals for the European super population. |
| othHc | integer | count of homozygous individuals for all other populations. |
| maleHc | integer | count of homozygous individuals for male population. |
| femaleHc | integer | count of homozygous individuals for female population. |
| failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
| reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
| annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0 |

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
| Field | Type | Notes |
|---|---|---|
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string array | |
| reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
| annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
| Field | Type | VCF | Notes |
|---|---|---|---|
| genotype | string | GT | |
| variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele |
| totalDepth | integer | DP | non-negative integer values |
| genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99 |
| copyNumber | integer | CN | non-negative integer values |
| minorHaplotypeCopyNumber | integer | MCN | non-negative integer values |
| repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific |
| alleleDepths | integer array | AD | non-negative integer values |
| failedFilter | bool | FT | |
| splitReadCounts | integer array | SR | Manta-specific |
| pairedEndReadCounts | integer array | PR | Manta-specific |
| isDeNovo | bool | DN | |
| deNovoQuality | float | DQ | |
| diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific |
| artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0 |
| likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0 |
| lossOfHeterozygosity | bool | CN, MCN | |
| somaticQuality | float | SQ | |
| heteroplasmyPercentile | float | VF | range: 0 - 100. 2 decimal places. One value per alternate allele |
| binCount | integer | BC | non-negative integer values |
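
Since variantFrequencies carries one value per alternate allele, it can be lined up with the altAlleles array of the enclosing position. The following is a minimal sketch of that pairing. It assumes the samples array sits inside a position object alongside altAlleles, and the helper name `frequencies_by_allele` is hypothetical.

```python
# Trimmed position object. The nesting of "samples" under a position is an
# assumption made for this sketch.
position = {
    "altAlleles": ["A", "GT"],
    "samples": [
        {"genotype": "0/1", "variantFrequencies": [0.333, 0.5], "totalDepth": 57}
    ],
}

def frequencies_by_allele(position, sample):
    """Pair each alternate allele with the sample's variant frequency for it."""
    freqs = sample.get("variantFrequencies")
    if not freqs:
        # Missing VCF values are omitted from the JSON, so there is nothing to pair.
        return {}
    alts = position.get("altAlleles", [])
    if len(alts) != len(freqs):
        raise ValueError("expected one variant frequency per alternate allele")
    return dict(zip(alts, freqs))

by_allele = frequencies_by_allele(position, position["samples"][0])
print(by_allele)
```

The length check is a defensive choice for this sketch: it fails loudly if the two arrays ever disagree rather than silently dropping values.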
Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
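
Because empty samples keep their slot, the samples array can be matched by index against the sample names listed in the header. A minimal sketch of that pairing follows. The sample values are made up for illustration, and the variable names are hypothetical.

```python
# Sample names in the order given by header.samples.
sample_names = ["NA12878", "NA12891", "NA12892"]

# Per-position samples array; the second sample is intentionally empty.
samples = [
    {"genotype": "0/1", "totalDepth": 57},
    {"isEmpty": True},
    {"genotype": "1/1", "totalDepth": 40},
]

genotypes = {}
for name, sample in zip(sample_names, samples):
    # isEmpty is only written when true, so an absent key means "has data".
    if sample.get("isEmpty", False):
        continue
    genotypes[name] = sample.get("genotype")

print(genotypes)
```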

Variants

"variants":[
{
"vid":"2-48010488-G-A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
| Field | Type | Notes |
|---|---|---|
| vid | string | see Variant Identifiers |
| chromosome | string | |
| begin | int | 1-based non-negative integer values. Range: 1 - 250 million |
| end | int | 1-based non-negative integer values. Range: 1 - 250 million |
| isReferenceMinorAllele | bool | true when this is a reference minor allele |
| isStructuralVariant | bool | true when the variant is a structural variant |
| inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions) |
| refAllele | string | parsimonious representation of the reference allele |
| altAllele | string | parsimonious representation of the alternate allele |
| variantType | string | uses Sequence Ontology sequence alterations |
| hgvsg | string | HGVS g. notation |
| phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424 |
Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"nonsense_mediated_decay",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268",
"cdsPos":"116",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"impact": "moderate",
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"proteinId":"ENSP00000405294.1",
"completeOverlap":true
}
]
| Field | Type | Notes |
|---|---|---|
| transcript | string | transcript ID. e.g. ENST00000445503.1 |
| source | string | RefSeq / Ensembl |
| bioType | string | descriptions of the biotypes from Ensembl |
| codons | string | |
| aminoAcids | string | |
| cdnaPos | string | |
| cdsPos | string | |
| exons | string | exons affected by the variant |
| introns | string | introns affected by the variant |
| proteinPos | string | |
| geneId | string | gene ID. e.g. ENSG00000116062 |
| hgnc | string | gene symbol. e.g. MSH6 |
| consequence | string array | Sequence Ontology Consequences |
| impact | string | See Consequence Impact for details |
| hgvsc | string | HGVS coding nomenclature |
| hgvsp | string | HGVS protein nomenclature |
| geneFusion | object | see Gene Fusions entry below |
| isCanonical | bool | true when this is a canonical transcript |
| isManeSelect | bool | true when this is a MANE select transcript |
| proteinId | string | protein ID. E.g. ENSP00000405294.1 |
| completeOverlap | bool | true when this transcript is completely overlapped by the variant |
| cancerHotspots | string array | see Cancer Hotspots entry below |
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
| Field | Type | Notes |
|---|---|---|
| aminoAcidConservation | object | |
| scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00 |

Gene Fusions

| Field | Type | Notes |
|---|---|---|
| exon | int | actual exon where the breakpoint was located |
| intron | int | actual intron where the breakpoint was located |
| fusions | object array | see Fusion entry below |

Fusion

| Field | Type | Notes |
|---|---|---|
| exon | int | actual exon where the other breakpoint was located |
| intron | int | actual intron where the other breakpoint was located |
| hgvsc | string | HGVS coding nomenclature describing the two genes and the transcripts that are fused along with |

Cancer Hotspots

| Field | Type | Notes |
|---|---|---|
| residue | string | |
| numSamples | int | how many samples are associated with a variant at the same amino acid position |
| numAltAminoAcidSamples | int | how many samples are associated with a variant at the same position and with the same alternate amino acid |
| qValue | double | |

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
| Field | Type | Notes |
|---|---|---|
| id | string | |
| type | string | see possible values below |
| consequence | string array | see possible values below |

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
| Field | Type | Notes |
|---|---|---|
| id | string | ClinVar ID |
| variationId | string | ClinVar VCV ID |
| variantType | string | variant type |
| reviewStatus | string | see possible values below |
| alleleOrigins | string array | see possible values below |
| refAllele | string | |
| altAllele | string | |
| phenotypes | string array | |
| medGenIds | string array | MedGen IDs |
| omimIds | string array | OMIM IDs |
| orphanetIds | string array | Orphanet IDs |
| significance | string array | see possible values below |
| lastUpdatedDate | string | yyyy-MM-dd |
| pubMedIds | string array | PubMed IDs |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
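
A common use of these fields is to keep only ClinVar entries that match the annotated alternate allele and carry a significance of interest. The sketch below shows one way to do that. The entry IDs are placeholders rather than real accessions, and both the set of significances treated as reportable and the helper name `allele_specific_hits` are assumptions of this example, not recommendations from the documentation.

```python
# Placeholder ClinVar entries shaped like the "clinvar" array above.
clinvar = [
    {"id": "entry-1", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "entry-2", "significance": ["pathogenic"], "isAlleleSpecific": True},
    # isAlleleSpecific is omitted when false, so this entry matches another allele.
    {"id": "entry-3", "significance": ["likely pathogenic"]},
]

# Which significances count as reportable is a choice made for this sketch.
REPORTABLE = {"pathogenic", "likely pathogenic"}

def allele_specific_hits(entries, wanted=REPORTABLE):
    """Return IDs of allele-specific entries with at least one wanted significance."""
    hits = []
    for entry in entries:
        if not entry.get("isAlleleSpecific", False):
            continue
        if wanted.intersection(entry.get("significance", [])):
            hits.append(entry["id"])
    return hits

hits = allele_specific_hits(clinvar)
print(hits)
```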

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
| Field | Type | Notes |
|---|---|---|
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African super population. Integer. |
| afrAn | int | allele number for the African super population. Non-zero integer. |
| amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Ad Mixed American super population. Integer. |
| amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
| easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian super population. Integer. |
| easAn | int | allele number for the East Asian super population. Non-zero integer. |
| eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
| eurAc | int | allele count for the European super population. Integer. |
| eurAn | int | allele number for the European super population. Non-zero integer. |
| sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian super population. Integer. |
| sasAn | int | allele number for the South Asian super population. Non-zero integer. |
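
The three fields for each population are related in the usual way: allele frequency is allele count divided by allele number. The sketch below recomputes the frequency from the counts in the example above as a sanity check. Rounding to six decimal places is inferred from the example values rather than stated in the documentation, and the helper name `recompute_af` is hypothetical.

```python
# Subset of the oneKg example above.
one_kg = {
    "allAf": 0.200879, "allAn": 5008, "allAc": 1006,
    "afrAf": 0.210287, "afrAn": 1322, "afrAc": 278,
}

def recompute_af(entry, prefix):
    """Recompute <prefix>Af as <prefix>Ac / <prefix>An, or None if not possible."""
    an = entry.get(f"{prefix}An")
    ac = entry.get(f"{prefix}Ac")
    if not an or ac is None:
        # A missing or zero allele number leaves the frequency undefined.
        return None
    return round(ac / an, 6)  # six decimals: assumption based on the example

all_af = recompute_af(one_kg, "all")
afr_af = recompute_af(one_kg, "afr")
print(all_af, afr_af)
```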

DANN

"dannScore": 0.27
| Field | Type | Notes |
|---|---|---|
| dannScore | float | Range: 0 - 1.0 |

dbSNP

"dbsnp":[
"rs1042821"
]
| Field | Type | Notes |
|---|---|---|
| dbsnp | string array | dbSNP rsIDs |

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
| Field | Type | Notes |
|---|---|---|
| chromosome | string | Ensembl-style chromosome names |
| begin | int | 1-based position |
| end | int | 1-based position |
| numDeletions | int | # of observed deletions |
| deletionFrequency | float | deletion frequency |
| numDuplications | int | # of observed duplications |
| duplicationFrequency | float | duplication frequency |
| sampleSize | int | total # of samples |
| reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap |
| annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap |
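
To make the difference between the two overlap fields concrete, the sketch below computes both for a pair of intervals. The definitions used are assumptions, since the page does not spell them out: reciprocal overlap is taken as the shared length divided by the longer of the two intervals, and annotation overlap as the shared length divided by the annotation length. Coordinates are treated as 1-based and inclusive, and the helper name `overlaps` is hypothetical.

```python
def overlaps(variant_begin, variant_end, ann_begin, ann_end):
    """Return (reciprocal overlap, annotation overlap) under the assumed definitions."""
    shared = min(variant_end, ann_end) - max(variant_begin, ann_begin) + 1
    if shared <= 0:
        return 0.0, 0.0  # the intervals do not intersect
    variant_len = variant_end - variant_begin + 1
    ann_len = ann_end - ann_begin + 1
    reciprocal = shared / max(variant_len, ann_len)  # assumed definition
    annotation = shared / ann_len                    # assumed definition
    # Five decimals mirrors the precision mentioned in the field notes.
    return round(reciprocal, 5), round(annotation, 5)

# A 1,000 bp variant that fully contains a 200 bp annotation.
reciprocal, annotation = overlaps(101, 1100, 601, 800)
print(reciprocal, annotation)
```

Under these definitions a small annotation swallowed by a large variant gets a low reciprocal overlap but an annotation overlap of 1, which is the pattern seen in the clingenDosageSensitivityMap example.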

GERP

"gerpScore": 1.27
| Field | Type | Notes |
|---|---|---|
| gerpScore | float | Range: -∞ to +∞ |

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| allAc | int | GME allele count |
| allAn | int | GME allele number |
| allAf | float | GME allele frequency |
| failedFilter | bool | True if this variant failed any filters |

gnomAD

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| coverage | int | average coverage (non-negative integer values) |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| maleAf | float | allele frequency for male population. Range: 0 - 1.0 |
| femaleAf | float | allele frequency for female population. Range: 0 - 1.0 |
| controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| maleAc | int | allele count for male population. Integer. |
| femaleAc | int | allele count for female population. Integer. |
| controlsAllAc | int | allele count for the controls subset. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| maleAn | int | allele number for male population. Non-zero integer. |
| femaleAn | int | allele number for female population. Non-zero integer. |
| controlsAllAn | int | allele number for the controls subset. Non-zero integer. |
| allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
| maleHc | int | count of homozygous individuals for male population. Non-negative integer. |
| femaleHc | int | count of homozygous individuals for female population. Non-negative integer. |
| afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African / African American population. Integer. |
| afrAn | int | allele number for the African / African American population. Non-zero integer. |
| afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer. |
| amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Latino population. Integer. |
| amrAn | int | allele number for the Latino population. Non-zero integer. |
| amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer. |
| easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian population. Integer. |
| easAn | int | allele number for the East Asian population. Non-zero integer. |
| easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer. |
| finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
| finAc | int | allele count for the Finnish population. Integer. |
| finAn | int | allele number for the Finnish population. Non-zero integer. |
| finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer. |
| nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
| nfeAc | int | allele count for the Non-Finnish European population. Integer. |
| nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
| nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer. |
| othAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
| othAc | int | allele count for the Other population. Integer. |
| othAn | int | allele number for the Other population. Non-zero integer. |
| othHc | int | count of homozygous individuals for the Other population. Non-negative integer. |
| asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
| asjAc | int | allele count for the Ashkenazi Jewish population. Integer. |
| asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
| asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
| sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian population. Integer. |
| sasAn | int | allele number for the South Asian population. Non-zero integer. |
| sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
| failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters) |
| lowComplexityRegion | bool | True if this variant is located in a low complexity region. |

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
| Field | Type | Notes |
|---|---|---|
| refAllele | string | |
| altAllele | string | |
| diseases | string array | associated diseases |
| hasHomoplasmy | boolean | |
| hasHeteroplasmy | boolean | |
| status | string | record status |
| clinicalSignificance | string | predicted pathogenicity |
| scorePercentile | float | MitoTIP score |
| numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences |
| pubMedIds | string array | |
| isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele |

Primate AI

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
| Field | Type | Notes |
|---|---|---|
| aminoAcidPosition | int | Amino Acid Position (1-based) |
| refAminoAcid | string | Reference Amino Acid |
| altAminoAcid | string | Alternate Amino Acid |
| ensemblTranscriptId | string | Transcript ID (Ensembl) |
| refSeqTranscriptId | string | Transcript ID (RefSeq) |
| scorePercentile | float | range: 0 - 1.0 |
| score | float | range: 0 - 1.0 |
| classification | string | pathogenic or benign classification |

REVEL

"revel":{ 
"score":0.027
}
| Field | Type | Notes |
|---|---|---|
| score | float | Range: 0 - 1.0 |

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
| Field | Type | Notes |
|---|---|---|
| hgnc | string | HGNC gene symbol |
| acceptorGainDistance | int | ± bp from current position |
| acceptorGainScore | float | range: 0 - 1.0. 1 decimal place |
| acceptorLossDistance | int | ± bp from current position |
| acceptorLossScore | float | range: 0 - 1.0. 1 decimal place |
| donorGainDistance | int | ± bp from current position |
| donorGainScore | float | range: 0 - 1.0. 1 decimal place |
| donorLossDistance | int | ± bp from current position |
| donorLossScore | float | range: 0 - 1.0. 1 decimal place |
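
As the example shows, each spliceAI entry only carries the score and distance pairs that apply to it, so code reading these objects should not assume all four score fields are present. The sketch below summarizes each gene by its highest available score. Taking the maximum is a common way to collapse the four scores and is a choice made for this example, not something the documentation prescribes; the helper name `max_splice_score` is hypothetical.

```python
SCORE_KEYS = (
    "acceptorGainScore",
    "acceptorLossScore",
    "donorGainScore",
    "donorLossScore",
)

# The spliceAI example from above.
splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]

def max_splice_score(entry):
    """Return the highest score present in the entry, or None if it has none."""
    scores = [entry[key] for key in SCORE_KEYS if key in entry]
    return max(scores) if scores else None

best_by_gene = {entry["hgnc"]: max_splice_score(entry) for entry in splice_ai}
print(best_by_gene)
```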

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| allAc | int | TOPMed allele count |
| allAn | int | TOPMed allele number. Non-zero integer. |
| allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations) |
| allHc | int | TOPMed homozygous count |
| failedFilter | bool | True if this variant failed any filters |

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
| Field | Type | Notes |
|---|---|---|
| name | string | HGNC gene symbol |
| hgncId | int | HGNC ID |
| summary | string | short description of the gene from OMIM |

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
| Field | Type | Notes |
|---|---|---|
| mimNumber | int | OMIM ID for gene |
| geneName | string | gene name |
| description | string | |
| phenotypes | object array | see Phenotype entry below |

Phenotype

| Field | Type | Notes |
|---|---|---|
| mimNumber | int | |
| phenotype | string | |
| description | string | |
| mapping | string | see possible values below |
| inheritances | string array | see possible values below |
| comments | string array | see possible values below |

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
| Field | Type | Notes |
|---|---|---|
| pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | float | probability of being completely tolerant of loss of function variation (observed = expected) |
| pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | float | corrected synonymous Z score |
| misZ | float | corrected missense Z score |
| loeuf | float | loss of function observed/expected upper bound fraction (LOEUF) |

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
| Field | Type | Notes |
|---|---|---|
| clingenGeneValidity | object array | |
| diseaseId | string | Monarch Disease Ontology ID (MONDO) |
| disease | string | disease label |
| classification | string | see below for possible values |
| classificationDate | string | yyyy-MM-dd |

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
| Field | Type | Notes |
|---|---|---|
| roleInCancer | string array | Possible roles in cancer |
| tier | number | Cosmic tiers [1, 2] |
Version: 3.23

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
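
In practice these two conventions mean that a consumer should treat an absent key as false (for booleans) or as missing data (for values copied from the VCF). A minimal sketch of reading a variant object this way follows; the helper name `flag` is hypothetical, and the variant shown is trimmed to two keys.

```python
import json

# A trimmed variant object: only booleans that are true appear in the output.
variant = json.loads('{"vid": "2-48010488-G-A", "isReferenceMinorAllele": true}')

def flag(obj, key):
    """Return the boolean stored under key, treating an absent key as False."""
    return bool(obj.get(key, False))

is_ref_minor = flag(variant, "isReferenceMinorAllele")  # key present
is_sv = flag(variant, "isStructuralVariant")            # key omitted, so False
print(is_ref_minor, is_sv)
```

Using `dict.get` with a default avoids a `KeyError` on the many objects where a flag was simply not written.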

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.
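
One way to use this layout is to index the gene-level section by gene symbol and then look up each transcript's gene while walking the positions. The sketch below assumes a heavily trimmed document in which variants are nested under positions and transcripts under variants, and it joins on the transcript's hgnc field and the gene's name field. That nesting and join key are assumptions of this example, and the variable names are hypothetical.

```python
# Hypothetical, heavily trimmed annotation document.
doc = {
    "header": {"samples": ["NA12878"]},
    "positions": [
        {
            "chromosome": "chr2",
            "position": 48010488,
            "variants": [
                {
                    "vid": "2-48010488-G-A",
                    "transcripts": [
                        {"transcript": "ENST00000445503.1", "hgnc": "MSH6"}
                    ],
                }
            ],
        }
    ],
    "genes": [{"name": "MSH6", "hgncId": 7329}],
}

# Index gene-level annotation by gene symbol for constant-time lookup.
genes_by_symbol = {gene["name"]: gene for gene in doc.get("genes", [])}

gene_hits = []
for position in doc["positions"]:
    for variant in position.get("variants", []):
        for transcript in variant.get("transcripts", []):
            gene = genes_by_symbol.get(transcript.get("hgnc"))
            if gene is not None:
                gene_hits.append((variant["vid"], gene["name"], gene["hgncId"]))

print(gene_hits)
```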

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily, with examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
| Field | Type | Notes |
|---|---|---|
| annotator | string | the name of the annotator and the current version |
| creationTime | string | yyyy-MM-dd hh:mm:ss |
| genomeAssembly | string | see possible values below |
| schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes |
| dataVersion | string | |
| dataSources | object array | see Data Source entry below |
| samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples |

Data Source

| Field | Type | Notes |
|---|---|---|
| name | string | |
| version | string | |
| description | string | optional description of the data source |
| releaseDate | string | yyyy-MM-dd |

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"id": "4",
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
| Field | Type | Variant Type | Notes |
|---|---|---|---|
| chromosome | string | all | exactly as displayed in the vcf |
| position | integer | all | exactly as displayed in the vcf (1-based notation). Range: 1 - 250 million |
| id | string | all | taken from the ID column in the VCF file; omitted if the column is empty or contains "." |
| repeatUnit | string | STR | provided by ExpansionHunter |
| refRepeatCount | integer | STR | provided by ExpansionHunter |
| svEnd | integer | SV | |
| refAllele | string | all | exactly as displayed in the vcf |
| altAlleles | string array | all | exactly as displayed in the vcf |
| quality | float | all | exactly as displayed in the vcf (normally an integer, but some variant callers use floating point; has been observed as high as 500k) |
| filters | string array | all | exactly as displayed in the vcf |
| ciPos | integer array | SV | |
| ciEnd | integer array | SV | |
| svLength | integer | SV | |
| strandBias | float | small variant | provided by GATK (from SB) |
| jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE) |
| cytogeneticBand | string | all | e.g. 17p13.1 |

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (not reported for insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
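
The reciprocalOverlap and annotationOverlap fields described above are given only as ranges; the formulas are not spelled out here. The sketch below assumes the common definitions (shared bases divided by the longer of the two intervals for reciprocal overlap, shared bases divided by the annotation length for annotation overlap) and 1-based inclusive coordinates. The function name `overlap_metrics` is hypothetical, and the results are not guaranteed to reproduce the values the annotator emits.

```python
def overlap_metrics(var_begin, var_end, ann_begin, ann_end):
    """Return (reciprocal_overlap, annotation_overlap) for two intervals.

    Assumption: coordinates are 1-based and inclusive, and both metrics
    are rounded to 5 decimal places as the field notes suggest.
    """
    var_len = var_end - var_begin + 1
    ann_len = ann_end - ann_begin + 1
    # Number of bases shared by the variant and the annotation interval.
    shared = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if shared <= 0:
        return 0.0, 0.0
    reciprocal = shared / max(var_len, ann_len)  # assumed definition
    annotation = shared / ann_len                # assumed definition
    return round(reciprocal, 5), round(annotation, 5)
```

Under these assumptions, a variant that fully contains a much smaller annotation has an annotation overlap of 1 but a small reciprocal overlap, which is why the two values can differ widely for the same entry.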

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Ad Mixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Ad Mixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | VCF | Notes
genotype | string | GT |
variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele
totalDepth | integer | DP | non-negative integer values
genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99
copyNumber | integer | CN | non-negative integer values
minorHaplotypeCopyNumber | integer | MCN | non-negative integer values
repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific
alleleDepths | integer array | AD | non-negative integer values
failedFilter | bool | FT |
splitReadCounts | integer array | SR | Manta-specific
pairedEndReadCounts | integer array | PR | Manta-specific
isDeNovo | bool | DN |
deNovoQuality | float | DQ |
diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific
artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0
lossOfHeterozygosity | bool | CN, MCN |
somaticQuality | float | SQ |
heteroplasmyPercentile | float array | VF | range: 0 - 100. 2 decimal places. One value per alternate allele
binCount | integer | BC | non-negative integer values
Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
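
Because sample ordering is preserved, the sample objects under a position can be paired with the sample names from the header. The following is a minimal sketch, assuming the header and a single position have already been loaded as Python dictionaries; the helper name `samples_by_name` is hypothetical and not part of the tool.

```python
def samples_by_name(header, position):
    """Map each header sample name to its sample object for one position.

    Samples flagged with the isEmpty key are mapped to None so callers
    can tell "no data" apart from a populated sample object.
    """
    names = header["samples"]
    entries = position.get("samples", [])
    result = {}
    # zip relies on the documented guarantee that ordering is preserved.
    for name, entry in zip(names, entries):
        result[name] = None if entry.get("isEmpty") else entry
    return result
```

For example, a header listing NA12878 and NA12891 combined with a position whose second sample object is `{"isEmpty": true}` yields an entry for NA12878 and `None` for NA12891.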

Variants

"variants":[
{
"vid":"2-48010488-G-A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
Field | Type | Notes
vid | string | see Variant Identifiers
chromosome | string |
begin | int | 1-based non-negative integer values. Range: 1 - 250 million
end | int | 1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllele | bool | true when this is a reference minor allele
isStructuralVariant | bool | true when the variant is a structural variant
inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllele | string | parsimonious representation of the reference allele
altAllele | string | parsimonious representation of the alternate allele
variantType | string | uses Sequence Ontology sequence alterations
hgvsg | string | HGVS g. notation
phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424
Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.
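
The allele swap described above can be undone when the allele from the reference genome is needed. This is a small sketch that assumes a variant object has been loaded as a Python dictionary; `original_reference_allele` is a hypothetical helper name.

```python
def original_reference_allele(variant):
    """Return the allele found in the reference genome for a variant.

    For reference minor alleles the annotator reports the global major
    allele in refAllele and the original reference allele in altAllele,
    so the original reference is read from altAllele in that case.
    """
    if variant.get("isReferenceMinorAllele"):
        return variant["altAllele"]
    return variant["refAllele"]
```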

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"nonsense_mediated_decay",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268",
"cdsPos":"116",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"impact": "moderate",
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"proteinId":"ENSP00000405294.1",
"completeOverlap":true
}
]
Field | Type | Notes
transcript | string | transcript ID. e.g. ENST00000445503.1
source | string | RefSeq / Ensembl
bioType | string | descriptions of the biotypes from Ensembl
codons | string |
aminoAcids | string |
cdnaPos | string |
cdsPos | string |
exons | string | exons affected by the variant
introns | string | introns affected by the variant
proteinPos | string |
geneId | string | gene ID. e.g. ENSG00000116062
hgnc | string | gene symbol. e.g. MSH6
consequence | string array | Sequence Ontology Consequences
impact | string | See Consequence Impact for details
hgvsc | string | HGVS coding nomenclature
hgvsp | string | HGVS protein nomenclature
geneFusion | object | see Gene Fusions entry below
isCanonical | bool | true when this is a canonical transcript
isManeSelect | bool | true when this is a MANE select transcript
proteinId | string | protein ID. E.g. ENSP00000405294.1
completeOverlap | bool | true when this transcript is completely overlapped by the variant
cancerHotspots | string array | see Cancer Hotspots entry below
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.
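
The transcript fields above lend themselves to simple filtering. The sketch below collects the Sequence Ontology consequences reported on canonical transcripts only. It assumes the transcripts array sits inside a variant object loaded as a Python dictionary, and `canonical_consequences` is a hypothetical helper name.

```python
def canonical_consequences(variant):
    """Return the sorted, de-duplicated consequences on canonical transcripts."""
    found = set()
    for transcript in variant.get("transcripts", []):
        # isCanonical is treated as false when the key is absent.
        if transcript.get("isCanonical"):
            found.update(transcript.get("consequence", []))
    return sorted(found)
```

The same pattern would apply to isManeSelect, bearing in mind the restriction noted above that MANE select tags are only available for RefSeq transcripts on GRCh38.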

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

Field | Type | Notes
exon | int | actual exon where the breakpoint was located
intron | int | actual intron where the breakpoint was located
fusions | object array | see Fusion entry below

Fusion

Field | Type | Notes
exon | int | actual exon where the other breakpoint was located
intron | int | actual intron where the other breakpoint was located
hgvsc | string | HGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

Field | Type | Notes
residue | string |
numSamples | int | how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples | int | how many samples are associated with a variant with the same position and alternate amino acid position
qValue | double |

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
Field | Type | Notes
id | string |
type | string | see possible values below
consequence | string array | see possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes
id | string | ClinVar ID
variationId | string | ClinVar VCV ID
variantType | string | variant type
reviewStatus | string | see possible values below
alleleOrigins | string array | see possible values below
refAllele | string |
altAllele | string |
phenotypes | string array |
medGenIds | string array | MedGen IDs
omimIds | string array | OMIM IDs
orphanetIds | string array | Orphanet IDs
significance | string array | see possible values below
lastUpdatedDate | string | yyyy-MM-dd
pubMedIds | string array | PubMed IDs
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
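
The isAlleleSpecific flag and the significance values listed above can be combined to pick out ClinVar records that apply to the annotated allele. This is a minimal sketch, assuming the clinvar array is attached to a variant object loaded as a Python dictionary; the function name and the default set of wanted values are illustrative choices, not part of the tool.

```python
def allele_specific_significance(variant, wanted=("pathogenic", "likely pathogenic")):
    """Return IDs of allele-specific ClinVar entries with a wanted significance."""
    hits = []
    for entry in variant.get("clinvar", []):
        # Skip records whose alternate allele does not match this variant.
        if not entry.get("isAlleleSpecific"):
            continue
        if any(value in wanted for value in entry.get("significance", [])):
            hits.append(entry["id"])
    return hits
```

Note that the large-variant entries shown earlier carry no isAlleleSpecific key, so this particular filter only suits small variants.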

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.

DANN

"dannScore": 0.27
Field | Type | Notes
dannScore | float | Range: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
Field | Type | Notes
dbsnp | string array | dbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters

gnomAD

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.
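
Since every population uses the same key prefix, one lookup can serve all of the frequency fields above. The sketch below returns a population allele frequency and treats entries marked failedFilter as unusable. That policy is an assumption made for illustration (some analyses may prefer to keep filtered records), and `gnomad_af` is a hypothetical helper name.

```python
def gnomad_af(variant, population="all"):
    """Return the gnomAD allele frequency for a population prefix, or None.

    population is one of the key prefixes shown above, e.g. "all", "afr",
    "nfe". None is returned when there is no gnomad object, when the
    entry failed a filter, or when the population key is absent.
    """
    gnomad = variant.get("gnomad")
    if gnomad is None or gnomad.get("failedFilter"):
        return None
    return gnomad.get(f"{population}Af")
```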

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification

REVEL

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
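
As the example shows, each spliceAI entry carries only some of the four score fields. The following sketch reduces each entry to its highest reported score per gene. It assumes the spliceAI array is attached to a variant object loaded as a Python dictionary; `max_splice_scores` and `SCORE_KEYS` are hypothetical names.

```python
SCORE_KEYS = (
    "acceptorGainScore",
    "acceptorLossScore",
    "donorGainScore",
    "donorLossScore",
)

def max_splice_scores(variant):
    """Return {gene symbol: highest Splice AI score present for that gene}."""
    best = {}
    for entry in variant.get("spliceAI", []):
        # Only the score keys that are actually present are considered.
        scores = [entry[key] for key in SCORE_KEYS if key in entry]
        if scores:
            best[entry["hgnc"]] = max(scores)
    return best
```

Applied to the example above, this gives 0.9 for BLCAP (from donorLossScore) and 0.3 for NNAT (from donorGainScore).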

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
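
The table notes that the TOPMed allele frequency is computed rather than copied from the source. A plausible reading is allele count divided by allele number; the sketch below assumes that and rounds to six decimal places, which matches the example values shown but is an inference, not a documented rule. `allele_frequency` is a hypothetical helper name.

```python
def allele_frequency(allele_count, allele_number, digits=6):
    """Return allele_count / allele_number rounded to `digits` places.

    None is returned for a non-positive allele number, since the
    frequency is undefined in that case.
    """
    if allele_number <= 0:
        return None
    return round(allele_count / allele_number, digits)
```

With the TOPMed example (allAc 20, allAn 125568) this gives 0.000159, and with the 1000 Genomes example (allAc 1006, allAn 5008) it gives 0.200879.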

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes
name | string | HGNC gene symbol
hgncId | int | HGNC ID
summary | string | short description of the gene from OMIM

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
| Field | Type | Notes |
|---|---|---|
| clingenGeneValidity | object array | |
| diseaseId | string | Monarch Disease Ontology ID (MONDO) |
| disease | string | disease label |
| classification | string | see below for possible values |
| classificationDate | string | yyyy-MM-dd |

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
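
A minimal Python sketch of validating one `clingenGeneValidity` entry against the values listed above and parsing its `yyyy-MM-dd` date. The helper name `parse_validity` is hypothetical, and rejecting unlisted classifications is a design choice made for this illustration rather than documented behavior.

```python
from datetime import date, datetime

# Classification values as listed in this section.
KNOWN_CLASSIFICATIONS = {
    "no reported evidence",
    "disputed",
    "limited",
    "moderate",
    "definitive",
    "strong",
    "refuted",
    "no known disease relationship",
}


def parse_validity(entry):
    """Return (diseaseId, classification, date) for one entry.

    Raises ValueError for a classification outside the listed values
    or a date that is not in yyyy-MM-dd form.
    """
    classification = entry["classification"]
    if classification not in KNOWN_CLASSIFICATIONS:
        raise ValueError("unexpected classification: " + classification)
    classified_on = datetime.strptime(entry["classificationDate"], "%Y-%m-%d").date()
    return entry["diseaseId"], classification, classified_on


if __name__ == "__main__":
    example = {
        "diseaseId": "MONDO_0007893",
        "disease": "Noonan syndrome with multiple lentigines",
        "classification": "no reported evidence",
        "classificationDate": "2018-06-07",
    }
    print(parse_validity(example))
```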

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
| Field | Type | Notes |
|---|---|---|
| roleInCancer | string array | Possible roles in cancer |
| tier | number | COSMIC tiers [1, 2] |
+ + \ No newline at end of file diff --git a/3.23/index.html b/3.23/index.html index 48c7990c..b109699d 100644 --- a/3.23/index.html +++ b/3.23/index.html @@ -6,16 +6,16 @@ Introduction | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package, or integrated into larger software tools that require variant annotation.

The input to Illumina Connected Annotations is a VCF file, and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease.

The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript:

The transcript and gene models are obtained from RefSeq and Ensembl. +

Version: 3.23

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package, or integrated into larger software tools that require variant annotation.

The input to Illumina Connected Annotations is a VCF file, and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease.

The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript:

The transcript and gene models are obtained from RefSeq and Ensembl. The current officially supported versions are:

| Data Source | Version | Release Date |
|---|---|---|
| RefSeq | GCF_000001405.40-RS_2023_03 | 2023-03-21 |
| Ensembl | 110 | 2023-04-27 |

In addition, it uses external data sources to provide additional context for each variant. Illumina Connected Annotations provides annotations from the following sources, divided into two tiers: Professional and Basic. The Basic tier can be accessed free of charge. The Professional tier requires a license. Please see Licensed Content for details. For access, please contact annotation_support@illumina.com.

| Data Source | Availability | Latest Supported Version |
|---|---|---|
| COSMIC | Professional | 99 |
| OMIM | Professional | 20240110 |
| Primate AI-3D | Professional | 1.0 |
| Splice AI | Professional | 1.3 |
| 1000 Genomes Project | Basic | Phase 3 v3plus |
| Cancer Hotspots | Basic | 2017 |
| ClinGen | Basic | 20240110 |
| ClinVar | Basic | 20231230 |
| DANN | Basic | 20200205 |
| dbSNP | Basic | 156 |
| DECIPHER | Basic | 201509 |
| FusionCatcher | Basic | 1.33 |
| GERP | Basic | 20110522 |
| GME Variome | Basic | 20160618 |
| gnomAD | Basic | 3.1.2 |
| MITOMAP | Basic | 20200819 |
| MultiZ 100 way | Basic | 20171006 |
| REVEL | Basic | 20200205 |
| TOPMed | Basic | freeze 5 |

Download

Please visit Illumina Connected Annotations.

- - + + \ No newline at end of file diff --git a/3.23/introduction/dependencies/index.html b/3.23/introduction/dependencies/index.html index 960ce850..da13c0c6 100644 --- a/3.23/introduction/dependencies/index.html +++ b/3.23/introduction/dependencies/index.html @@ -6,13 +6,13 @@ Dependencies | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Dependencies

All of the following dependencies have been included in this repository.

| Name | License | Usage |
|---|---|---|
| Amazon.Lambda | Apache | AWS extensions for .NET CLI |
| AWSSDK | Apache | AWS Lambda, S3, SNS support |
| Json.NET | MIT | JASIX utility |
| libdeflate | MIT | BlockCompression library |
| Moq | BSD | Mocking framework for unit tests |
| NDesk.Options | MIT/X11 | CommandLine library |
| xUnit | Apache | Unit testing framework |
| zlib-ng | zlib | BlockCompression library |
| zstd | BSD | BlockCompression library |
- - +
Version: 3.23

Dependencies

All of the following dependencies have been included in this repository.

| Name | License | Usage |
|---|---|---|
| Amazon.Lambda | Apache | AWS extensions for .NET CLI |
| AWSSDK | Apache | AWS Lambda, S3, SNS support |
| Json.NET | MIT | JASIX utility |
| libdeflate | MIT | BlockCompression library |
| Moq | BSD | Mocking framework for unit tests |
| NDesk.Options | MIT/X11 | CommandLine library |
| xUnit | Apache | Unit testing framework |
| zlib-ng | zlib | BlockCompression library |
| zstd | BSD | BlockCompression library |
+ + \ No newline at end of file diff --git a/3.23/introduction/getting-started/index.html b/3.23/introduction/getting-started/index.html index f3bb9c02..24cde260 100644 --- a/3.23/introduction/getting-started/index.html +++ b/3.23/introduction/getting-started/index.html @@ -6,13 +6,13 @@ Getting Started | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz) and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

If you want to build your own Docker image, it is easy to do. You just need the Illumina Connected Annotations zip file, the Dockerfile, and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, execute the commands below inside the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine with the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Annotator -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the downloader again.

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Preserving old data files

By default, while rerunning, the Downloader will replace old files with the latest versions. For example, if at some point, your SupplementaryAnnotation folder contained ClinVar_20231101.nsa and the latest available version is ClinVar_20231203.nsa, next time the Downloader is run, ClinVar_20231101.nsa will be replaced with ClinVar_20231203.nsa.

Currently, there is no way to override this behavior. If you do not want to replace or update a particular file, we recommend saving it to a different location, rerunning the Downloader to update the other data files, and then manually restoring the file you did not want updated. Please make sure to remove the newly downloaded version of that file. Note that the Annotator will throw an error if multiple versions of the same data source are present in the SupplementaryAnnotation folder. In other words, the SupplementaryAnnotation folder cannot contain both ClinVar_20231101.nsa and ClinVar_20231203.nsa.

Here is an example of how to proceed if a user doesn't want the latest version of ClinVar.

ls Data/SupplementaryAnnotation/GRCh38
...
ClinGen_disease_validity_curations_20231011.nga
ClinVar_20230930.nsa
ClinVar_20230930.nsa.idx
...
mv Data/SupplementaryAnnotation/GRCh38/ClinVar* <tmp_dir>/GRCh38/

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh38 \
-o Data

rm Data/SupplementaryAnnotation/GRCh38/ClinVar*
mv <tmp_dir>/GRCh38/ClinVar* Data/SupplementaryAnnotation/GRCh38/

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization Time Positions/s
---------------------------------------------------------------------------
Cache 00:00:00.0
SA Position Scan 00:00:00.0 153,634

Reference Preload Annotation Variants/s
---------------------------------------------------------------------------
chr1 00:00:00.2 00:00:00.8 11,873

Summary Time Percent
---------------------------------------------------------------------------
Initialization 00:00:00.0 1.5 %
Preload 00:00:00.2 4.9 %
Annotation 00:00:00.8 18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full set of command line options can be viewed by using the -h option or by running the Annotator with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet Annotator.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
--cache, -c <directory>
input cache directory
--in, -i <path> input VCF path
--out, -o <file path> output file path
--ref, -r <path> input compressed reference sequence path
--sd <directory> input supplementary annotation directory
--sources, -s <VALUE> annotation data sources to be used (comma
separated list of supported tags)
--force-mt forces to annotate mitochondrial variants
--legacy-vids enables support for legacy VIDs
--enable-dq report DQ from VCF samples field
--enable-bidirectional-fusions
enables support for bidirectional gene fusions
--str <VALUE> user provided STR annotation TSV file
--vcf-info <VALUE> additional vcf info field keys (comma separated)
desired in the output
--vcf-sample-info <VALUE>
additional vcf format field keys (comma separated)
desired in the output
--help, -h displays the help menu
--version, -v displays the version

Supplementary annotation version: 69, Reference version: 7

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).

- - +
Version: 3.23

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz) and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

If you want to build your own Docker image, it is easy to do. You just need the Illumina Connected Annotations zip file, the Dockerfile, and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, execute the commands below inside the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine with the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Annotator -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the downloader again.

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Preserving old data files

By default, while rerunning, the Downloader will replace old files with the latest versions. For example, if at some point, your SupplementaryAnnotation folder contained ClinVar_20231101.nsa and the latest available version is ClinVar_20231203.nsa, next time the Downloader is run, ClinVar_20231101.nsa will be replaced with ClinVar_20231203.nsa.

Currently, there is no way to override this behavior. If you do not want to replace or update a particular file, we recommend saving it to a different location, rerunning the Downloader to update the other data files, and then manually restoring the file you did not want updated. Please make sure to remove the newly downloaded version of that file. Note that the Annotator will throw an error if multiple versions of the same data source are present in the SupplementaryAnnotation folder. In other words, the SupplementaryAnnotation folder cannot contain both ClinVar_20231101.nsa and ClinVar_20231203.nsa.

Here is an example of how to proceed if a user doesn't want the latest version of ClinVar.

ls Data/SupplementaryAnnotation/GRCh38
...
ClinGen_disease_validity_curations_20231011.nga
ClinVar_20230930.nsa
ClinVar_20230930.nsa.idx
...
mv Data/SupplementaryAnnotation/GRCh38/ClinVar* <tmp_dir>/GRCh38/

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh38 \
-o Data

rm Data/SupplementaryAnnotation/GRCh38/ClinVar*
mv <tmp_dir>/GRCh38/ClinVar* Data/SupplementaryAnnotation/GRCh38/

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization Time Positions/s
---------------------------------------------------------------------------
Cache 00:00:00.0
SA Position Scan 00:00:00.0 153,634

Reference Preload Annotation Variants/s
---------------------------------------------------------------------------
chr1 00:00:00.2 00:00:00.8 11,873

Summary Time Percent
---------------------------------------------------------------------------
Initialization 00:00:00.0 1.5 %
Preload 00:00:00.2 4.9 %
Annotation 00:00:00.8 18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full set of command line options can be viewed by using the -h option or by running the Annotator with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet Annotator.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
--cache, -c <directory>
input cache directory
--in, -i <path> input VCF path
--out, -o <file path> output file path
--ref, -r <path> input compressed reference sequence path
--sd <directory> input supplementary annotation directory
--sources, -s <VALUE> annotation data sources to be used (comma
separated list of supported tags)
--force-mt forces to annotate mitochondrial variants
--legacy-vids enables support for legacy VIDs
--enable-dq report DQ from VCF samples field
--enable-bidirectional-fusions
enables support for bidirectional gene fusions
--str <VALUE> user provided STR annotation TSV file
--vcf-info <VALUE> additional vcf info field keys (comma separated)
desired in the output
--vcf-sample-info <VALUE>
additional vcf format field keys (comma separated)
desired in the output
--help, -h displays the help menu
--version, -v displays the version

Supplementary annotation version: 69, Reference version: 7

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).

+ + \ No newline at end of file diff --git a/3.23/introduction/licensedContent/index.html b/3.23/introduction/licensedContent/index.html index 91499d92..4fdb5e8b 100644 --- a/3.23/introduction/licensedContent/index.html +++ b/3.23/introduction/licensedContent/index.html @@ -6,17 +6,17 @@ Licensed Content | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Licensed Content

Illumina Connected Annotations supports the following content, which is available through a license from Illumina. +

Version: 3.23

Licensed Content

Illumina Connected Annotations supports the following content, which is available through a license from Illumina. The license file allows users to download and annotate with these data sources.

  • COSMIC
  • OMIM
  • Primate AI-3D
  • Splice AI
note

The license may be customized at creation time to allow access to one or more of the above.

How to obtain the license?

Please contact annotation_support@illumina.com to obtain a special credentials file for the data sources of interest.

Visit Illumina Connected Annotations for more details.

How to use the credentials file?

After obtaining the credentials file, it may be used in two ways:

  1. Home folder
  2. Commandline argument

The default location of the license file is ~/.ilmnAnnotations/credentials.json. An example credentials file is shown below:

{
"ApiKey":"myApiKey",
"ApiSecret": "abcdefghikjlmnopqrstuvwxyz-secretKey"
}

However, this can be overridden by the command line argument while downloading/annotating.
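
The precedence described above (an explicit command-line path first, the home-folder default otherwise) can be sketched in Python. This only illustrates the rule and is not the tool's actual implementation; the function names are hypothetical, and the key check simply mirrors the two keys shown in the example file.

```python
import json
from pathlib import Path

# Default location stated in this section.
DEFAULT_CREDENTIALS = Path.home() / ".ilmnAnnotations" / "credentials.json"


def resolve_credentials_path(cli_path=None):
    """Prefer the path given on the command line, else the home-folder default."""
    if cli_path:
        return Path(cli_path).expanduser()
    return DEFAULT_CREDENTIALS


def load_credentials(path):
    """Load a credentials file and check that both example keys are present."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    missing = [key for key in ("ApiKey", "ApiSecret") if key not in data]
    if missing:
        raise ValueError("credentials file is missing: " + ", ".join(missing))
    return data


if __name__ == "__main__":
    print(resolve_credentials_path())
    print(resolve_credentials_path("~/credentials.json"))
```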

Download licensed content

dotnet Downloader.dll \
-o ~/data \
--ga GRCh38 \
--credentialsFile ~/credentials.json

Annotate with licensed content

dotnet Annotator.dll \
--ref ~/data/References/7/Homo_sapiens.GRCh38.Nirvana.dat \
--sd ~/data/SupplementaryDatabase \
-c ~/data/Cache/32 \
-i ~/input_vcf-hg38.vcf.gz \
-o ~/output \
--credentialsFile ~/credentials.json

Licensing Errors

If the license has expired, Illumina Connected Annotations will stop annotating and exit with an error code. These errors may be skipped by using the --ignoreLicenseError command line argument. After doing this, only basic data sources will be used for annotations. This can also be achieved by deleting the credentials file from the home folder.

- - + + \ No newline at end of file diff --git a/3.23/introduction/parsing-json/index.html b/3.23/introduction/parsing-json/index.html index 0ec08f46..8b7aa344 100644 --- a/3.23/introduction/parsing-json/index.html +++ b/3.23/introduction/parsing-json/index.html @@ -6,13 +6,13 @@ Parsing Illumina Connected Annotations JSON | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized similarly to original VCF variants:

Illumina Connected Annotations JSON files can get very large and sometimes we receive feedback that a bioinformatician tried to read the JSON file into Python or R resulting in a program that ran out of available RAM. This happens because those parsers try to load everything into memory all at once.

To get around those issues, we play some clever tricks with newlines that enable our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.
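
The layout above can be read one line at a time. The following Python sketch is a minimal illustration that assumes a gzip-compressed output file and that every position line is a single JSON object, optionally followed by a trailing comma; the function name `iter_positions` is hypothetical.

```python
import gzip
import json


def iter_positions(path):
    """Yield one position dict per line, stopping at the genes section or file end."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        handle.readline()  # the first line holds the header; it is skipped here
        for line in handle:
            text = line.strip()
            if not text:
                continue
            # Both section terminators, ],"genes":[ and ]}, start with "]".
            if text.startswith("]"):
                break
            # Assumption: position lines may end with a separating comma.
            yield json.loads(text.rstrip(","))
```

For example, `for position in iter_positions("HiSeq.10000.json.gz"): print(position["chromosome"], position["position"])` would stream through the file while keeping only one position in memory at a time.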

Jupyter Notebook

To demonstrate this, we have put together a Jupyter notebook showing how to do this in Python, and an R version as well.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
- - +
Version: 3.23

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized much like the variants in the original VCF file.

Illumina Connected Annotations JSON files can get very large. We sometimes hear from bioinformaticians who tried to read an entire JSON file into Python or R and ended up with a program that ran out of available RAM. This happens because those parsers try to load everything into memory at once.

To get around this, we play some clever tricks with newlines that enable our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.
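
As a rough illustration of the layout described above, the following Python sketch streams position lines one at a time instead of loading the whole file. It is not the code from the notebook mentioned below. The function names are hypothetical, and the sketch assumes that each position line may carry a trailing comma separating it from the next entry, which is inferred from the layout rather than stated here.

```python
import gzip
import json
from typing import Iterable, Iterator

GENES_MARKER = '],"genes":['   # line that opens the genes section
END_MARKER = ']}'              # line that closes the file


def iter_positions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed position object per line, skipping the header line."""
    it = iter(lines)
    next(it, None)  # the first line holds the header section
    for raw in it:
        line = raw.strip()
        if line in (GENES_MARKER, END_MARKER):
            break  # the positions section is finished
        if not line:
            continue
        # Assumption: entries are separated by a trailing comma on each line.
        yield json.loads(line.rstrip(","))


def iter_positions_from_file(path: str) -> Iterator[dict]:
    """Stream positions from a gzip-compressed JSON file (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_positions(handle)
```

Because only one line is decoded at a time, memory use stays proportional to the largest position line rather than to the size of the file.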

Jupyter Notebook

To demonstrate this, we have put together a Jupyter notebook that shows how to do this in Python, along with an R version.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
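
Since the JASIX output for a small range is an ordinary JSON object, it can be loaded in one call. The sketch below is only an illustration: the function name is hypothetical, and it relies solely on field names visible in the output above (positions, variants, vid, transcripts, isCanonical, consequence). It lists the canonical transcripts and their consequences for each variant.

```python
import json
from typing import Iterator, List, Tuple


def canonical_consequences(jasix_output: str) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (vid, transcript ID, consequences) for each canonical transcript."""
    document = json.loads(jasix_output)
    for position in document.get("positions", []):
        for variant in position.get("variants", []):
            for transcript in variant.get("transcripts", []):
                # isCanonical is only present on canonical transcripts
                if transcript.get("isCanonical"):
                    yield (
                        variant["vid"],
                        transcript["transcript"],
                        transcript["consequence"],
                    )
```

Using .get() with a default keeps the loop tolerant of positions or variants that have no transcripts field.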
+ + \ No newline at end of file diff --git a/3.23/utilities/jasix/index.html b/3.23/utilities/jasix/index.html index 2bc96da8..efa2e8b2 100644 --- a/3.23/utilities/jasix/index.html +++ b/3.23/utilities/jasix/index.html @@ -6,13 +6,13 @@ Jasix | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

Jasix

Overview

The Jasix index provides tabix-like indexing capabilities for the Illumina Connected Annotations JSON output.

Creating the Jasix index

The Jasix index (which comes in a .jsi file) is generated on-the-fly with the Illumina Connected Annotations output. It can also be generated independently by running the Jasix command line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and the index-generation functionality of Jasix.

Example

dotnet Jasix.dll -h
USAGE: dotnet Jasix.dll -i in.json.gz [options]
Indexes a Illumina Connected Annotations annotated JSON file

OPTIONS:
--header, -t print also the header lines
--only-header, -H print only the header lines
--chromosomes, -l list chromosome names
--index, -c create index
--in, -i <VALUE> input
--out, -o <VALUE> compressed output file name (default:console)
--query, -q <VALUE> query range
--section, -s <VALUE> complete section (positions or genes) to output
--help, -h displays the help menu
--version, -v displays the version

dotnet Jasix.dll --index -i input.json.gz
---------------------------------------------------------------------------
Jasix (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Ref Sequence chrM indexed in 00:00:00.2
Ref Sequence chr1 indexed in 00:00:05.8
Ref Sequence chr2 indexed in 00:00:06.0
.
.
.
Peak memory usage: 28.5 MB
Time: 00:01:14.8

Querying the index

The Jasix query format is chr:start-end. If end is not provided, Jasix assumes end=start. If only chr is provided, all entries for that chromosome are returned.

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000
{
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
}
]
}

By default, Jasix writes to the console. If an output filename is provided, Jasix writes the results to that file in bgzip-compressed format. The output is always valid JSON. If requested (via the -t option), the header of the indexed file is included. Multiple queries can be submitted in the same command; their results appear within the same "positions" block in the order the queries were submitted. (Warning: if the queries are out of order or overlapping, the output will also be out of order and overlapping.)

dotnet Jasix.dll -i input.json.gz  -q chrM:5000-7000 -q chrM:8500-9500 -t
{
"header":{
"annotator":"Illumina Annotation Engine 1.6.2.0",
"creationTime":"2017-08-30 11:42:57",
"genomeAssembly":"GRCh37",
"schemaVersion":6,
"dataVersion":"84.24.39",
"dataSources":[
{
"name":"VEP",
"version":"84",
"description":"Ensembl",
"releaseDate":"2017-01-16"
}
],
"samples":[
"Mother"
]
},
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":8702,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":0.9987,
"totalDepth":1534,
"genotypeQuality":1,
"alleleDepths":[
2,
1532
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":8702,
"chromosome":"chrM",
"end":8702,
"variantType":"SNV",
"vid":"MT:8702:A"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":9378,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1018,
"genotypeQuality":1,
"alleleDepths":[
0,
1018
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":9378,
"chromosome":"chrM",
"end":9378,
"variantType":"SNV",
"vid":"MT:9378:A"
}
]
}
]
}

Extracting a section

The Illumina Connected Annotations JSON file has three sections: header, positions and genes. The header can be printed using the -H option. If you are interested in only the positions or genes section, use the -s (--section) option.

dotnet Jasix.dll -i input.json.gz  -s genes
[
{
"name": "ABCB10",
"omim": [
{
"mimNumber": 605454,
"geneName": "ATP-binding cassette, subfamily B, member 10"
}
]
},
{
"name": "ABCD3",
"omim": [
{
"mimNumber": 170995,
"geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",
"description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",
"phenotypes": [
{
"mimNumber": 616278,
"phenotype": "?Bile acid synthesis defect, congenital, 5",
"mapping": "molecular basis of the disorder is known",
"inheritances": [
"Autosomal recessive"
],
"comments": [
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
}
]
- - +
Version: 3.23

Jasix

Overview

The Jasix index provides tabix-like indexing capabilities for the Illumina Connected Annotations JSON output.

Creating the Jasix index

The Jasix index (which comes in a .jsi file) is generated on-the-fly with the Illumina Connected Annotations output. It can also be generated independently by running the Jasix command line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and the index-generation functionality of Jasix.

Example

dotnet Jasix.dll -h
USAGE: dotnet Jasix.dll -i in.json.gz [options]
Indexes a Illumina Connected Annotations annotated JSON file

OPTIONS:
--header, -t print also the header lines
--only-header, -H print only the header lines
--chromosomes, -l list chromosome names
--index, -c create index
--in, -i <VALUE> input
--out, -o <VALUE> compressed output file name (default:console)
--query, -q <VALUE> query range
--section, -s <VALUE> complete section (positions or genes) to output
--help, -h displays the help menu
--version, -v displays the version

dotnet Jasix.dll --index -i input.json.gz
---------------------------------------------------------------------------
Jasix (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Ref Sequence chrM indexed in 00:00:00.2
Ref Sequence chr1 indexed in 00:00:05.8
Ref Sequence chr2 indexed in 00:00:06.0
.
.
.
Peak memory usage: 28.5 MB
Time: 00:01:14.8

Querying the index

The Jasix query format is chr:start-end. If end is not provided, Jasix assumes end=start. If only chr is provided, all entries for that chromosome are returned.

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000
{
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
}
]
}
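
To make the three accepted query forms concrete, here is a small Python sketch that interprets a query string using the rules stated above. It only illustrates the format; how Jasix itself parses queries is not shown in this document, and the Query type and parse_query function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Query(NamedTuple):
    chromosome: str
    start: Optional[int]  # None means "whole chromosome"
    end: Optional[int]


def parse_query(text: str) -> Query:
    """Interpret chr, chr:start, or chr:start-end."""
    chromosome, colon, span = text.partition(":")
    if not chromosome:
        raise ValueError(f"missing chromosome in query: {text!r}")
    if not colon:
        # Only chr was given: every entry on that chromosome is requested.
        return Query(chromosome, None, None)
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    # When end is omitted, it defaults to start.
    end = int(end_text) if dash else start
    if end < start:
        raise ValueError(f"end precedes start in query: {text!r}")
    return Query(chromosome, start, end)
```

For example, parse_query("chrM:5581") describes the single position 5581 on chrM, while parse_query("chrM") leaves both coordinates unset to mean the whole chromosome.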

By default, Jasix writes to the console. If an output filename is provided, Jasix writes the results to that file in bgzip-compressed format. The output is always valid JSON. If requested (via the -t option), the header of the indexed file is included. Multiple queries can be submitted in the same command; their results appear within the same "positions" block in the order the queries were submitted. (Warning: if the queries are out of order or overlapping, the output will also be out of order and overlapping.)

dotnet Jasix.dll -i input.json.gz  -q chrM:5000-7000 -q chrM:8500-9500 -t
{
"header":{
"annotator":"Illumina Annotation Engine 1.6.2.0",
"creationTime":"2017-08-30 11:42:57",
"genomeAssembly":"GRCh37",
"schemaVersion":6,
"dataVersion":"84.24.39",
"dataSources":[
{
"name":"VEP",
"version":"84",
"description":"Ensembl",
"releaseDate":"2017-01-16"
}
],
"samples":[
"Mother"
]
},
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":8702,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":0.9987,
"totalDepth":1534,
"genotypeQuality":1,
"alleleDepths":[
2,
1532
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":8702,
"chromosome":"chrM",
"end":8702,
"variantType":"SNV",
"vid":"MT:8702:A"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":9378,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1018,
"genotypeQuality":1,
"alleleDepths":[
0,
1018
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":9378,
"chromosome":"chrM",
"end":9378,
"variantType":"SNV",
"vid":"MT:9378:A"
}
]
}
]
}
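
Since Jasix reports results in the order the queries are submitted, it can help to sort and merge the query ranges before building the command line. The following is a minimal Python sketch of that idea; the helper name normalize_queries is hypothetical, and it only handles the chr:start-end form used in the examples above.

```python
import re
from collections import defaultdict

# Matches queries of the form "chrM:5000-7000".
QUERY_RE = re.compile(r"^([^:]+):(\d+)-(\d+)$")

def normalize_queries(queries):
    """Sort chr:start-end queries and merge overlapping ranges per chromosome."""
    by_chrom = defaultdict(list)
    chrom_order = []  # chromosomes are kept in first-seen order
    for query in queries:
        match = QUERY_RE.match(query)
        if match is None:
            raise ValueError(f"unsupported query format: {query!r}")
        chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        if chrom not in by_chrom:
            chrom_order.append(chrom)
        by_chrom[chrom].append((start, end))

    result = []
    for chrom in chrom_order:
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous range: extend it instead of adding a new one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        result.extend(f"{chrom}:{s}-{e}" for s, e in merged)
    return result
```

Each returned string can then be passed to Jasix with its own -q option. Chromosomes are left in the order they were first given, because this sketch does not know the chromosome order of the indexed file.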

Extracting a section

The Illumina Connected Annotations JSON file has three sections: header, positions and genes. Header can be printed using the -H option. If you are interested in only the positions or genes section, you can use the -s or --section option.

dotnet Jasix.dll -i input.json.gz  -s genes
[
{
"name": "ABCB10",
"omim": [
{
"mimNumber": 605454,
"geneName": "ATP-binding cassette, subfamily B, member 10"
}
]
},
{
"name": "ABCD3",
"omim": [
{
"mimNumber": 170995,
"geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",
"description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",
"phenotypes": [
{
"mimNumber": 616278,
"phenotype": "?Bile acid synthesis defect, congenital, 5",
"mapping": "molecular basis of the disorder is known",
"inheritances": [
"Autosomal recessive"
],
"comments": [
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
}
]
+ + \ No newline at end of file diff --git a/3.23/utilities/sautils/index.html b/3.23/utilities/sautils/index.html index 18fd1d63..ae9027c1 100644 --- a/3.23/utilities/sautils/index.html +++ b/3.23/utilities/sautils/index.html @@ -6,17 +6,17 @@ SAUtils | IlluminaConnectedAnnotations - - + +
-
Version: 3.23

SAUtils

Overview

SAUtils is a utility tool that creates binary supplementary annotation files (.nsa, .gsa, .npd, .nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations Annotation engine to provide supplementary annotations in the output.

The SAUtils Menu

SAUtils supports building binary files for many data sources. The help menu lists them out in the form of sub-commands.

dotnet SAUtils.dll
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Utilities focused on supplementary annotation

USAGE: dotnet SAUtils.dll <command> [options]

COMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen
AaCon create AA conservation database
ancestralAllele create Ancestral allele database from 1000Genomes data
ClinGen create ClinGen database
Downloader download ClinGen database
clinvar create ClinVar database
concat merge multiple NSA files for the same data source having non-overlapping regions
Cosmic create COSMIC database
CosmicSv create COSMIC SV database
CosmicFusion create COSMIC gene fusion database
CosmicCGC create COSMIC cancer gene census database
CustomGene create custom gene annotation database
CustomVar create custom variant annotation database
Dann create DANN database
Dbsnp create dbSNP database
Dgv create DGV database
DiseaseValidity create disease validity database
DosageMapRegions create dosage map regions
DosageSensitivity create dosage sensitivity database
DownloadOmim download OMIM database
ExtractMiniSA extracts mini SA
ExtractMiniXml extracts mini XML (ClinVar)
FilterSpliceNetTsv filter SpliceNet predictions
FusionCatcher create FusionCatcher database
Gerp create GERP conservation database
GlobalMinor create global minor allele database
Gnomad create gnomAD database
Gnomad-lcr create gnomAD low complexity region database
GnomadGeneScores create gnomAD gene scores database
GnomadSV create gnomAD structural variant database
Index edit an index file
MitoHet create mitochondrial Heteroplasmy database
MitomapSvDb create MITOMAP structural variants database
MitomapVarDb create MITOMAP small variants database
Omim create OMIM database
OneKGen create 1000 Genome small variants database
OneKGenSv create 1000 Genomes structural variants database
OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file
PhyloP create PhyloP database
PrimateAi create PrimateAI database
RefMinor create Reference Minor database from 1000 Genome
RemapWithDbsnp remap a VCF file given source and destination rsID mappings
Revel create REVEL database
SpliceAi create SpliceAI database
TopMed create TOPMed database
Gme create GME Variome database
Decipher create Decipher database

You can get further detailed help for each sub-command by typing in the subcommand. For example:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

More detailed instructions about each sub-command can be found in documentation of respective data sources.

Output File Formats

The format of the binary file SAUtils produces depends on the type of annotation data represented in that file (e.g. small variants vs. structural variants vs. genes).

File Extension    Description
.nsa              Small variant annotations (e.g. SNV, insertions, deletions, etc.)
.gsa              Compact variant annotations (e.g. SNV, insertions, deletions, etc.)
.idx              Index file
.nsi              Interval annotations (e.g. SV, CNVs, intervals)
.nga              Gene annotations
.npd              Conservation scores
.rma              Reference Minor allele
.gfs              Gene fusions source
.gfj              Gene fusions JSON
.schema           JSON schema

SAUtils AutoDownloadGenerate

To make generating supplementary annotation files easier, we provide a single command that can be used instead of the more granular subcommands. +

Version: 3.23

SAUtils

Overview

SAUtils is a utility tool that creates binary supplementary annotation files (.nsa, .gsa, .npd, .nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations Annotation engine to provide supplementary annotations in the output.

The SAUtils Menu

SAUtils supports building binary files for many data sources. The help menu lists them out in the form of sub-commands.

dotnet SAUtils.dll
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Utilities focused on supplementary annotation

USAGE: dotnet SAUtils.dll <command> [options]

COMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen
AaCon create AA conservation database
ancestralAllele create Ancestral allele database from 1000Genomes data
ClinGen create ClinGen database
Downloader download ClinGen database
clinvar create ClinVar database
concat merge multiple NSA files for the same data source having non-overlapping regions
Cosmic create COSMIC database
CosmicSv create COSMIC SV database
CosmicFusion create COSMIC gene fusion database
CosmicCGC create COSMIC cancer gene census database
CustomGene create custom gene annotation database
CustomVar create custom variant annotation database
Dann create DANN database
Dbsnp create dbSNP database
Dgv create DGV database
DiseaseValidity create disease validity database
DosageMapRegions create dosage map regions
DosageSensitivity create dosage sensitivity database
DownloadOmim download OMIM database
ExtractMiniSA extracts mini SA
ExtractMiniXml extracts mini XML (ClinVar)
FilterSpliceNetTsv filter SpliceNet predictions
FusionCatcher create FusionCatcher database
Gerp create GERP conservation database
GlobalMinor create global minor allele database
Gnomad create gnomAD database
Gnomad-lcr create gnomAD low complexity region database
GnomadGeneScores create gnomAD gene scores database
GnomadSV create gnomAD structural variant database
Index edit an index file
MitoHet create mitochondrial Heteroplasmy database
MitomapSvDb create MITOMAP structural variants database
MitomapVarDb create MITOMAP small variants database
Omim create OMIM database
OneKGen create 1000 Genome small variants database
OneKGenSv create 1000 Genomes structural variants database
OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file
PhyloP create PhyloP database
PrimateAi create PrimateAI database
RefMinor create Reference Minor database from 1000 Genome
RemapWithDbsnp remap a VCF file given source and destination rsID mappings
Revel create REVEL database
SpliceAi create SpliceAI database
TopMed create TOPMed database
Gme create GME Variome database
Decipher create Decipher database

You can get further detailed help for each sub-command by typing in the subcommand. For example:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

More detailed instructions about each sub-command can be found in documentation of respective data sources.

Output File Formats

The format of the binary file SAUtils produces depends on the type of annotation data represented in that file (e.g. small variants vs. structural variants vs. genes).

File Extension    Description
.nsa              Small variant annotations (e.g. SNV, insertions, deletions, etc.)
.gsa              Compact variant annotations (e.g. SNV, insertions, deletions, etc.)
.idx              Index file
.nsi              Interval annotations (e.g. SV, CNVs, intervals)
.nga              Gene annotations
.npd              Conservation scores
.rma              Reference Minor allele
.gfs              Gene fusions source
.gfj              Gene fusions JSON
.schema           JSON schema
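
When inspecting an output directory, the extension list above can be turned into a small lookup. The sketch below is only an illustration in Python; the function name describe_sa_file and the file names in the usage note are hypothetical, and the descriptions are copied from the list above.

```python
from pathlib import Path

# Extension -> description, taken from the list above.
SA_EXTENSIONS = {
    ".nsa": "Small variant annotations",
    ".gsa": "Compact variant annotations",
    ".idx": "Index file",
    ".nsi": "Interval annotations",
    ".nga": "Gene annotations",
    ".npd": "Conservation scores",
    ".rma": "Reference Minor allele",
    ".gfs": "Gene fusions source",
    ".gfj": "Gene fusions JSON",
    ".schema": "JSON schema",
}

def describe_sa_file(filename):
    """Return the description for a supplementary annotation file, or None if unknown."""
    # Only the last suffix is checked, so "example.nsa.idx" is reported as an index file.
    suffix = Path(filename).suffix.lower()
    return SA_EXTENSIONS.get(suffix)
```

For example, describe_sa_file("example.nsa") gives the small variant description, while a file with an unlisted extension gives None.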

SAUtils AutoDownloadGenerate

To make generating supplementary annotation files easier, we provide a single command that can be used instead of the more granular subcommands. This subcommand combines the download and generate steps. Currently, it supports the following data sources:

  • ClinVar
  • ClinGen
  • dbSNP
  • OMIM
  • COSMIC
dotnet SAUtils.dll AutoDownloadGenerate
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll autodownloadgenerate [options]
Downloads and generates the Supplementary Database for Omim, ClinGen, ClinVar, dbSNP, and COSMIC

OPTIONS:
--sources, -s <VALUE> comma separated list of external data sources
--inputJson, -j <path> input JSON path
--downloadBaseFolder, -b <directory>
base directory path external datasources
downloaded to
--downloadDate, -d <directory>
date directory name that external datasources
downloaded to. Default is today's date in yyyy-
MM-dd format (e.g. 2023-01-30).
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output SA directory
--actions, -a <VALUE> comma separated list of action(s) to perform.
action options: download, generate.
--help, -h displays the help menu
--version, -v displays the version

You can download only, generate only, or both download and generate supplementary files. To use this subcommand, you have to prepare a JSON file that describes the data sources. The sections below show how to use this subcommand for each data source.

AutoDownloadGenerate ClinVar

Below is the command to use AutoDownloadGenerate for ClinVar to download and generate supplementary files.

dotnet SAUtils.dll AutoDownloadGenerate -s ClinVar -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]
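
If the same command is run for several data sources, it may be convenient to assemble the argument list in a script. The sketch below is a hedged Python illustration: the function name and the paths in the usage note are hypothetical, and only the short options shown in the help menu above are used.

```python
def build_auto_download_command(source, json_path, cache_dir, reference,
                                download_dir, output_dir,
                                actions=("download", "generate")):
    """Assemble the argument list for one AutoDownloadGenerate run."""
    # The help menu lists exactly these two actions.
    allowed = {"download", "generate"}
    unknown = [action for action in actions if action not in allowed]
    if unknown:
        raise ValueError(f"unknown action(s): {unknown}")
    return [
        "dotnet", "SAUtils.dll", "AutoDownloadGenerate",
        "-s", source,
        "-a", ",".join(actions),
        "-j", str(json_path),
        "-c", str(cache_dir),
        "-r", str(reference),
        "-b", str(download_dir),
        "-o", str(output_dir),
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True); passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.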

The json file for ClinVar should be formatted like below:

{
"clinvar": {
"baseDirectory": "ClinVar",
"sourceFiles": [
{
"name": "ClinVar",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"files": [
{
"localFileName": "ClinVarFullRelease_00-latest.xml.gz",
"downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz",
"md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz.md5"
},
{
"localFileName": "ClinVarVariationRelease_00-latest.xml.gz",
"downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz",
"md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz.md5"
}
]
}
]
}
}

There is no need to modify the JSON entry for ClinVar; you can use it as is.
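
Before starting a long download, it can be useful to list what a source entry will fetch. The following Python sketch reads a configuration in the layout shown above and returns the file entries for one data source. The function name is hypothetical; the field names (sourceFiles, files, localFileName, downloadUrl, md5Url) are the ones that appear in the JSON examples on this page.

```python
import json

def list_download_entries(config_text, source_key):
    """Return (localFileName, downloadUrl, md5Url or None) tuples for one data source."""
    config = json.loads(config_text)
    entries = []
    for source_file in config[source_key]["sourceFiles"]:
        # Some sources (OMIM and COSMIC in the examples) have no "files" list at all.
        for file_entry in source_file.get("files", []):
            entries.append((
                file_entry["localFileName"],
                file_entry["downloadUrl"],
                file_entry.get("md5Url"),  # only present for some sources
            ))
    return entries
```

A missing localFileName or downloadUrl raises a KeyError, which makes an incomplete entry visible before any download is attempted.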

AutoDownloadGenerate ClinGen

dotnet SAUtils.dll AutoDownloadGenerate -s ClinGen -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for ClinGen should be formatted like below:

{
"clingen": {
"baseDirectory": "ClinGen",
"sourceFiles": [
{
"name": "ClinGen Dosage Sensitivity Map",
"subDirectory": "DosageSensitivity",
"description": "Dosage sensitivity map from ClinGen (dbVar)",
"files": [
{
"localFileName": "ClinGen_gene_curation_list_GRCh37.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh37.tsv"
},
{
"localFileName": "ClinGen_gene_curation_list_GRCh38.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh38.tsv"
},
{
"localFileName": "ClinGen_region_curation_list_GRCh37.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh37.tsv"
},
{
"localFileName": "ClinGen_region_curation_list_GRCh38.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh38.tsv"
}
]
},
{
"name": "ClinGen disease validity curations",
"subDirectory": "GeneDiseaseValidity",
"description": "Disease validity curations from ClinGen (dbVar)",
"files": [
{
"localFileName": "Clingen-Gene-Disease-Summary.csv",
"downloadUrl": "https://search.clinicalgenome.org/kb/gene-validity/download"
}
]
}
]
}
}

There is no need to modify the JSON entry for ClinGen; you can use it as is.

AutoDownloadGenerate dbSNP

dotnet SAUtils.dll AutoDownloadGenerate -s dbSNP -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for dbSNP should be formatted like below:

{
"dbsnp": {
"baseDirectory": "dbSNP",
"sourceFiles": [
{
"name": "dbSNP",
"description": "Identifiers for observed variants",
"version": "156",
"subDirectory": "GRCh37",
"files": [
{
"localFileName": "GCF_000001405.25.gz.tbi",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi.md5"
},
{
"localFileName": "GCF_000001405.25.gz",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.md5"
}
]
},
{
"name": "dbSNP",
"description": "Identifiers for observed variants",
"version": "156",
"subDirectory": "GRCh38",
"files": [
{
"localFileName": "GCF_000001405.40.gz.tbi",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi.md5"
},
{
"localFileName": "GCF_000001405.40.gz",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.md5"
}
]
}
]
}
}

The JSON above is an example for dbSNP version 156. To use a different version, adjust the version number and update every entry in files to point to the actual URLs. If you only want to generate GRCh38, remove the GRCh37 entry from sourceFiles.
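
Removing one assembly from the configuration can also be done programmatically. The sketch below is a minimal Python illustration with a hypothetical function name; it keeps only the sourceFiles entries whose subDirectory matches the requested assembly and leaves the original configuration untouched.

```python
import copy

def keep_assembly(config, source_key, assembly):
    """Return a copy of config whose sourceFiles keep only the given subDirectory."""
    trimmed = copy.deepcopy(config)  # do not modify the caller's dictionary
    source = trimmed[source_key]
    source["sourceFiles"] = [
        source_file
        for source_file in source["sourceFiles"]
        if source_file.get("subDirectory") == assembly
    ]
    return trimmed
```

For the dbSNP example, keep_assembly(config, "dbsnp", "GRCh38") would drop the GRCh37 block. The result can be written back with json.dump and passed to the -j option.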

AutoDownloadGenerate OMIM

dotnet SAUtils.dll AutoDownloadGenerate -s OMIM -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for OMIM should be formatted like below:

{
"omim": {
"baseDirectory": "omim",
"sourceFiles": [
{
"name": "OMIM",
"description": "An Online Catalog of Human Genes and Genetic Disorders"
}
]
}
}

There is no need to modify the JSON entry for OMIM; you can use it as is. You have to export your OMIM API key as an environment variable named OmimApiKey.

AutoDownloadGenerate COSMIC

dotnet SAUtils.dll AutoDownloadGenerate -s COSMIC -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for COSMIC should be formatted like below:

{
"Cosmic": {
"baseDirectory": "COSMIC",
"sourceFiles": [
{
"name": "COSMIC",
"version": "99",
"description": "the Catalogue Of Somatic Mutations In Cancer"
}
]
}
}

You have to adjust the version entry to match the COSMIC version you want. You also need COSMIC credentials, exported as environment variables named Cosmic_Username and Cosmic_Password.
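
Because OMIM and COSMIC need credentials in the environment, a quick check before launching SAUtils can save a failed run. This is a small Python sketch under the assumption that the variable names are exactly those given on this page (OmimApiKey, Cosmic_Username, Cosmic_Password); the function name and the dictionary are hypothetical.

```python
import os

# Environment variables named on this page, keyed by the value passed to -s.
REQUIRED_ENV = {
    "OMIM": ["OmimApiKey"],
    "COSMIC": ["Cosmic_Username", "Cosmic_Password"],
}

def missing_credentials(source, environ=None):
    """Return the required environment variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV.get(source, []) if not env.get(name)]
```

An empty list means the run can proceed. Sources without an entry in the dictionary, such as ClinVar, are treated as needing no credentials.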

- - + + \ No newline at end of file diff --git a/3.24/core-functionality/canonical-transcripts/index.html b/3.24/core-functionality/canonical-transcripts/index.html new file mode 100644 index 00000000..c30df368 --- /dev/null +++ b/3.24/core-functionality/canonical-transcripts/index.html @@ -0,0 +1,18 @@ + + + + + + + +Canonical Transcripts | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
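
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's actual implementation: the Transcript fields and function names are hypothetical, and "ascending accession number" is interpreted here as the first run of digits in the accession.

```python
import re
from dataclasses import dataclass

@dataclass
class Transcript:
    accession: str          # e.g. "NM_001754.5"
    is_lrg: bool            # True if the transcript has an LRG entry
    cds_length: int
    transcript_length: int

def accession_number(accession):
    """First run of digits in the accession, e.g. NM_001754.5 -> 1754 (an assumption)."""
    match = re.search(r"\d+", accession)
    return int(match.group()) if match else 0

def pick_canonical(transcripts, is_refseq):
    """Apply the filter and sort order described above and return the first entry."""
    candidates = list(transcripts)
    if is_refseq:
        # Step 1: only NM and NR transcripts are candidates for RefSeq.
        candidates = [t for t in candidates if t.accession.startswith(("NM_", "NR_"))]
    if not candidates:
        return None
    # Step 2: LRG first, then descending CDS length, descending transcript
    # length, and ascending accession number.
    candidates.sort(key=lambda t: (
        not t.is_lrg,
        -t.cds_length,
        -t.transcript_length,
        accession_number(t.accession),
    ))
    # Step 3: grab the first entry.
    return candidates[0]
```

Encoding the whole order as a single sort key keeps the tie-breaking rules in one place, so changing their priority only means reordering the tuple.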
+ + + + \ No newline at end of file diff --git a/3.24/core-functionality/gene-fusions/index.html b/3.24/core-functionality/gene-fusions/index.html new file mode 100644 index 00000000..b328389c --- /dev/null +++ b/3.24/core-functionality/gene-fusions/index.html @@ -0,0 +1,19 @@ + + + + + + + +Gene Fusion Detection | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 & FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 & FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. +If only unidirectional gene fusions are desired, only these two fusions can be detected. If enable-bidirectional-fusions is enabled, all four cases can be identified.

Interpreting translocation breakends

At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the VCF 4.2 specification.

REF    ALT     Meaning
s      t[p[    piece extending to the right of p is joined after t
s      t]p]    reverse comp piece extending left of p is joined after t
s      ]p]t    piece extending to the left of p is joined before t
s      [p[t    reverse comp piece extending right of p is joined before t
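
The four forms in the table can be told apart by the bracket direction and by which side of the mate position the replacement bases sit on. The Python sketch below illustrates this with a hypothetical parse_breakend function. It handles only the plain bracket forms shown in the table and is not a full VCF parser.

```python
import re

# Optional bases, a bracket, mate chromosome:position, the same bracket, optional bases.
BND_RE = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<b1>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[^\[\]]*)$"
)

# Meanings copied from the table above, keyed by (joined side, bracket).
MEANINGS = {
    ("after", "["): "piece extending to the right of p is joined after t",
    ("after", "]"): "reverse comp piece extending left of p is joined after t",
    ("before", "]"): "piece extending to the left of p is joined before t",
    ("before", "["): "reverse comp piece extending right of p is joined before t",
}

def parse_breakend(alt):
    """Split a breakend ALT string into mate position, bracket and join side."""
    match = BND_RE.match(alt)
    if match is None or match.group("b1") != match.group("b2"):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    lead, trail = match.group("lead"), match.group("trail")
    if bool(lead) == bool(trail):
        raise ValueError("expected bases on exactly one side of the mate position")
    # Bases before the bracket (t[p[ or t]p]) mean the piece is joined after t.
    joined = "after" if lead else "before"
    bracket = match.group("b1")
    return {
        "mate_chromosome": match.group("chrom"),
        "mate_position": int(match.group("pos")),
        "bracket": bracket,
        "joined": joined,
        "meaning": MEANINGS[(joined, bracket)],
    }
```

Applied to the ALT allele [chr21:36420865[C, this returns the mate position chr21:36420865 with the piece joined before the base C.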

Variant Types

Specifically we can identify gene fusions from the following structural variant types:

  • deletions (<DEL>)
  • tandem_duplications (<DUP:TANDEM>)
  • inversions (<INV>)
  • translocation breakpoints (AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[)

Criteria

The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:

  1. After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if enable-bidirectional-fusions is not enabled. They can have the same or different orientations if enable-bidirectional-fusions is set.
  2. Both transcripts must be from the same transcript source (i.e. we won't mix and match between RefSeq and Ensembl transcripts)
  3. Both transcripts must belong to different genes
  4. Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)
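
The four criteria can be read as a sequence of early exits. The following Python sketch illustrates that reading only; the FusionTranscript type, its fields and the function names are hypothetical, and the orientation is assumed to have been computed already (after accounting for gene orientation and the rearrangement).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FusionTranscript:
    transcript_id: str
    source: str                 # "RefSeq" or "Ensembl"
    gene_id: str
    orientation: str            # "+" or "-", after the rearrangement is applied
    coding_region: Optional[Tuple[str, int, int]]  # (chromosome, start, end) or None

def coding_regions_overlap(a, b):
    """True if both transcripts have coding regions that overlap on the same chromosome."""
    if a.coding_region is None or b.coding_region is None:
        return False
    chrom_a, start_a, end_a = a.coding_region
    chrom_b, start_b, end_b = b.coding_region
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a

def is_fusion_candidate(a, b, enable_bidirectional_fusions=False):
    """Check the four criteria listed above for a pair of transcripts."""
    # 1. Same orientation, unless bidirectional fusions are enabled.
    if a.orientation != b.orientation and not enable_bidirectional_fusions:
        return False
    # 2. Same transcript source.
    if a.source != b.source:
        return False
    # 3. Different genes.
    if a.gene_id == b.gene_id:
        return False
    # 4. Coding regions must not already overlap without the variant.
    return not coding_regions_overlap(a, b)
```

Ordering the checks from cheapest to most expensive is a design choice of this sketch; the result does not depend on the order.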

ETV6/RUNX1 Example

ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment.

VCF

Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr12 12026270 . C [chr21:36420865[C . PASS SVTYPE=BND
chr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND
chr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND
chr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND

When you put these calls together, the resulting genomic rearrangement looks something like this:

JSON Output

The annotation for the first variant in the VCF looks like this:

{"positions":[
{
"chromosome": "12",
"position": 12026270,
"refAllele": "C",
"altAlleles": [
"[chr21:36420865[C"
],
"filters": [
"PASS"
],
"cytogeneticBand": "12p13.2",
"variants": [
{
"vid": "12-12026270-C-[chr21:36420865[C",
"chromosome": "12",
"begin": 12026270,
"end": 12026270,
"isStructuralVariant": true,
"refAllele": "C",
"altAllele": "[chr21:36420865[C",
"variantType": "translocation",
"transcripts": [
{
"transcript": "ENST00000396373.4",
"source": "Ensembl",
"bioType": "mRNA",
"introns": "5/7",
"geneId": "ENSG00000139083",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"impact": "modifier",
"geneFusions": [
{
"transcript": "ENST00000437180.1",
"bioType": "mRNA",
"source": "Ensembl",
"geneId": "ENSG00000159216",
"proteinId": "ENSP00000409227.1",
"intron": 2,
"hgnc": "RUNX1",
"hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
},
{
"transcript": "ENST00000300305.3",
"bioType": "mRNA",
"source": "Ensembl",
"isCanonical": true,
"geneId": "ENSG00000159216",
"proteinId": "ENSP00000300305.3",
"intron": 1,
"hgnc": "RUNX1",
"hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
}
],
"isCanonical": true,
"proteinId": "ENSP00000379658.3"
},
{
"transcript": "NM_001987.5",
"source": "RefSeq",
"bioType": "mRNA",
"introns": "5/7",
"geneId": "2120",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"impact": "modifier",
"geneFusions": [
{
"transcript": "NM_001754.5",
"bioType": "mRNA",
"source": "RefSeq",
"isCanonical": true,
"geneId": "861",
"proteinId": "NP_001745.2",
"intron": 2,
"hgnc": "RUNX1",
"hgvsr": "NM_001754.5(RUNX1):r.?_58+274::NM_001987.5(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
}
],
"isCanonical": true,
"proteinId": "NP_001978.1"
}
]
}
]
}
]}

Field         Type      Notes
transcript    string    transcript ID
bioType       string    descriptions of the biotypes from Ensembl
exon          int       exon that contained fusion breakpoint
intron        int       intron that contained fusion breakpoint
geneId        string    gene ID. e.g. ENSG00000116062
hgnc          string    gene symbol. e.g. MSH6
hgvsr         string    HGVS RNA nomenclature

Gene Fusion Data Sources

To provide more context to our gene fusions, we provide the following gene fusion data sources:

Consequences

When a gene fusion is identified, we add the following Sequence Ontology consequence:

              "consequence": [
"transcript_variant",
"gene_fusion"
],
  • If both transcripts have the same orientation, we label it as unidirectional_gene_fusion, if they have different orientations, we label it as bidirectional_gene_fusion
  • If both unidirectional and bidirectional ones are detected, we label it as gene_fusion.
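
The labeling rule above can be written as a small function over the directionality values of the detected fusions. This is a hedged Python sketch: the function name is hypothetical, and the value "bidirectional" is assumed by analogy with the "unidirectional" value shown in the JSON output.

```python
def fusion_consequence(directionalities):
    """Map the directionalities of all detected fusions to one consequence label."""
    # Lower-casing accepts both "unidirectional" and "uniDirectional" spellings.
    kinds = {value.lower() for value in directionalities}
    unknown = kinds - {"unidirectional", "bidirectional"}
    if unknown:
        raise ValueError(f"unexpected directionality: {sorted(unknown)}")
    if not kinds:
        return None  # no fusion detected
    if kinds == {"unidirectional"}:
        return "unidirectional_gene_fusion"
    if kinds == {"bidirectional"}:
        return "bidirectional_gene_fusion"
    # Both kinds were detected for this transcript.
    return "gene_fusion"
```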

Gene Fusions Section

The geneFusions section is contained within the object of the originating transcript. It will contain all the pairwise gene fusions that obey the criteria outlined above. In the case of ENST00000396373.4, there are 7 other Ensembl transcripts that would produce a gene fusion. For NM_001987.4, there was only one transcript (NM_001754.4) that produces a gene fusion.

For each originating transcript, we report the following for each partner transcript:

  • transcript ID
  • gene ID
  • HGNC gene symbol
  • transcript bio type (e.g. protein_coding)
  • intron or exon number containing the breakpoint
  • HGVS RNA notation
  • gene fusion directionality
tip

Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see HGVS SVD-WG007).

          "geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],

The HGVS RNA notation above indicates that the gene fusion starts with NM_001754.4 (RUNX1) until CDS position 58 and continues with NM_001987.4 (ETV6). 1009+3367 indicates that the fusion occurred 3367 bp within intron 2.
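
For downstream processing, the hgvsr string can be split into its two partners. The Python sketch below is a minimal illustration that assumes the exact layout seen in the examples on this page (two parts joined by "::", each of the form transcript(gene):r.start_end); the function name is hypothetical.

```python
import re

# One side of the fusion, e.g. "NM_001754.4(RUNX1):r.?_58+274".
PART_RE = re.compile(r"^(?P<transcript>[^()]+)\((?P<gene>[^()]+)\):r\.(?P<range>.+)$")

def parse_fusion_hgvsr(hgvsr):
    """Split a fusion hgvsr string into two dicts, in 5' to 3' order as written."""
    parts = hgvsr.split("::")
    if len(parts) != 2:
        raise ValueError(f"expected two partners joined by '::': {hgvsr!r}")
    parsed = []
    for part in parts:
        match = PART_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognized partner: {part!r}")
        bounds = match.group("range").split("_", 1)
        if len(bounds) != 2:
            raise ValueError(f"expected start_end in: {part!r}")
        parsed.append({
            "transcript": match.group("transcript"),
            "gene": match.group("gene"),
            "start": bounds[0],   # "?" marks an unspecified boundary
            "end": bounds[1],
        })
    return parsed
```

The positions are kept as strings because values such as 58+274 combine a transcript position with an intronic offset.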

+ + + + \ No newline at end of file diff --git a/3.24/core-functionality/iscn-notation/index.html b/3.24/core-functionality/iscn-notation/index.html new file mode 100644 index 00000000..cab10747 --- /dev/null +++ b/3.24/core-functionality/iscn-notation/index.html @@ -0,0 +1,24 @@ + + + + + + + +ISCN Notation | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

ISCN Notation

Introduction

The International System for Human Cytogenetic Nomenclature (ISCN) is a standardized system for describing the banding pattern of human chromosomes and any chromosomal abnormalities, including structural variations. +ISCN is used by geneticists and researchers to ensure clarity and uniformity when reporting chromosomal abnormalities.

Key Components of ISCN Notation:

  • Chromosome Number: Identifies the chromosome.
  • Arm: Chromosome arms are labeled "p" (short arm) and "q" (long arm).
  • Banding Pattern: Each arm is divided into regions, bands, and sub-bands that are numbered starting from the centromere (central part of the chromosome).

Overview

The provided ISCN notation algorithm processes chromosomal variants and generates ISCN notation by following these steps:

  1. Identify Variant Type: The algorithm recognizes several types of chromosomal variants, such as duplications, deletions, copy number gains, and copy number losses.

  2. Locate Cytogenetic Bands: Using the start and end positions of the variant, the algorithm identifies the corresponding cytogenetic bands on the chromosome.

  3. Generate Notation: The algorithm constructs the ISCN notation string from the variant type, chromosome number, and identified cytogenetic bands.

Supported Variant Types

The algorithm supports the following variant types:

  • Deletion (del)
  • Duplication (dup)
  • Copy Number Gain (dup)
  • Copy Number Loss (del)

Example

For a deletion on chromosome 8 from position 19200001 to 135400001, the algorithm would:

  1. Recognize the variant type as a deletion.
  2. Identify the start band as p21.3 and the end band as q24.23.
  3. Generate the ISCN notation: del(8)(p21.3q24.23).
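
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the band table is a small hand-picked subset for chromosome 8 whose boundaries are assumptions chosen to agree with the examples on this page, and the names `BANDS`, `find_band`, and `iscn_notation` are hypothetical. Variants that reach a chromosome end (which the examples table below reports with a single band) are not handled here.

```python
# Assumed, illustrative subset of cytogenetic bands: (start, end, name), 1-based inclusive.
# A real implementation would load a complete cytoband table for the genome assembly.
BANDS = {
    "8": [
        (19_200_001, 23_500_000, "p21.3"),
        (126_300_001, 130_400_000, "q24.21"),
        (130_400_001, 135_400_000, "q24.22"),
        (135_400_001, 138_900_000, "q24.23"),
        (138_900_001, 145_138_636, "q24.3"),
    ],
}

# Step 1: map each supported variant type to its ISCN prefix.
PREFIX = {
    "deletion": "del",
    "copy number loss": "del",
    "duplication": "dup",
    "copy number gain": "dup",
}


def find_band(chrom, position):
    """Step 2: return the name of the band containing the position."""
    for start, end, name in BANDS[chrom]:
        if start <= position <= end:
            return name
    raise ValueError(f"no band known for {chrom}:{position}")


def iscn_notation(variant_type, chrom, start, end):
    """Step 3: combine prefix, chromosome, start band, and end band."""
    prefix = PREFIX[variant_type]
    return f"{prefix}({chrom})({find_band(chrom, start)}{find_band(chrom, end)})"


print(iscn_notation("deletion", "8", 19_200_001, 135_400_001))
```

With the assumed band table, the call above reproduces the notation from the worked example, del(8)(p21.3q24.23).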

More examples:

| Chromosome | Start Position | End Position | Variant Type | ISCN Notation |
| --- | --- | --- | --- | --- |
| 8 | 1 | 19200001 | deletion | del(8)(p21.3) |
| 8 | 1 | 19200001 | duplication | dup(8)(p21.3) |
| 8 | 19200001 | 135400001 | deletion | del(8)(p21.3q24.23) |
| 8 | 19200001 | 135400001 | duplication | dup(8)(p21.3q24.23) |
| 8 | 127300001 | 131500000 | duplication | dup(8)(q24.21q24.22) |
| 8 | 127300001 | 131500000 | copy number gain | dup(8)(q24.21q24.22) |
| 8 | 128746677 | 128749160 | duplication | dup(8)(q24.21q24.21) |
| 8 | 128746677 | 128749160 | copy number gain | dup(8)(q24.21q24.21) |
| 8 | 135400001 | 138900001 | duplication | dup(8)(q24.23q24.3) |
| 8 | 135400001 | 146364022 | deletion | del(8)(q24.23) |
| 8 | 135400001 | 145138635 | duplication | dup(8)(q24.23q24.3) |
| 8 | 135400001 | 138900001 | copy number loss | del(8)(q24.23q24.3) |
| 8 | 135400001 | 146364022 | duplication | dup(8)(q24.23) |
| X | 86200001 | 103700000 | copy number loss | del(X)(q21.31q22.2) |
| X | 86200001 | 103700000 | deletion | del(X)(q21.31q22.2) |

References

+ + + + \ No newline at end of file diff --git a/3.24/core-functionality/junction-preserving/index.html b/3.24/core-functionality/junction-preserving/index.html new file mode 100644 index 00000000..7e768d16 --- /dev/null +++ b/3.24/core-functionality/junction-preserving/index.html @@ -0,0 +1,18 @@ + + + + + + + +Junction Preserving Annotation | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Junction Preserving Annotation

Background

When a variant can be shifted (due to alignment) across a junction (e.g. a start, stop, or splice site), the annotation may vary depending on which alignment is used. For example, a left-aligned deletion that affects the splice acceptor site may become an exon variant upon right-alignment.

Deletion at exon boundary

Note that:

  • When right-aligned the variant starts at the first base of the exon (as pictured).
  • When left-aligned the variant can be shifted two base pairs and starts at a splice acceptor site.

From the point of view of the translation machinery, the important question is whether the sequence that identifies a junction is preserved, regardless of the variant position. In the case of the deletion above, we believe that the variant is more accurately characterized as an inframe_deletion than as a splice_acceptor_variant, because the splice acceptor sequence AG is unaffected.

When faced with such variants, we assign junction-disrupting consequences only if the variant cannot be shifted out of the junction.

Implementation

By default and by convention, the left-aligned variant is annotated. If the variant overlaps a junction (as judged by its consequences), it is also right-aligned and annotated. If both alignments produce junction disruption, the left-aligned annotation is reported. If, however, only one of the alignments causes junction disruption, the non-junction-disrupting annotation is reported.
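
The selection rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the set `JUNCTION_DISRUPTING` is a hypothetical choice of consequences treated as junction disrupting, and `select_annotation` is not a function from the actual code base. The right-aligned annotation is passed as a callable so that it is computed only when the left-aligned variant hits a junction.

```python
# Hypothetical set of consequences treated as junction disrupting (an assumption).
JUNCTION_DISRUPTING = frozenset({
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
})


def disrupts_junction(consequences):
    """True if any consequence in the list is junction disrupting."""
    return any(c in JUNCTION_DISRUPTING for c in consequences)


def select_annotation(left_consequences, annotate_right_aligned):
    """Return (alignment, consequences) for the annotation to report.

    left_consequences: consequences of the left-aligned variant.
    annotate_right_aligned: callable returning the consequences of the
        right-aligned variant; it is called only when needed.
    """
    if not disrupts_junction(left_consequences):
        return "left", left_consequences
    right_consequences = annotate_right_aligned()
    if disrupts_junction(right_consequences):
        # Both alignments disrupt a junction: keep the left-aligned annotation.
        return "left", left_consequences
    # Only the left alignment disrupts a junction: report the right-aligned one.
    return "right", right_consequences


print(select_annotation(["splice_acceptor_variant"], lambda: ["inframe_deletion"]))
```

For the deletion pictured in the Background section, this sketch would report the right-aligned inframe_deletion rather than the left-aligned splice_acceptor_variant.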

note

This only affects transcript annotations. Supplementary annotations are reported on the left-aligned variant, and HGVS notations are calculated on the right-aligned variant.

+ + + + \ No newline at end of file diff --git a/3.24/core-functionality/transcript-consequence-impacts/index.html b/3.24/core-functionality/transcript-consequence-impacts/index.html new file mode 100644 index 00000000..6057669d --- /dev/null +++ b/3.24/core-functionality/transcript-consequence-impacts/index.html @@ -0,0 +1,19 @@ + + + + + + + +Transcript Consequence Impact | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings, as obtained from SnpEff.

| Impact | Definition |
| --- | --- |
| high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. |
| moderate | A non-disruptive variant that might change protein effectiveness. |
| low | Assumed to be mostly harmless or unlikely to change protein behavior. |
| modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. |

Sources

Not all consequences are rated by SnpEff; therefore, Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

| Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment |
| --- | --- | --- | --- | --- |
| bidirectional_gene_fusion | high | | high | SnpEff |
| coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS |
| copy_number_change | | | modifier | |
| copy_number_decrease | | | modifier | |
| copy_number_increase | | | modifier | |
| downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
| feature_elongation | modifier | high | high | VEP |
| feature_truncation | | high | high | VEP |
| five_prime_duplicated_transcript | | | modifier | |
| five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| frameshift_variant | high | high | high | SnpEff + VEP |
| gene_fusion | high | | high | SnpEff |
| incomplete_terminal_codon_variant | | low | low | VEP |
| inframe_deletion | moderate | moderate | moderate | SnpEff + VEP |
| inframe_insertion | moderate | moderate | moderate | SnpEff + VEP |
| intron_variant | modifier | modifier | modifier | SnpEff + VEP |
| mature_miRNA_variant | | modifier | modifier | VEP |
| missense_variant | moderate | moderate | moderate | SnpEff + VEP |
| NMD_transcript_variant | | modifier | modifier | VEP |
| non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP |
| non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP |
| protein_altering_variant | | moderate | moderate | VEP |
| regulatory_region_ablation | | modifier | modifier | VEP |
| regulatory_region_amplification | | modifier | modifier | VEP |
| regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP |
| short_tandem_repeat_change | | | modifier | |
| short_tandem_repeat_contraction | | | modifier | |
| short_tandem_repeat_expansion | | | modifier | |
| splice_acceptor_variant | high | high | high | SnpEff + VEP |
| splice_donor_variant | high | high | high | SnpEff + VEP |
| splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff |
| start_lost | high | high | high | SnpEff + VEP |
| start_retained_variant | low | low | low | SnpEff + VEP |
| stop_gained | high | high | high | SnpEff + VEP |
| stop_lost | high | high | high | SnpEff + VEP |
| stop_retained_variant | low | low | low | SnpEff + VEP |
| synonymous_variant | low | low | low | SnpEff + VEP |
| three_prime_duplicated_transcript | | | modifier | |
| three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP |
| transcript_ablation | high | high | high | SnpEff + VEP |
| transcript_amplification | | high | high | VEP |
| transcript_variant | modifier | | modifier | SnpEff |
| unidirectional_gene_fusion | high | | high | SnpEff |
| upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP |
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.
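
The two notes above can be illustrated with a short sketch. It assumes the severity order high > moderate > low > modifier (the usual SnpEff ordering, not stated explicitly here), uses only a handful of rows from the table, and the names `IMPACT`, `SEVERITY`, and `transcript_impact` are hypothetical.

```python
# Assumed severity order, from least to most severe.
SEVERITY = ["modifier", "low", "moderate", "high"]

# A few rows copied from the combined-rating table; not the full table.
IMPACT = {
    "intron_variant": "modifier",
    "non_coding_transcript_variant": "modifier",
    "synonymous_variant": "low",
    "splice_region_variant": "low",
    "missense_variant": "moderate",
    "stop_gained": "high",
}


def transcript_impact(consequences):
    """Return the most severe impact among a transcript's consequences.

    Consequences without a known rating fall back to "modifier" (note 2).
    """
    ratings = (IMPACT.get(c, "modifier") for c in consequences)
    return max(ratings, key=SEVERITY.index, default="modifier")


print(transcript_impact(["missense_variant", "splice_region_variant"]))
```

A transcript annotated with both missense_variant and splice_region_variant would therefore be rated moderate in this sketch.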

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. However, this consequence is not annotated by Illumina Connected Annotations, so no impact is provided for it.

Example Transcript

The key impact for each transcript gives the impact rating for the consequence.

{
"variants": [
{
"vid": "1-1623412-T-C",
"chromosome": "1",
"begin": 1623412,
"end": 1623412,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.1623412T>C",
"transcripts": [
{
"transcript": "ENST00000479659.5",
"source": "Ensembl",
"bioType": "lncRNA",
"introns": "2/18",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"intron_variant",
"non_coding_transcript_variant"
],
"impact": "modifier",
"hgvsc": "ENST00000479659.5:n.288-19T>C"
},
{
"transcript": "ENST00000489635.5",
"source": "VEP",
"bioType": "mRNA",
"codons": "aTg/aCg",
"aminoAcids": "M/T",
"cdnaPos": "269",
"cdsPos": "134",
"exons": "3/20",
"proteinPos": "45",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"missense_variant"
],
"impact": "moderate",
"hgvsc": "ENST00000489635.5:c.134T>C",
"hgvsp": "ENSP00000426007.1:p.(Met45Thr)",
"proteinId": "ENSP00000426007.1"
}
]
}
]
}
+ + + + \ No newline at end of file diff --git a/3.24/core-functionality/variant-ids/index.html b/3.24/core-functionality/variant-ids/index.html new file mode 100644 index 00000000..7e422680 --- /dev/null +++ b/3.24/core-functionality/variant-ids/index.html @@ -0,0 +1,18 @@ + + + + + + + +Variant IDs | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used, so neither the reference nor the alternate allele can be empty
  • some large variant callers lazily output N for the reference allele; if this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

chromosome-position-reference allele-alternate allele

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA
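
As a rough sketch, the small-variant VIDs above could be derived from a VCF record as shown below. The suffix trimming for multi-allelic records is an inference from the third VCF example (GTA to G,GTACTATATATTATA), not a documented rule, and `small_variant_vids` is a hypothetical helper name.

```python
def small_variant_vids(chrom, pos, ref, alts):
    """Build one VID per alternate allele of a small-variant VCF record.

    chrom: VCF chromosome name (e.g. "chr1"); pos: 1-based position;
    ref: reference allele; alts: comma-separated ALT column.
    """
    # Ensembl-style chromosome names: drop a leading "chr".
    name = chrom[3:] if chrom.startswith("chr") else chrom
    vids = []
    for alt in alts.split(","):
        r, a = ref, alt
        if a == ".":
            # Reference variant: replace the period with the reference base.
            a = r
        # Assumed normalization: trim shared trailing bases, keeping a padding base.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        vids.append(f"{name}-{pos}-{r}-{a}")
    return vids


print(small_variant_vids("chr1", 66572, "GTA", "G,GTACTATATATTATA"))
```

Under these assumptions the call reproduces the last two VID examples in the list above.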

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

chromosome-position-reference allele-alternate allele

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

chromosome-position-end position-reference allele-alternate allele-SVTYPE

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
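
A minimal sketch of how such a VID might be assembled from a VCF line is shown below. It is an illustration only: `structural_variant_vid` and its `default_svtype` parameter are hypothetical, the parameter exists because the STR example has no SVTYPE in its INFO field, and replacing a lazily emitted N with the true reference base is not handled because it requires access to the reference sequence.

```python
def structural_variant_vid(vcf_line, default_svtype=None):
    """Build chromosome-position-end-ref-alt-SVTYPE from one VCF data line."""
    columns = vcf_line.split()
    chrom, pos, ref, alt, info = columns[0], columns[1], columns[3], columns[4], columns[7]

    # Parse the INFO column into a key/value dictionary.
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value

    svtype = fields.get("SVTYPE", default_svtype)
    if svtype is None:
        raise ValueError("SVTYPE missing and no default supplied")

    # Ensembl-style chromosome names: drop a leading "chr".
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return f"{name}-{pos}-{fields['END']}-{ref}-{alt}-{svtype}"


print(structural_variant_vid("chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL"))
```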
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/1000Genomes-snv-json/index.html b/3.24/data-sources/1000Genomes-snv-json/index.html new file mode 100644 index 00000000..ea4c3690 --- /dev/null +++ b/3.24/data-sources/1000Genomes-snv-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +1000Genomes-snv-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
| Field | Type | Notes |
| --- | --- | --- |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African super population. Integer. |
| afrAn | int | allele number for the African super population. Non-zero integer. |
| amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Ad Mixed American super population. Integer. |
| amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
| easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian super population. Integer. |
| easAn | int | allele number for the East Asian super population. Non-zero integer. |
| eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
| eurAc | int | allele count for the European super population. Integer. |
| eurAn | int | allele number for the European super population. Non-zero integer. |
| sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian super population. Integer. |
| sasAn | int | allele number for the South Asian super population. Non-zero integer. |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/1000Genomes-sv-json/index.html b/3.24/data-sources/1000Genomes-sv-json/index.html new file mode 100644 index 00000000..04cb0a46 --- /dev/null +++ b/3.24/data-sources/1000Genomes-sv-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +1000Genomes-sv-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| id | string | |
| allAn | integer | allele number for all populations. Non-zero integer. |
| allAc | integer | allele count for all populations. Integer. |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| reciprocalOverlap | floating point | Range: 0 - 1. |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/1000Genomes/index.html b/3.24/data-sources/1000Genomes/index.html new file mode 100644 index 00000000..76f46ea0 --- /dev/null +++ b/3.24/data-sources/1000Genomes/index.html @@ -0,0 +1,20 @@ + + + + + + + +1000 Genomes | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF) but we recompute them using allele counts and allele numbers in order to get 6 digit precision. The allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not expressed in the INFO field. Instead the genotypes need to be parsed to compute that information. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe separated AA fields (the Indel specific REF, ALT, IndelType fields are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC
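
The recomputation of allele frequencies from allele counts and allele numbers, as described above, might look roughly like this. The helper names `parse_info` and `allele_frequencies` are hypothetical, and the output keys simply mirror the JSON field names (allAf, afrAf, and so on); rounding to six decimal places reflects the 6 digit precision mentioned above.

```python
# INFO key prefix per population; the empty prefix is the global AC/AN pair.
POPULATIONS = {
    "all": "",
    "afr": "AFR_",
    "amr": "AMR_",
    "eas": "EAS_",
    "eur": "EUR_",
    "sas": "SAS_",
}


def parse_info(info):
    """Split a VCF INFO string into a dict; flags map to an empty string."""
    fields = {}
    for item in info.split(";"):
        if item:
            key, _, value = item.partition("=")
            fields[key] = value
    return fields


def allele_frequencies(info, alt_index):
    """Recompute AF = AC / AN per population for one alternate allele."""
    fields = parse_info(info)
    result = {}
    for population, prefix in POPULATIONS.items():
        allele_count = int(fields[prefix + "AC"].split(",")[alt_index])
        allele_number = int(fields[prefix + "AN"])
        # Guard against division by zero; None means "no frequency available".
        frequency = round(allele_count / allele_number, 6) if allele_number > 0 else None
        result[population + "Af"] = frequency
    return result


info = ("AC=1739,3210;AN=5008;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;"
        "EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;"
        "AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633")
print(allele_frequencies(info, 0))
```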

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

| Chromosome | # of alleles | # of conflicting alleles | percentage |
| --- | --- | --- | --- |
| chrX | 834800 | 2733 | 0.33% |
| Total | 21413098 | 2743 | 0.013% |

Currently, we remove the allele frequency of the conflicting allele (i.e., insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.
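
A minimal sketch of this resolution strategy follows. It assumes that an allele counts as conflicting when it appears in more than one VCF line with different allele frequencies; the record layout and the function name `resolve_conflicts` are hypothetical.

```python
from collections import defaultdict


def resolve_conflicts(records):
    """Drop the allele frequency of alleles reported with conflicting values.

    records: list of (chrom, pos, ref, alts, afs), where alts and afs are
    parallel lists. Returns {(chrom, pos, ref, alt): af or None}; None marks
    an allele whose frequency was removed because of a conflict.
    """
    observed = defaultdict(set)
    for chrom, pos, ref, alts, afs in records:
        for alt, af in zip(alts, afs):
            observed[(chrom, pos, ref, alt)].add(af)

    resolved = {}
    for key, frequencies in observed.items():
        # Exactly one distinct value means there is no conflict for this allele.
        resolved[key] = next(iter(frequencies)) if len(frequencies) == 1 else None
    return resolved


records = [
    ("1", 20505705, "C", ["CTCTG", "CTG", "CTGTG"], [0.0091853, 0.302117, 0.0303514]),
    ("1", 20505705, "C", ["CTG"], [0.000798722]),
]
print(resolve_conflicts(records))
```

In the example from this section, only 1-20505705-C-CTG loses its allele frequency, while CTCTG and CTGTG keep theirs.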

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 +GRCh38

JSON Output

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
| Field | Type | Notes |
| --- | --- | --- |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African super population. Integer. |
| afrAn | int | allele number for the African super population. Non-zero integer. |
| amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Ad Mixed American super population. Integer. |
| amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
| easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian super population. Integer. |
| easAn | int | allele number for the East Asian super population. Non-zero integer. |
| eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
| eurAc | int | allele count for the European super population. Integer. |
| eurAn | int | allele number for the European super population. Non-zero integer. |
| sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian super population. Integer. |
| sasAn | int | allele number for the South Asian super population. Non-zero integer. |

Structural Variants

VCF File Parsing

The VCF files contain entries like the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
22 16050654 esv3647175;esv3647176;esv3647177;esv3647178 A <CN0>,<CN2>,<CN3>,<CN4> 100 PASS AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV GT 3|0 0|0 0|0 0|0 0|0 0|0 0|4

Please note that CNVs are allele-specific. For example, HG00096 (genotype 3|0) is effectively copy number 4, which would be a net gain on chr22.

1000 Genomes contains 5 types of structural variants:

  • CNV
  • DEL
  • DUP
  • INS
  • INV

Since the 1000 Genomes data is provided in VCF format, we assume that the coordinates follow VCF conventions, i.e., there is a padding base for symbolic alleles, so every interval can be interpreted as [BEGIN+1, END]. For all variant types except insertions, END is far larger than BEGIN. The distribution of END relative to BEGIN for insertions is summarized below.

Insertion issues

  • END = BEGIN for 6/165
  • END = BEGIN+2 for 93/165
  • END = BEGIN+3 for 11/165
  • END = BEGIN+4 for 11/165
  • END – BEGIN range from 5 to 1156 for others.

Converting VCF svTypes to SO sequence alterations

The svType will be captured in our JSON file under the sequenceAlteration key. Here's the translation we'll use according to svType in 1000 Genomes.

| svType | Alternative Alleles contain <CN*> | sequenceAlteration |
| --- | --- | --- |
| ALU | FALSE | mobile_element_insertion |
| DUP | TRUE | copy_number_gain |
| CNV | TRUE | copy_number_gain (observed_gains > 0 and observed_losses = 0); copy_number_loss (observed_gains = 0 and observed_losses > 0); copy_number_variation (otherwise) |
| DEL | TRUE | copy_number_loss |
| LINE1 | FALSE | mobile_element_insertion |
| SVA | FALSE | mobile_element_insertion |
| INV | FALSE | inversion |
| INS | FALSE | insertion |
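
The translation table above can be expressed as a small lookup function. This is a sketch: how observed_gains and observed_losses are counted is not specified in this section, so they are taken here as plain integer inputs, and `sequence_alteration` is a hypothetical name.

```python
MOBILE_ELEMENTS = frozenset({"ALU", "LINE1", "SVA"})

SIMPLE_TYPES = {
    "DUP": "copy_number_gain",
    "DEL": "copy_number_loss",
    "INV": "inversion",
    "INS": "insertion",
}


def sequence_alteration(sv_type, observed_gains=0, observed_losses=0):
    """Translate a 1000 Genomes svType into an SO sequence alteration."""
    if sv_type in MOBILE_ELEMENTS:
        return "mobile_element_insertion"
    if sv_type in SIMPLE_TYPES:
        return SIMPLE_TYPES[sv_type]
    if sv_type == "CNV":
        if observed_gains > 0 and observed_losses == 0:
            return "copy_number_gain"
        if observed_gains == 0 and observed_losses > 0:
            return "copy_number_loss"
        return "copy_number_variation"
    raise ValueError(f"unsupported svType: {sv_type}")


print(sequence_alteration("CNV", observed_gains=3, observed_losses=1))
```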

Exceptions

We discard structural variants without END, such as the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
21 9495848 esv3646347 A <INS:ME:LINE1> 100 PASS AC=1543;AF=0.308107;AN=5008;CS=L1_umary;MEINFO=LINE1,5669,6005,+;NS=2504;SVLEN=336;SVTYPE=LINE1;TSD=null;DP=20015;EAS_AF=0.3125;AMR_AF=0.2911;AFR_AF=0.3026;EUR_AF=0.2922;SAS_AF=0.3395;VT=SV GT 0|0 1|1 1|0 0|1 1|0 1|0 0|0

CNVs in chrY

  • No other types of structural variants exist in chrY
  • Since the copy number is provided in the genotype field, we parse the copy number directly from the "CN" field.
  • For most CNVs in chrY, the reference copy number is 1, but the reference copy number for CNVs in segmental duplication sites is 2 (<CN2> in the 2nd example). All segmental duplication calls have identifiers starting with GS_SD_M2.
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00101 HG00103 HG00105 HG00107 HG00108
Y 2888555 CNV_Y_2888555_3014661 T <CN2> 100 PASS AC=1;AF=0.000817661;AN=1223;END=3014661;NS=1233;SVTYPE=CNV;AMR_AF=0.0000;AFR_AF=0.0000;EUR_AF=0.0000;SAS_AF=0.0019;EAS_AF=0.0000;VT=SV GT:CN:CNL:CNP:CNQ:GP:GQ:PL 0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585 0:1:-296.36,0,-16.6:-300.46,0,-19.7:99:0,-19.7:99:0,166 0:1:-1000,0,-39.44:-1000,0,-42.54:99:0,-42.54:99:0,394
Y 6128381 GS_SD_M2_Y_6128381_6230094_Y_9650284_9752225 C <CN1>,<CN3> 100 PASS AC=4,2;AF=0.00327065,0.00163532;AN=1223;END=6230094;NS=1233;SVTYPE=CNV;AMR_AF=0.0029,0.0029;AFR_AF=0.0016,0.0016;EUR_AF=0.0000,0.0000;SAS_AF=0.0038,0.0000;EAS_AF=0.0000,0.0000;VT=SV;EX_TARGET GT:CN:CNL:CNP:CNQ:GP:GQ 0:2:-1000,-138.78,0,-38.53:-1000,-141.27,0,-41.33:99:0,-141.27,-41.33:99 0:2:-1000,-53.32,0,-17.85:-1000,-55.81,0,-20.64:99:0,-55.81,-20.64:99 0:2:-1000,-71.83,0,-32.5:-1000,-74.32,0,-35.29:99:0,-74.32,-35.29:99 0:2:-1000,-60.96,0,-20.29:-1000,-63.45,0,-23.08:99:0,-63.45,-23.08:99 0:2:-1000,-77.6,0,-31.45:-1000,-80.09,0,-34.24:99:0,-80.09,-34.24:99

JSON Output

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| id | string | |
| allAn | integer | allele number for all populations. Non-zero integer. |
| allAc | integer | allele count for all populations. Integer. |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0 |
| reciprocalOverlap | floating point | Range: 0 - 1. |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/amino-acid-conservation-json/index.html b/3.24/data-sources/amino-acid-conservation-json/index.html new file mode 100644 index 00000000..1ecd4180 --- /dev/null +++ b/3.24/data-sources/amino-acid-conservation-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +amino-acid-conservation-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
| Field | Type | Notes |
| --- | --- | --- |
| aminoAcidConservation | object | |
| scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00 |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/amino-acid-conservation/index.html b/3.24/data-sources/amino-acid-conservation/index.html new file mode 100644 index 00000000..051333c5 --- /dev/null +++ b/3.24/data-sources/amino-acid-conservation/index.html @@ -0,0 +1,19 @@ + + + + + + + +Amino Acid Conservation | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human exome. The score indicates how frequently the human amino acid (AA) at a given position is observed across the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. For position 6, we would say that we have 43% conservation (3/7), since three of the seven organisms (human, chimp, and orangutan) carry that residue.
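
The per-position calculation in this example can be sketched as follows. The function name `conservation_scores` is hypothetical, the sequences are the first ten residues of the alignment shown above, and the sketch assumes all aligned sequences have the same length and that the human sequence itself counts toward the total, as in the 3/7 example.

```python
def conservation_scores(human, others):
    """Fraction of aligned sequences (human included) that carry the human
    residue at each position, rounded to two decimals."""
    sequences = [human] + list(others)
    scores = []
    for index, residue in enumerate(human):
        matches = sum(1 for sequence in sequences if sequence[index] == residue)
        scores.append(round(matches / len(sequences), 2))
    return scores


human = "MKKVTAEAIS"
others = [
    "MKKVTAEAIS",  # Chimp
    "----------",  # Gorilla (no data)
    "MKKVTAEAIS",  # Orangutan
    "----------",  # Gibbon (no data)
    "MKKVTEAAIS",  # Rhesus
    "MKKVTEAAIS",  # Macaque
]
scores = conservation_scores(human, others)
print(scores[5])  # score for position 6 (index 5)
```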

Assigning scores to Illumina Connected Annotations transcripts

The source FASTA file comes with Ensembl/UCSC transcript ids of the transcripts used for alignments. The Illumina Connected Annotations cache has RefSeq and Ensembl transcripts and our first attempt was to map the given Ensembl/UCSC ids to their equivalent RefSeq/Ensembl ids. This attempt was unsuccessful since UCSC Table Browser provided mapping without version numbers. So we proceeded as follows:

  • Take proteins which have a unique mapping (and hence one set of conservation scores). For ones that mapped to both ChrX and ChrY, we accepted the one from ChrX.
  • An Illumina Connected Annotations transcript having an exact peptide sequence match with a uniquely aligned protein is assigned the corresponding conservation scores.

Unfortunately this left us with a very small number of transcripts having conservation scores.

GRCh37

  • Source FASTA contained 41957 protein alignments.
  • 38165 proteins had unique scores.
  • 88 aligned proteins existed in Illumina Connected Annotations cache.
  • 118 transcripts had conservation scores.

GRCh38

  • Source FASTA contained 110024 protein alignments.
  • 88961 proteins had unique scores.
  • 11688 aligned proteins existed in Illumina Connected Annotations cache.
  • 12098 transcripts had conservation scores.

Download URL

GRCh37: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownGene.exonAA.fa.gz

GRCh38: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/alignments/knownGene.exonAA.fa.gz

JSON Output

Conservation scores are reported in the transcript section. One score is reported for each alt allele.

"aminoAcidConservation": {
"scores": [0.34]
}
| Field | Type | Notes |
| --- | --- | --- |
| aminoAcidConservation | object | |
| scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00 |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/cancer-hotspots/index.html b/3.24/data-sources/cancer-hotspots/index.html new file mode 100644 index 00000000..c07c626f --- /dev/null +++ b/3.24/data-sources/cancer-hotspots/index.html @@ -0,0 +1,19 @@ + + + + + + + +Cancer Hotspots | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs of the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid   Codon_Change     Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancan   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect     ct       Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we obtain the position and the reference and alternate amino acids from the source file and generate a substitution notation. For indels, we get the protein change notation from the Reference_Amino_Acid column. Then we match each entry using these notations.
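The SNV step can be sketched as follows. This is a minimal illustration in Python, not the actual Illumina Connected Annotations implementation: the function name `snv_notation` is hypothetical, and the `<amino acid>:<count>` layout of the amino acid columns is an assumption read off the example rows above.

```python
def snv_notation(position: str, reference_amino_acid: str, variant_amino_acid: str) -> str:
    """Build a substitution notation such as 'Q61K' from three source columns.

    Hypothetical helper: assumes each amino acid column looks like 'Q:422',
    i.e. the amino acid letter followed by ':' and a sample count.
    """
    ref = reference_amino_acid.split(":", 1)[0]
    alt = variant_amino_acid.split(":", 1)[0]
    return f"{ref}{position}{alt}"


# Values taken from the second NRAS row of the SNV example.
notation = snv_notation("61", "Q:422", "K:142")
print(notation)
```

A notation built this way can then be compared with the amino acid change of a variant on the canonical transcript, which is the matching step described above.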

caution

We currently skip all variants labeled as splice in the source.

JSON Output

The data source will be captured under the cancerHotspots key in the transcript section.

{
"transcript":"NM_002524.5",
"source":"RefSeq",
"bioType":"mRNA",
"aminoAcids":"Q/K",
"proteinPos":"61",
"geneId":"4893",
"hgnc":"NRAS",
"hgvsc":"NM_002524.5:c.181C>A",
"hgvsp":"NP_002515.1:p.(Gln61Lys)",
"isCanonical":true,
"proteinId":"NP_002515.1",
"cancerHotspots":{
"residue":"Q61",
"numSamples":422,
"numAltAminoAcidSamples":142,
"qValue":0
}
}
Field | Type | Notes
residue | string |
numSamples | int | how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples | int | how many samples are associated with a variant with the same position and alternate amino acid
qValue | double |
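As an illustration of how a consumer might read these fields, here is a small Python sketch. The helper `hotspot_summary` and the derived fraction are assumptions made for this example and are not part of the annotator output. The input is a trimmed copy of the transcript object shown above.

```python
import json

# Trimmed copy of the transcript object from the JSON example.
transcript_json = """
{
  "transcript": "NM_002524.5",
  "hgnc": "NRAS",
  "cancerHotspots": {
    "residue": "Q61",
    "numSamples": 422,
    "numAltAminoAcidSamples": 142,
    "qValue": 0
  }
}
"""


def hotspot_summary(transcript: dict):
    """Return (residue, fraction of samples carrying this alt amino acid), or None."""
    hotspot = transcript.get("cancerHotspots")
    if hotspot is None:
        # Transcripts without a hotspot match do not carry the key.
        return None
    fraction = hotspot["numAltAminoAcidSamples"] / hotspot["numSamples"]
    return hotspot["residue"], round(fraction, 3)


summary = hotspot_summary(json.loads(transcript_json))
print(summary)
```

The fraction is only a convenience for comparing the two sample counts. The source file itself reports the counts, not this ratio.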
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clingen-dosage-json/index.html b/3.24/data-sources/clingen-dosage-json/index.html new file mode 100644 index 00000000..bf2f85c2 --- /dev/null +++ b/3.24/data-sources/clingen-dosage-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +clingen-dosage-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotated region is overlapped by the variant. Specified up to 5 decimal places (Not reported for Insertions).
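The two overlap fields can be illustrated with a short Python sketch. The definitions used here are assumptions, not taken from the annotator source: annotation overlap is the shared length divided by the annotation length, and reciprocal overlap is the shared length divided by the longer of the two intervals. The variant interval is hypothetical, and all coordinates are treated as 1-based and inclusive, as in the table.

```python
def interval_length(begin: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - begin + 1


def overlap_length(a_begin: int, a_end: int, b_begin: int, b_end: int) -> int:
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_begin, b_begin) + 1)


def annotation_overlap(var_begin: int, var_end: int, ann_begin: int, ann_end: int) -> float:
    """Assumed definition: fraction of the annotation covered by the variant."""
    shared = overlap_length(var_begin, var_end, ann_begin, ann_end)
    return round(shared / interval_length(ann_begin, ann_end), 5)


def reciprocal_overlap(var_begin: int, var_end: int, ann_begin: int, ann_end: int) -> float:
    """Assumed definition: shared length divided by the longer interval."""
    shared = overlap_length(var_begin, var_end, ann_begin, ann_end)
    longest = max(interval_length(var_begin, var_end), interval_length(ann_begin, ann_end))
    return round(shared / longest, 5)


# Hypothetical variant spanning 15:31727418-32153204, compared with the
# first annotated region of the example (15:30900686-32153204).
print(annotation_overlap(31727418, 32153204, 30900686, 32153204))
print(reciprocal_overlap(31727418, 32153204, 30900686, 32153204))
```

Under these assumed definitions, a variant that fully contains an annotated region yields an annotation overlap of 1, while the reciprocal overlap stays small whenever the two intervals differ greatly in size.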

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clingen-gene-validity-json/index.html b/3.24/data-sources/clingen-gene-validity-json/index.html new file mode 100644 index 00000000..b92e9d85 --- /dev/null +++ b/3.24/data-sources/clingen-gene-validity-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +clingen-gene-validity-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clingen-json/index.html b/3.24/data-sources/clingen-json/index.html new file mode 100644 index 00000000..a83db017 --- /dev/null +++ b/3.24/data-sources/clingen-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +clingen-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clingen/index.html b/3.24/data-sources/clingen/index.html new file mode 100644 index 00000000..b10debed --- /dev/null +++ b/3.24/data-sources/clingen/index.html @@ -0,0 +1,18 @@ + + + + + + + +ClinGen | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), they are adjusted to [BEGIN+1, END].

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482
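
The coordinate adjustment described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code; the function name `bed_to_one_based` and the input values are hypothetical.

```python
def bed_to_one_based(chrom_start: int, chrom_end: int) -> tuple[int, int]:
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive one."""
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("expected 0 <= chromStart < chromEnd")
    # Only the start moves: the exclusive BED end equals the inclusive 1-based end.
    return chrom_start + 1, chrom_end


begin, end = bed_to_one_based(0, 100)
print(begin, end)  # 1 100
```

The interval length is preserved: the BED interval [0, 100) and the 1-based interval [1, 100] both cover 100 bases.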

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen tsv file and extract the following:

  • chrom
  • chromStart (note this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma separated lists. attrTags contains the field keys and attrVals contains the field values. We will parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)
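
Pairing the two comma-separated fields can be sketched as follows. This is a simplified illustration under two assumptions that the source does not state: both lists have the same number of items, and no value contains an embedded comma. The helper names and the sample values (including the `size` key) are hypothetical.

```python
WANTED_KEYS = ("parent", "clinical_int", "validated", "phenotype", "phenotype_id")


def parse_attributes(attr_tags: str, attr_vals: str) -> dict[str, str]:
    """Pair each key in attrTags with the value at the same position in attrVals."""
    tags = attr_tags.split(",")
    vals = attr_vals.split(",")
    if len(tags) != len(vals):
        raise ValueError("attrTags and attrVals must have the same number of items")
    return dict(zip(tags, vals))


def extract_wanted(attributes: dict[str, str]) -> dict[str, str]:
    """Keep only the keys listed above; other keys are ignored."""
    return {key: attributes[key] for key in WANTED_KEYS if key in attributes}


# Hypothetical input, for illustration only.
attrs = parse_attributes("parent,clinical_int,validated,size",
                         "nsv530705,Pathogenic,False,100")
print(extract_wanted(attrs))
```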

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped
    • Observed losses and observed gains are calculated for each group
    • Clinical significance and validation status are collapsed using the priority strategy described below
  • Variants with the same parent ID can have different coordinates (mapped to hg38)
    • nsv491508: chr14:105583663-106881350 and chr14:105605043-106766076 (only one example)
    • We keep both variants
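
The grouping rule above can be sketched as a counting pass keyed on parent ID plus coordinates. This is an illustrative sketch only; the record layout (dictionaries with `parent`, `chrom`, `begin`, `end`, `variant_type`) and the function name are assumptions, not the tool's internal data model.

```python
from collections import defaultdict


def count_observations(records):
    """Count losses and gains per (parent ID, chromosome, begin, end) group."""
    groups = defaultdict(lambda: {"observedLosses": 0, "observedGains": 0})
    for rec in records:
        # Same parent ID with different coordinates yields a separate group.
        key = (rec["parent"], rec["chrom"], rec["begin"], rec["end"])
        if rec["variant_type"] == "copy_number_loss":
            groups[key]["observedLosses"] += 1
        elif rec["variant_type"] == "copy_number_gain":
            groups[key]["observedGains"] += 1
    return dict(groups)


# The variant types below are made up; the coordinates are the nsv491508 example.
records = [
    {"parent": "nsv491508", "chrom": "14", "begin": 105583663, "end": 106881350,
     "variant_type": "copy_number_loss"},
    {"parent": "nsv491508", "chrom": "14", "begin": 105583663, "end": 106881350,
     "variant_type": "copy_number_loss"},
    {"parent": "nsv491508", "chrom": "14", "begin": 105605043, "end": 106766076,
     "variant_type": "copy_number_gain"},
]
for key, counts in count_observations(records).items():
    print(key, counts)
```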

Conflict Resolution

Clinical significance priority

When variants belonging to the same parent ID carry different clinical significances, we choose the most pathogenic clinical significance from the available values. For example, if 3 samples were deemed pathogenic and 2 samples were likely pathogenic, we would list the variant as pathogenic.

Priority (high to low)

  • Priority
  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance

Validation Priority

When there is a mixture of variants belonging to the same parent ID, we set the validation status to true if any of the variants were validated.
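
Both conflict-resolution rules can be sketched together. This is a minimal illustration; the ranking uses the five clinical-significance labels from the priority list above, and the function names and the lower-casing of labels are assumptions.

```python
# Highest priority first, following the clinical-significance labels listed above.
SIGNIFICANCE_PRIORITY = [
    "pathogenic",
    "likely pathogenic",
    "benign",
    "likely benign",
    "uncertain significance",
]


def collapse_significance(values):
    """Return the highest-priority label among the given values."""
    rank = {label: i for i, label in enumerate(SIGNIFICANCE_PRIORITY)}
    # A label missing from the priority list raises KeyError rather than being guessed.
    return min((v.lower() for v in values), key=lambda v: rank[v])


def collapse_validation(flags):
    """The group is validated if any member was validated."""
    return any(flags)


print(collapse_significance(["Pathogenic"] * 3 + ["Likely pathogenic"] * 2))  # pathogenic
print(collapse_validation([False, True, False]))  # True
```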

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

Rating | Possible Clinical Interpretation
0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype
1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype
2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
30 | Gene associated with autosomal recessive phenotype
40 | Dosage sensitivity unlikely

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (Not reported for Insertions).
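
As a rough illustration of how the two overlap values relate, the sketch below uses the common definition of reciprocal overlap (overlap length divided by the longer of the two intervals) and treats annotationOverlap as the fraction of the annotation covered by the variant. Both definitions are assumptions here; the source does not spell out the exact formulas, and the function name is hypothetical.

```python
def overlap_fractions(variant, annotation):
    """Return (reciprocalOverlap, annotationOverlap) for 1-based inclusive intervals.

    Returns None when the intervals do not overlap.
    """
    v_begin, v_end = variant
    a_begin, a_end = annotation
    overlap = min(v_end, a_end) - max(v_begin, a_begin) + 1
    if overlap <= 0:
        return None
    v_len = v_end - v_begin + 1
    a_len = a_end - a_begin + 1
    # Dividing by the longer interval equals taking the smaller of the two fractions.
    reciprocal = round(overlap / max(v_len, a_len), 5)
    annotation_overlap = round(overlap / a_len, 5)
    return reciprocal, annotation_overlap


print(overlap_fractions((1, 1000), (101, 200)))  # (0.1, 1.0)
```

In the second call shape above, the annotation lies entirely inside a much larger variant, so annotationOverlap is 1 while reciprocalOverlap stays small, which matches the pattern in the second JSON entry.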

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
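
The numeric scores in the source TSV correspond to the strings listed above. A minimal lookup sketch follows; the dictionary mirrors the rating table and the value list, while the function name and the choice to raise on unknown scores are assumptions.

```python
DOSAGE_RATINGS = {
    0: "no evidence to suggest that dosage sensitivity is associated with clinical phenotype",
    1: "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
    2: "emerging evidence suggesting dosage sensitivity is associated with clinical phenotype",
    3: "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
    30: "gene associated with autosomal recessive phenotype",
    40: "dosage sensitivity unlikely",
}


def describe_rating(score) -> str:
    """Translate a haploinsufficiency or triplosensitivity score into its description."""
    # Unknown scores raise KeyError instead of being mapped to a guess.
    return DOSAGE_RATINGS[int(score)]


print(describe_rating("40"))  # dosage sensitivity unlikely
```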

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1
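
The .version file shown earlier is a list of KEY=VALUE lines. If you need to inspect such a file in your own scripts, a small reader might look like the sketch below. It assumes only the layout visible in the sample; the function name is hypothetical and this is not how SAUtils itself reads the file.

```python
def parse_version_file(text: str) -> dict[str, str]:
    """Read KEY=VALUE lines into a dictionary, skipping blank or malformed lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


VERSION_SAMPLE = """NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)
"""

print(parse_version_file(VERSION_SAMPLE)["VERSION"])  # 20211201
```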

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

You can also use the SAUtils AutoDownloadGenerate subcommand to generate the ClinGen files. For details, see the SAUtils section.

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z
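
Reading this file layout can be sketched with the standard csv module. The sketch assumes the structure shown in the sample (preamble lines, "+++" separator rows, one header row) and maps columns to the JSON keys used on this page; the function name, the lower-casing, and the date truncation are illustrative assumptions.

```python
import csv
import io


def read_gene_validity(csv_text: str) -> list[dict[str, str]]:
    """Turn the ClinGen gene validity CSV into a list of JSON-like dictionaries."""
    rows = []
    header = None
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields or fields[0].startswith("+"):
            continue  # blank line or "+++" separator row
        if fields[0] == "GENE SYMBOL":
            header = fields
            continue
        if header is None:
            continue  # preamble before the header row
        row = dict(zip(header, fields))
        rows.append({
            "diseaseId": row["DISEASE ID (MONDO)"],
            "disease": row["DISEASE LABEL"],
            "classification": row["CLASSIFICATION"].lower(),
            # Keep only the yyyy-MM-dd part of the timestamp.
            "classificationDate": row["CLASSIFICATION DATE"][:10],
        })
    return rows


GENE_VALIDITY_SAMPLE = """CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
"""

print(read_gene_validity(GENE_VALIDITY_SAMPLE))
```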

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we select the latest classification date.
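
The two rules can be combined into a single comparison: prefer the more severe classification, then the later date. The sketch below is illustrative only. The severity ordering in `ASSUMED_SEVERITY` is an assumption made for this example (the source does not define one), and the function name is hypothetical.

```python
# ASSUMPTION: least to most severe. The source does not specify this ordering.
ASSUMED_SEVERITY = [
    "no known disease relationship",
    "no reported evidence",
    "refuted",
    "disputed",
    "limited",
    "moderate",
    "strong",
    "definitive",
]


def resolve_classification(entries):
    """Pick one (classification, date) pair for a gene/disease combination."""
    def sort_key(entry):
        classification, date = entry
        # ISO timestamps compare correctly as strings once cut to yyyy-MM-dd.
        return ASSUMED_SEVERITY.index(classification.lower()), date[:10]
    return max(entries, key=sort_key)


print(resolve_classification([
    ("Moderate", "2018-05-08T04:00:00.000Z"),
    ("Limited", "2018-05-08T04:00:00.000Z"),
]))
```

With the EDNRB rows above this selects Moderate, and with the MUTYH rows it selects the entry dated 2017-05-25.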

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

Building the supplementary files

The gene disease validity .nga for Illumina Connected Annotations can be built using the SAUtils command's DiseaseValidity subcommand. The only required data file is Clingen-Gene-Disease-Summary-2021-12-01.tsv (url provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2

You can also use the SAUtils AutoDownloadGenerate subcommand to generate the ClinGen files. For details, see the SAUtils section.

+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clinvar-json/index.html b/3.24/data-sources/clinvar-json/index.html new file mode 100644 index 00000000..edf01acd --- /dev/null +++ b/3.24/data-sources/clinvar-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +clinvar-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes
id | string | ClinVar ID
variationId | string | ClinVar VCV ID
variantType | string | variant type
reviewStatus | string | see possible values below
alleleOrigins | string array | see possible values below
refAllele | string |
altAllele | string |
phenotypes | string array |
medGenIds | string array | MedGen IDs
omimIds | string array | OMIM IDs
orphanetIds | string array | Orphanet IDs
significance | string array | see possible values below
lastUpdatedDate | string | yyyy-MM-dd
pubMedIds | string array | PubMed IDs
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele
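
When consuming this output, it can help to separate the VCV and RCV entries and to look only at allele-specific records. The sketch below assumes the entries are already loaded as Python dictionaries with the fields in the table above; the helper names are hypothetical, and telling the record types apart by ID prefix is inferred from the examples rather than stated in the source.

```python
def split_clinvar_entries(entries):
    """Separate VCV and RCV entries based on the prefix of their id."""
    vcvs = [e for e in entries if e["id"].startswith("VCV")]
    rcvs = [e for e in entries if e["id"].startswith("RCV")]
    return vcvs, rcvs


def allele_specific_significance(entries):
    """Collect the significance values of entries flagged isAlleleSpecific."""
    found = set()
    for entry in entries:
        if entry.get("isAlleleSpecific"):
            found.update(entry.get("significance", []))
    return sorted(found)


# Trimmed copies of the small-variant example entries above.
clinvar = [
    {"id": "VCV000036581.3", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000030258.4", "variationId": "VCV000036581.3",
     "significance": ["benign"], "isAlleleSpecific": True},
]
vcvs, rcvs = split_clinvar_entries(clinvar)
print(len(vcvs), len(rcvs), allele_specific_significance(clinvar))
```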

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clinvar-preview-json/index.html b/3.24/data-sources/clinvar-preview-json/index.html new file mode 100644 index 00000000..b1810a87 --- /dev/null +++ b/3.24/data-sources/clinvar-preview-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +clinvar-preview-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

clinvar-preview-json

small variants:

{
"clinvar-preview": [
{
"altAllele": "A",
"refAllele": "G",
"variantType": "SNV",
"accession": "VCV000437934",
"version": "1",
"recordType": "classified",
"dateLastUpdated": "2023-08-06",
"rcvs": [
{
"accession": "RCV000505090",
"version": "1",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2016-08-31",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "Cleidocranial dysostosis",
"db": "MedGen",
"id": "C0008928"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2016-08-31",
"mostRecentSubmission": "2017-09-09",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "820",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Cleidocranial+Dysplasia/1683"
},
{
"db": "SNOMED CT",
"id": "65976001"
}
],
"value": "Cleidocranial dysostosis"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000598565"
}
]
}
]
}

large variants:

{
"clinvar-preview": [
{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
"variantType": "copy_number_gain",
"accession": "VCV000154089",
"version": "2",
"recordType": "classified",
"dateLastUpdated": "2023-10-15",
"rcvs": [
{
"accession": "RCV000142236",
"version": "6",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2014-03-10",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "See cases"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2014-03-10",
"mostRecentSubmission": "2015-07-13",
"conditions": [
{
"type": "PhenotypeInstruction",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "18728",
"name": {
"value": "See cases"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000183512"
}
]
}
]
}
Field | Type | Notes
chromosome | string | Chromosome
begin | integer | start position of variant
end | integer | end position of variant
refAllele | string |
altAllele | string |
accession | string | ClinVar ID
version | string | ClinVar version
variantType | string | variant type
recordType | string | record type
dateLastUpdated | string | yyyy-MM-dd
rcvs | array | RCV objects associated with this VCV
classifications | array | classifications for this VCV
clinicalAssertions | array | SCV objects associated with this VCV
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele
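
The nesting in the examples above can be navigated as in the following sketch. It follows the shape of the example JSON, where `classifications` holds a `germlineClassification` object, and assumes the record is already loaded as a Python dictionary; the function name and the returned summary layout are hypothetical.

```python
def germline_summary(record):
    """Pull the aggregate germline classification out of one clinvar-preview record."""
    germline = record.get("classifications", {}).get("germlineClassification", {})
    return {
        "accession": record["accession"],
        "version": record["version"],
        "classification": germline.get("classification"),
        "reviewStatus": germline.get("reviewStatus"),
        "rcvAccessions": [rcv["accession"] for rcv in record.get("rcvs", [])],
    }


# Trimmed copy of the small-variant example above.
record = {
    "accession": "VCV000437934",
    "version": "1",
    "rcvs": [{"accession": "RCV000505090", "version": "1"}],
    "classifications": {
        "germlineClassification": {
            "reviewStatus": "no assertion criteria provided",
            "classification": "Pathogenic",
        }
    },
}
print(germline_summary(record))
```

Missing keys yield None or an empty list rather than an exception, which is convenient when a record has no germline classification.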

Variant Types

  • copy_number_gain
  • copy_number_loss
  • deletion
  • delins
  • duplication
  • insertion
  • inversion
  • SNV
  • tandem_duplication

Review Statuses

  • criteria provided, conflicting classifications
  • criteria provided, multiple submitters, no conflicts
  • criteria provided, single submitter
  • no assertion criteria provided
  • no classification provided
  • practice guideline
  • reviewed by expert panel

classification

  • Benign
  • Likely benign
  • Pathogenic
  • Uncertain significance
  • Likely pathogenic
  • Benign/Likely benign
  • not provided
  • conflicting data from submitters
  • Pathogenic/Likely pathogenic
  • association
  • Conflicting classifications of pathogenicity
  • Pathogenic; risk factor
  • risk factor
  • other
  • drug response
  • Uncertain significance; Pathogenic/Likely pathogenic
  • Likely pathogenic, low penetrance
  • Pathogenic; Affects
  • Pathogenic, low penetrance
  • protective
  • Affects
  • Benign; other
  • Conflicting classifications of pathogenicity; other
  • Conflicting classifications of pathogenicity; association
  • Uncertain risk allele
  • Uncertain significance; risk factor
  • Likely pathogenic; risk factor
  • Likely benign; association
  • Likely risk allele
  • Pathogenic/Likely pathogenic; other
  • Pathogenic; other
  • Pathogenic/Likely pathogenic/Pathogenic, low penetrance
  • Pathogenic/Likely pathogenic; risk factor
  • Benign/Likely benign; risk factor
  • Uncertain significance/Uncertain risk allele
  • Pathogenic; association; protective
  • protective; risk factor
  • Benign/Likely benign; other; risk factor
  • Benign/Likely benign; association
  • Benign; association
  • Affects; association; other
  • Pathogenic; protective
  • Conflicting classifications of pathogenicity; drug response; other
  • Conflicting classifications of pathogenicity; drug response
  • Benign; drug response
  • Likely pathogenic; other
  • Conflicting classifications of pathogenicity; protective
  • Pathogenic/Likely pathogenic; drug response
  • Benign/Likely benign; other
  • Likely pathogenic/Likely risk allele
  • Uncertain risk allele; protective
  • association not found
  • Affects; association
  • Uncertain significance; association
  • Likely benign; other
  • Uncertain significance; other
  • Conflicting classifications of pathogenicity; association; risk factor
  • Pathogenic; association
  • Benign; risk factor
  • Conflicting classifications of pathogenicity; other; risk factor
  • Pathogenic/Likely risk allele; risk factor
  • Uncertain significance; drug response
  • Conflicting classifications of pathogenicity; risk factor
  • other; risk factor
  • Pathogenic/Likely pathogenic/Likely risk allele
  • Likely pathogenic; drug response
  • Conflicting classifications of pathogenicity; Affects
  • association; drug response; risk factor
  • Pathogenic; drug response
  • Affects; risk factor
  • Pathogenic; drug response; other
  • Likely pathogenic; protective
  • confers sensitivity
  • Likely pathogenic; association
  • Benign; Affects
  • Likely pathogenic; Affects
  • Uncertain risk allele; risk factor
  • drug response; risk factor
  • Pathogenic/Likely risk allele
  • Likely benign; drug response; other
  • Benign/Likely benign; drug response
  • Benign/Likely benign; drug response; other
  • drug response; other
  • association; drug response
  • Pathogenic; confers sensitivity
  • association; risk factor
  • Pathogenic/Pathogenic, low penetrance; other
  • Benign; confers sensitivity
  • confers sensitivity; other
  • Likely pathogenic/Pathogenic, low penetrance
  • Likely benign; risk factor
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clinvar-preview/index.html b/3.24/data-sources/clinvar-preview/index.html new file mode 100644 index 00000000..0bda6d96 --- /dev/null +++ b/3.24/data-sources/clinvar-preview/index.html @@ -0,0 +1,23 @@ + + + + + + + +ClinVar Preview | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

ClinVar Preview

Overview

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

ClinVar Preview relates to the new ClinVar XML format introduced in 2024. The following sections describe the parsing and the resulting JSON format provided by Illumina Connected Annotations.

Parsing

ClinVar recommends using the VCV XML file because it contains comprehensive information.

Parsing is simplified by generating classes from the XSD file. Command for generating the classes from the XSD file:

xsd ClinVar_VCV.xsd /n:VariationArchive /c

Overall XML to JSON mapping

| key | type | description | XML path |
|---|---|---|---|
| variantType | string | sequence ontology | VariationArchive.VariationType |
| accession | string | VCV Id from ClinVar | VariationArchive.Accession |
| version | string | VCV Id version | VariationArchive.Version |
| recordType | string | classified | VariationArchive.RecordType |
| dateLastUpdated | date time | date VCV was last updated | VariationArchive.DateLastUpdated |
| chromosome | string | chromosome (large variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.Chr |
| begin | number | start position of the variant (large variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.positionVCF |
| end | number | end position of the variant (large variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.displayStop or calculated |
| refAllele | string | reference alleles (small variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.referenceAlleleVCF |
| altAllele | string | alternate alleles (small variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.alternateAlleleVCF |
| rcvs | list | list of RCV objects | VariationArchive.ClassifiedRecord.RCVList |
| classifications | list | list of classification objects | VariationArchive.ClassifiedRecord.Classifications |
| clinicalAssertions | list | list of clinicalAssertion objects | VariationArchive.ClassifiedRecord.ClinicalAssertionList |

Variation fields

XML

<VariationArchive
VariationID="1381081"
VariationName="NM_003000.3(SDHB):c.19_41dup (p.Pro14_Ala15insSerProTer)"
VariationType="Indel"
Accession="VCV001381081"
Version="3"
RecordType="classified"
DateLastUpdated="2024-01-26"
NumberOfSubmissions="1"
NumberOfSubmitters="1"
DateCreated="2022-03-28"
MostRecentSubmission="2023-02-07"
>
...

JSON

{
"variantType": "delins",
"accession": "VCV001381081",
"version": "3",
"recordType": "classified",
"dateLastUpdated": "2024-01-26",
...
}
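
The attribute-to-key mapping for the variation fields can be sketched in Python as follows. This is only an illustration, not the Illumina Connected Annotations implementation: the function name `variation_fields` and the `VARIANT_TYPE_MAP` dictionary are hypothetical, and the single `Indel` to `delins` entry is inferred from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping, inferred only from the example on this page
# (VariationType="Indel" is reported as "delins"). The real mapping
# used by the annotator may contain more entries or differ.
VARIANT_TYPE_MAP = {"Indel": "delins"}


def variation_fields(xml_text: str) -> dict:
    """Map VariationArchive attributes to the JSON keys from the mapping table."""
    node = ET.fromstring(xml_text)
    raw_type = node.get("VariationType")
    return {
        # Fall back to the raw ClinVar value when no mapping is known.
        "variantType": VARIANT_TYPE_MAP.get(raw_type, raw_type),
        "accession": node.get("Accession"),
        "version": node.get("Version"),
        "recordType": node.get("RecordType"),
        "dateLastUpdated": node.get("DateLastUpdated"),
    }


# Abbreviated copy of the XML example above (made self-closing so it parses).
SAMPLE = """<VariationArchive VariationID="1381081" VariationType="Indel"
    Accession="VCV001381081" Version="3" RecordType="classified"
    DateLastUpdated="2024-01-26"/>"""
```

Calling `variation_fields(SAMPLE)` returns a dictionary with the same five keys as the JSON example.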

Location fields

<SimpleAllele
AlleleID="196495"
VariationID="1381081"
>
<Location>
<CytogeneticLocation>1p36.13</CytogeneticLocation>
<SequenceLocation
Accession="NC_000001.11"
Chr="1"
Assembly="GRCh38"
positionVCF="17053978"
referenceAlleleVCF="C"
alternateAlleleVCF="CGGCAACCGGCGCCTCAAGGAGAG"
display_start="17053978"
display_stop="17053979"
AssemblyAccessionVersion="GCF_000001405.38"
forDisplay="true"
AssemblyStatus="current"
start="17053978"
stop="17053979"
variantLength="23"
/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="17380473"
stop="17380474" display_start="17380473" display_stop="17380474"
variantLength="23" positionVCF="17380473" referenceAlleleVCF="C"
alternateAlleleVCF="CGGCAACCGGCGCCTCAAGGAGAG"/>
</Location>
...
</SimpleAllele>

JSON Small Variant

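Note that the alleles are trimmed: the shared leading base C is removed from both alleles, and the resulting empty reference allele is reported as "-".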

{
"altAllele": "GGCAACCGGCGCCTCAAGGAGAG",
"refAllele": "-",
...
}
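
A minimal sketch of the trimming, assuming shared leading bases are removed first, then shared trailing bases, and that an empty allele is written as "-". The function name `trim_alleles` and the order of trimming are assumptions; the source only shows the trimmed result.

```python
from typing import Tuple


def trim_alleles(position: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Remove bases shared by ref and alt; return (position, ref, alt)."""
    # Shared leading bases: drop them and advance the position.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        position += 1
    # Shared trailing bases: drop them; the position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Assumption: an empty allele is represented as "-".
    return position, ref or "-", alt or "-"
```

With the GRCh38 values from the XML above, `trim_alleles(17053978, "C", "CGGCAACCGGCGCCTCAAGGAGAG")` yields the `refAllele` and `altAllele` shown in the JSON.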

JSON Large Variant

{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
...
}

RCVs

RCV Object from XML path VariationArchive.ClassifiedRecord.RCVList

| key | type | description | XML sub-path |
|---|---|---|---|
| accession | string | RCV Id from ClinVar | RCVList.RCVAccession.Accession |
| version | string | RCV Id version | RCVList.RCVAccession.Version |
| classifications | list | list of classification objects | RCVList.RCVAccession.RCVClassifications |
| classifiedConditions | list | list of classified conditions | RCVList.RCVAccession.ClassifiedConditionList |

XML

<RCVList>
<RCVAccession
Title="NM_003000.3(SDHB):c.19_41dup (p.Pro14_Ala15insSerProTer) AND multiple conditions"
Accession="RCV001921860"
Version="3">
<ClassifiedConditionList TraitSetID="23696">
...
</ClassifiedConditionList>
<RCVClassifications>
...
</RCVClassifications>
</RCVAccession>
</RCVList>
...

JSON

{
"rcvs": [
{
"accession": "RCV001921860",
"version": "3",
"classifications": {
...
},
"classifiedConditions": [
...
]
}
]
}

Classifications

Classification object from XML path VariationArchive.ClassifiedRecord.RCVList.RCVAccession.RCVClassifications. A classification can be one of the following types:

  1. germlineClassification
  2. somaticClinicalImpact
  3. oncogenicityClassification
Germline Classification

Classification object from XML path VariationArchive.ClassifiedRecord.RCVList.RCVAccession.RCVClassifications.GermlineClassification

| key | type | description | XML sub-path |
|---|---|---|---|
| reviewStatus | string | review status | GermlineClassification.ReviewStatus |
| descriptions | list | list of description objects | GermlineClassification.Description |
| descriptions[].classification | string | classification | GermlineClassification.Description.Value |
| descriptions[].dateLastEvaluated | date | date last evaluated | GermlineClassification.Description.DateLastEvaluated |

XML

<RCVClassifications>
<GermlineClassification>
<ReviewStatus>criteria provided, single submitter</ReviewStatus>
<Description DateLastEvaluated="2021-08-04" SubmissionCount="1">Pathogenic</Description>
</GermlineClassification>
</RCVClassifications>

JSON

{
"classifications": {
"germlineClassification": {
"reviewStatus": "criteria provided, single submitter",
"descriptions": [
{
"dateLastEvaluated": "2021-08-04",
"classification": "Pathogenic"
}
]
}
}
}
Classified Conditions

Classified conditions object from XML path VariationArchive.ClassifiedRecord.RCVList.RCVAccession.ClassifiedConditionList

| key | type | description | XML sub-path |
|---|---|---|---|
| condition | string | name of the classified condition | ClassifiedConditionList.ClassifiedCondition.Value |
| db | string | database that provides the condition identifier | ClassifiedConditionList.ClassifiedCondition.DB |
| id | string | condition identifier in that database | ClassifiedConditionList.ClassifiedCondition.ID |

XML

<ClassifiedConditionList TraitSetID="23696">
<ClassifiedCondition DB="MedGen" ID="C0238198">Gastrointestinal stromal tumor</ClassifiedCondition>
<ClassifiedCondition DB="MedGen" ID="C1861848">Paragangliomas 4</ClassifiedCondition>
<ClassifiedCondition DB="MedGen" ID="C0031511">Pheochromocytoma</ClassifiedCondition>
</ClassifiedConditionList>

JSON

{
"classifiedConditions": [
{
"condition": "Gastrointestinal stromal tumor",
"db": "MedGen",
"id": "C0238198"
},
{
"condition": "Paragangliomas 4",
"db": "MedGen",
"id": "C1861848"
},
{
"condition": "Pheochromocytoma",
"db": "MedGen",
"id": "C0031511"
}
]
}
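
The conversion from ClassifiedConditionList to the classifiedConditions array can be sketched as below. The function name `classified_conditions` is hypothetical. Omitting `db` and `id` when the attributes are absent is an assumption, based on the large-variant example in JSON Output where a condition appears with only a `condition` key.

```python
import xml.etree.ElementTree as ET


def classified_conditions(xml_text: str) -> list:
    """Turn a <ClassifiedConditionList> element into a list of dictionaries."""
    root = ET.fromstring(xml_text)
    conditions = []
    for node in root.iter("ClassifiedCondition"):
        entry = {"condition": (node.text or "").strip()}
        # Assumption: keys are left out when the XML attribute is missing.
        if node.get("DB"):
            entry["db"] = node.get("DB")
        if node.get("ID"):
            entry["id"] = node.get("ID")
        conditions.append(entry)
    return conditions


# Shortened copy of the XML example above.
SAMPLE = """<ClassifiedConditionList TraitSetID="23696">
  <ClassifiedCondition DB="MedGen" ID="C0238198">Gastrointestinal stromal tumor</ClassifiedCondition>
  <ClassifiedCondition DB="MedGen" ID="C0031511">Pheochromocytoma</ClassifiedCondition>
</ClassifiedConditionList>"""
```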

Classifications

Classification object from XML path VariationArchive.ClassifiedRecord.Classifications. A classification can be one of the following types:

  1. germlineClassification
  2. somaticClinicalImpact
  3. oncogenicityClassification

XML

<Classifications>
<GermlineClassification DateLastEvaluated="2021-08-04" NumberOfSubmissions="1" NumberOfSubmitters="1"
DateCreated="2022-03-28" MostRecentSubmission="2023-02-07">
...
</GermlineClassification>
</Classifications>

JSON

"classifications": {
"germlineClassification": {...}
}
Germline Classification

Classification object from XML path VariationArchive.ClassifiedRecord.Classifications.GermlineClassification

| key | type | description | XML sub-path |
|---|---|---|---|
| classification | string | classification | GermlineClassification.Description |
| reviewStatus | string | review status | GermlineClassification.ReviewStatus |
| dateLastEvaluated | date | date last evaluated | GermlineClassification.DateLastEvaluated |
| mostRecentSubmission | date | date of the most recent submission | GermlineClassification.MostRecentSubmission |
| pubMedIds | list | list of PubMedIds | GermlineClassification.Citation.ID.Value |
| conditions | list | list of conditions | GermlineClassification.ConditionList |

XML

<Classifications>
<GermlineClassification DateLastEvaluated="2021-08-04" NumberOfSubmissions="1" NumberOfSubmitters="1"
DateCreated="2022-03-28" MostRecentSubmission="2023-02-07">
<ReviewStatus>criteria provided, single submitter</ReviewStatus>
<Description>Pathogenic</Description>
<Citation Type="general">
<ID Source="PubMed">19454582</ID>
</Citation>
<Citation Type="general">
<ID Source="PubMed">19802898</ID>
</Citation>
<ConditionList>
...
</ConditionList>
</GermlineClassification>
</Classifications>

JSON

{
"classifications": {
"germlineClassification": {
"classification": "Pathogenic",
"reviewStatus": "criteria provided, single submitter",
"dateLastEvaluated": "2021-08-04",
"mostRecentSubmission": "2023-02-07",
"conditions": [...],
"pubMedIds": [
"19454582",
"19802898"
]
}
}
}
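
As an illustration of the table above, the following Python sketch reads the aggregate germline classification from a GermlineClassification element. The function name `germline_classification` is hypothetical, and restricting citations to `Source="PubMed"` is an assumption based on the example. The `conditions` list is left out here for brevity.

```python
import xml.etree.ElementTree as ET


def germline_classification(xml_text: str) -> dict:
    """Read the aggregate classification fields from <GermlineClassification>."""
    node = ET.fromstring(xml_text)
    return {
        "classification": node.findtext("Description"),
        "reviewStatus": node.findtext("ReviewStatus"),
        "dateLastEvaluated": node.get("DateLastEvaluated"),
        "mostRecentSubmission": node.get("MostRecentSubmission"),
        # Assumption: only citation IDs whose Source is PubMed are collected.
        "pubMedIds": [
            id_node.text
            for id_node in node.findall("Citation/ID")
            if id_node.get("Source") == "PubMed"
        ],
    }


# Shortened copy of the XML example above.
SAMPLE = """<GermlineClassification DateLastEvaluated="2021-08-04" MostRecentSubmission="2023-02-07">
  <ReviewStatus>criteria provided, single submitter</ReviewStatus>
  <Description>Pathogenic</Description>
  <Citation Type="general"><ID Source="PubMed">19454582</ID></Citation>
  <Citation Type="general"><ID Source="PubMed">19802898</ID></Citation>
</GermlineClassification>"""
```
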
Conditions

Conditions object from XML path VariationArchive.ClassifiedRecord.Classifications.GermlineClassification.ConditionList

| key | type | description | XML sub-path |
|---|---|---|---|
| type | string | trait set type | ConditionList.TraitSet.Type |
| contributesToAggregateClassification | True or blank | contributes to aggregate classification | ConditionList.TraitSet.ContributesToAggregateClassification |
| traits | list | trait objects | ConditionList.TraitSet.Trait |
| traits[].id | string | trait ID | ConditionList.TraitSet.Trait.ID |
| traits[].name | object | trait name object | ConditionList.TraitSet.Trait.Name |
| traits[].name.value | string | preferred trait name | ConditionList.TraitSet.Trait.Name.ElementValue (with Type="Preferred") |
| traits[].name.xRefs | list | list of cross references | ConditionList.TraitSet.Trait.Name.XRef |
| traits[].name.xRefs[].db | string | preferred name cross reference database | ConditionList.TraitSet.Trait.Name.XRef.DB |
| traits[].name.xRefs[].id | string | preferred name cross reference identifier | ConditionList.TraitSet.Trait.Name.XRef.ID |

XML

<Classifications>
<GermlineClassification DateLastEvaluated="2021-08-04" NumberOfSubmissions="1" NumberOfSubmitters="1"
DateCreated="2022-03-28" MostRecentSubmission="2023-02-07">
<ConditionList>
<TraitSet ID="23696" Type="Disease" ContributesToAggregateClassification="true">
<Trait ID="3796" Type="Disease">
<Name>
<ElementValue Type="Preferred">Pheochromocytoma</ElementValue>
<XRef ID="Pheochromocytoma/5718" DB="Genetic Alliance"/>
<XRef ID="HP:0002666" DB="Human Phenotype Ontology"/>
<XRef ID="MONDO:0008233" DB="MONDO"/>
</Name>
<Name>
<ElementValue Type="Alternate">Chromaffinoma</ElementValue>
</Name>
...
</Trait>
</TraitSet>
</ConditionList>
</GermlineClassification>
</Classifications>

JSON

{
"classifications": {
"germlineClassification": {
"classification": "Pathogenic",
"reviewStatus": "criteria provided, single submitter",
"dateLastEvaluated": "2021-08-04",
"mostRecentSubmission": "2023-02-07",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "3796",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Pheochromocytoma/5718"
},
{
"db": "Human Phenotype Ontology",
"id": "HP:0002666"
},
{
"db": "MONDO",
"id": "MONDO:0008233"
}
],
"value": "Pheochromocytoma"
}
}
]
}
],
"pubMedIds": [
"19454582",
"19802898"
]
}
}
}

Clinical Assertions

Clinical assertion object from XML path VariationArchive.ClassifiedRecord.ClinicalAssertionList

| key | type | description | XML sub-path |
|---|---|---|---|
| accession | string | SCV Id from ClinVar | ClinicalAssertionList.ClinicalAssertion.ClinVarAccession.Accession |
| pubMedIds | list | list of PubMedIds | ClinicalAssertionList.ClinicalAssertion.AttributeSet.Citation.ID.Value |

XML

<ClinicalAssertionList>
<ClinicalAssertion ID="4172562" SubmissionDate="2023-01-11" DateLastUpdated="2023-02-07"
DateCreated="2022-03-28">
<ClinVarSubmissionID localKey="12475853|MedGen:C0238198;C1861848;C0031511"
submittedAssembly="GRCh37"/>
<ClinVarAccession
Accession="SCV002152762"
DateUpdated="2023-02-07"
DateCreated="2022-03-28"
Type="SCV"
Version="2"
SubmitterName="Invitae"
OrgID="500031"
OrganizationCategory="laboratory"
OrgAbbreviation="Invitae"
/>
<RecordStatus>current</RecordStatus>
<Classification DateLastEvaluated="2021-08-04">
...
</Classification>
<Assertion>variation to disease</Assertion>
<AttributeSet>
<Attribute Type="AssertionMethod">Invitae Variant Classification Sherloc (09022015)</Attribute>
<Citation>
<ID Source="PubMed">28492532</ID>
</Citation>
</AttributeSet>
<ObservedInList>
...
</ObservedInList>
<SimpleAllele>
...
</SimpleAllele>
<TraitSet Type="Disease">
...
</TraitSet>
<SubmissionNameList>
...
</SubmissionNameList>
</ClinicalAssertion>
</ClinicalAssertionList>

JSON

{
"clinicalAssertions": [
{
"accession": "SCV002152762",
"pubMedIds": [
"28492532"
]
}
]
}

Known Issues

Entries with the following missing or incorrect information are skipped:

  1. Invalid Ref Allele (example VCV000437934)
  2. Invalid Alt Allele (example VCV000006637)
  3. Following variant types are not supported:
    1. Variation (example VCV000001101)
    2. fusion (example VCV000015269)
    3. unknown (example VCV000017564)
    4. protein only (example VCV000132152)
    5. Complex (example VCV000221337)
    6. Translocation (example VCV000267801)
    7. no_sequence_alteration (example VCV000010504)
  4. Only records of type classified are included; a VCV record of type included is skipped (example VCV000431749)
  5. Records with missing genomic location are skipped (example VCV000000254)
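
The skip rules above could be expressed as a single filter, sketched here in Python. Everything in this block is illustrative: the `skip_reason` function, the dictionary keys, and especially the allele check are assumptions. The source does not define what makes an allele invalid, so the sketch treats any character outside A, C, G, T, N as invalid.

```python
import re
from typing import Optional

# Variant types listed as unsupported in the known issues above.
UNSUPPORTED_TYPES = {
    "Variation", "fusion", "unknown", "protein only",
    "Complex", "Translocation", "no_sequence_alteration",
}

# Assumption: a valid allele consists only of these characters.
VALID_ALLELE = re.compile(r"^[ACGTN]*$")


def skip_reason(record: dict) -> Optional[str]:
    """Return why a record would be skipped, or None if it would be kept."""
    if record.get("recordType") != "classified":
        return "record type is not classified"
    if record.get("variantType") in UNSUPPORTED_TYPES:
        return "unsupported variant type"
    if record.get("chromosome") is None or record.get("position") is None:
        return "missing genomic location"
    for key in ("refAllele", "altAllele"):
        allele = record.get(key)
        if allele is not None and not VALID_ALLELE.match(allele):
            return "invalid " + key
    return None
```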

Download URLs

https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarVCVRelease_00-latest.xml.gz

JSON Output

small variants:

{
"clinvar-preview": [
{
"altAllele": "A",
"refAllele": "G",
"variantType": "SNV",
"accession": "VCV000437934",
"version": "1",
"recordType": "classified",
"dateLastUpdated": "2023-08-06",
"rcvs": [
{
"accession": "RCV000505090",
"version": "1",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2016-08-31",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "Cleidocranial dysostosis",
"db": "MedGen",
"id": "C0008928"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2016-08-31",
"mostRecentSubmission": "2017-09-09",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "820",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Cleidocranial+Dysplasia/1683"
},
{
"db": "SNOMED CT",
"id": "65976001"
}
],
"value": "Cleidocranial dysostosis"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000598565"
}
]
}
]
}

large variants:

{
"clinvar-preview": [
{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
"variantType": "copy_number_gain",
"accession": "VCV000154089",
"version": "2",
"recordType": "classified",
"dateLastUpdated": "2023-10-15",
"rcvs": [
{
"accession": "RCV000142236",
"version": "6",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2014-03-10",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "See cases"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2014-03-10",
"mostRecentSubmission": "2015-07-13",
"conditions": [
{
"type": "PhenotypeInstruction",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "18728",
"name": {
"value": "See cases"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000183512"
}
]
}
]
}

| Field | Type | Notes |
|---|---|---|
| chromosome | string | Chromosome |
| begin | integer | start position of variant |
| end | integer | end position of variant |
| refAllele | string | |
| altAllele | string | |
| accession | string | ClinVar ID |
| version | string | ClinVar version |
| variantType | string | variant type |
| recordType | string | record type |
| dateLastUpdated | string | yyyy-MM-dd |
| rcvs | array | RCV objects associated to this VCV |
| classifications | array | classifications for this VCV |
| clinicalAssertions | array | SCV objects associated to this VCV |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |
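
For readers consuming this output, the following Python sketch selects entries whose aggregate germline classification includes Pathogenic or Likely pathogenic. It assumes an input dictionary shaped like the JSON examples above; the function name `pathogenic_accessions` and the `TARGETS` set are hypothetical. Terms are compared exactly after splitting on ";" and "/", so that "Conflicting classifications of pathogenicity" is not matched by accident.

```python
import re

# Classification terms of interest (an illustrative choice).
TARGETS = {"Pathogenic", "Likely pathogenic"}


def pathogenic_accessions(annotation: dict) -> list:
    """Return 'accession.version' for entries classified (likely) pathogenic."""
    hits = []
    for entry in annotation.get("clinvar-preview", []):
        germline = entry.get("classifications", {}).get("germlineClassification", {})
        label = germline.get("classification", "")
        # Combined values such as "Pathogenic/Likely pathogenic; other"
        # are split into their individual terms before comparison.
        terms = {term.strip() for term in re.split(r"[;/]", label)}
        if terms & TARGETS:
            hits.append(entry["accession"] + "." + entry["version"])
    return hits
```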

Variant Types

  • copy_number_gain
  • copy_number_loss
  • deletion
  • delins
  • duplication
  • insertion
  • inversion
  • SNV
  • tandem_duplication

Review Statuses

  • criteria provided, conflicting classifications
  • criteria provided, multiple submitters, no conflicts
  • criteria provided, single submitter
  • no assertion criteria provided
  • no classification provided
  • practice guideline
  • reviewed by expert panel

classification

  • Benign
  • Likely benign
  • Pathogenic
  • Uncertain significance
  • Likely pathogenic
  • Benign/Likely benign
  • not provided
  • conflicting data from submitters
  • Pathogenic/Likely pathogenic
  • association
  • Conflicting classifications of pathogenicity
  • Pathogenic; risk factor
  • risk factor
  • other
  • drug response
  • Uncertain significance; Pathogenic/Likely pathogenic
  • Likely pathogenic, low penetrance
  • Pathogenic; Affects
  • Pathogenic, low penetrance
  • protective
  • Affects
  • Benign; other
  • Conflicting classifications of pathogenicity; other
  • Conflicting classifications of pathogenicity; association
  • Uncertain risk allele
  • Uncertain significance; risk factor
  • Likely pathogenic; risk factor
  • Likely benign; association
  • Likely risk allele
  • Pathogenic/Likely pathogenic; other
  • Pathogenic; other
  • Pathogenic/Likely pathogenic/Pathogenic, low penetrance
  • Pathogenic/Likely pathogenic; risk factor
  • Benign/Likely benign; risk factor
  • Uncertain significance/Uncertain risk allele
  • Pathogenic; association; protective
  • protective; risk factor
  • Benign/Likely benign; other; risk factor
  • Benign/Likely benign; association
  • Benign; association
  • Affects; association; other
  • Pathogenic; protective
  • Conflicting classifications of pathogenicity; drug response; other
  • Conflicting classifications of pathogenicity; drug response
  • Benign; drug response
  • Likely pathogenic; other
  • Conflicting classifications of pathogenicity; protective
  • Pathogenic/Likely pathogenic; drug response
  • Benign/Likely benign; other
  • Likely pathogenic/Likely risk allele
  • Uncertain risk allele; protective
  • association not found
  • Affects; association
  • Uncertain significance; association
  • Likely benign; other
  • Uncertain significance; other
  • Conflicting classifications of pathogenicity; association; risk factor
  • Pathogenic; association
  • Benign; risk factor
  • Conflicting classifications of pathogenicity; other; risk factor
  • Pathogenic/Likely risk allele; risk factor
  • Uncertain significance; drug response
  • Conflicting classifications of pathogenicity; risk factor
  • other; risk factor
  • Pathogenic/Likely pathogenic/Likely risk allele
  • Likely pathogenic; drug response
  • Conflicting classifications of pathogenicity; Affects
  • association; drug response; risk factor
  • Pathogenic; drug response
  • Affects; risk factor
  • Pathogenic; drug response; other
  • Likely pathogenic; protective
  • confers sensitivity
  • Likely pathogenic; association
  • Benign; Affects
  • Likely pathogenic; Affects
  • Uncertain risk allele; risk factor
  • drug response; risk factor
  • Pathogenic/Likely risk allele
  • Likely benign; drug response; other
  • Benign/Likely benign; drug response
  • Benign/Likely benign; drug response; other
  • drug response; other
  • association; drug response
  • Pathogenic; confers sensitivity
  • association; risk factor
  • Pathogenic/Pathogenic, low penetrance; other
  • Benign; confers sensitivity
  • confers sensitivity; other
  • Likely pathogenic/Pathogenic, low penetrance
  • Likely benign; risk factor

Building the supplementary files

There are two ways of building your own ClinVar supplementary files using SAUtils.

The first way is to use the SAUtils ClinVarPreview subcommand, which builds the ClinVar .nsa and .nsi files for Illumina Connected Annotations.

The second way is to use the SAUtils AutoDownloadGenerate subcommand. For details on AutoDownloadGenerate, see the SAUtils section.

Using the ClinVarPreview subcommand and source data files

One input .xml.gz file and a .version file are required in order to build the .nsa and .nsi files. You should have the following files:

ClinVarVCVRelease_00-latest.xml.gz
ClinVarVCVRelease_00-latest.xml.gz.version

The version file is a JSON file with the following format.

{
"name": "ClinVar",
"version": "20240501",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate": "2024-05-01"
}

You have to adjust the version and release date to match the ClinVar release you downloaded.
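
If you prefer to generate the version file rather than edit it by hand, a small Python sketch such as the following could be used. The helper name `write_version_file` is hypothetical and not part of SAUtils; it only reproduces the JSON layout shown above and derives `version` from the release date as in the example.

```python
import json
from datetime import date
from pathlib import Path

DESCRIPTION = (
    "A freely accessible, public archive of reports of the relationships "
    "among human variations and phenotypes, with supporting evidence"
)


def write_version_file(data_file: str, release: date) -> Path:
    """Write '<data_file>.version' next to the downloaded ClinVar XML file."""
    payload = {
        "name": "ClinVar",
        # Same yyyyMMdd convention as the example above.
        "version": release.strftime("%Y%m%d"),
        "description": DESCRIPTION,
        "releaseDate": release.isoformat(),
    }
    path = Path(data_file + ".version")
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, `write_version_file("ClinVarVCVRelease_00-latest.xml.gz", date(2024, 5, 1))` writes a file with the contents shown above.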

Here is a sample execution:

dotnet SAUtils ClinVarPreview \
--r ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat \
--vcv ClinVarVCVRelease_00-latest.xml.gz \
--o output
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

Parsing XML completed in 14.7 mins.
Sorting and adjusting completed in 4.7 mins.
Writing 2351609 Small Varaints
Chromosome 1 completed in 00:00:57.1
Chromosome 2 completed in 00:01:30.8
Chromosome 3 completed in 00:00:32.9
Chromosome 4 completed in 00:00:21.2
Chromosome 5 completed in 00:00:31.7
Chromosome 6 completed in 00:00:34.6
Chromosome 7 completed in 00:00:27.9
Chromosome 8 completed in 00:00:17.9
Chromosome 9 completed in 00:00:34.0
Chromosome 10 completed in 00:00:26.6
Chromosome 11 completed in 00:00:35.4
Chromosome 12 completed in 00:00:31.5
Chromosome 13 completed in 00:00:22.7
Chromosome 14 completed in 00:00:22.7
Chromosome 15 completed in 00:00:23.7
Chromosome 16 completed in 00:00:39.6
Chromosome 17 completed in 00:00:46.7
Chromosome 18 completed in 00:00:10.2
Chromosome 19 completed in 00:00:32.9
Chromosome 20 completed in 00:00:10.7
Chromosome 21 completed in 00:00:05.3
Chromosome 22 completed in 00:00:11.0
Chromosome X completed in 00:00:19.6
Chromosome Y completed in 00:00:00.1
Chromosome MT completed in 00:00:00.3
Maximum bp shifted for any variant:1
NSA writing completed in 11.5 mins.
Writing 76122 Large Varaints
Writing 76122 intervals to database...
NSI writing completed in 1.1 mins.

Time: 00:32:10.9
Process finished with exit code 0.


+ + + + \ No newline at end of file diff --git a/3.24/data-sources/clinvar/index.html b/3.24/data-sources/clinvar/index.html new file mode 100644 index 00000000..eec26b58 --- /dev/null +++ b/3.24/data-sources/clinvar/index.html @@ -0,0 +1,21 @@ + + + + + + + +ClinVar | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

ClinVar

Overview

Deprecated

ClinVar has changed to a new XML format. Use ClinVarPreview for the latest ClinVar entries.

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

RCV File

Example

Here's a full RCV entry.

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

ID

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinVarAccession Acc="RCV000000001" Version="2">
</ClinVarSet>

The Acc and Version fields are merged to form the ID (RCV000000001.2).

LastUpdatedDate

<ClinVarSet>
<ReferenceClinVarAssertion DateCreated="2012-08-13" DateLastUpdated="2016-02-17" ID="57604" >
</ClinVarSet>

Significance

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

ReviewStatus

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

Phenotypes

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="62">
<Trait Type="Disease">
<Name>
<ElementValue Type="Preferred">Joubert syndrome 9</ElementValue>
</Name>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

We only use the field with Type="Preferred". Multiple phenotypes may be reported.

Location, Variant Type and Variant Id

<ReferenceClinVarAssertion>
<GenotypeSet Type="CompoundHeterozygote" ID="424709">
<MeasureSet Type="Variant" ID="81">
<Measure Type="single nucleotide variant" ID="15120">
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38"
AssemblyStatus="current" Chr="10" Accession="NC_000010.11" start="89222510"
stop="89222510" display_start="89222510" display_stop="89222510" variantLength="1"
positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
AssemblyStatus="previous" Chr="10" Accession="NC_000010.10" start="90982267"
stop="90982267" display_start="90982267" display_stop="90982267" variantLength="1"
positionVCF="90982267" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
</Measure>
</MeasureSet>
</GenotypeSet>
</ReferenceClinVarAssertion>
  • The variant position is extracted from the fields for their respective assemblies.
  • Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant.
  • For older records, since the "start" and "stop" fields are not always available, we use the "display_start" and "display_stop" fields.
  • If a required allele is not available, we extract it from the reference sequence.
  • Only variants having a dbSNP id are extracted.
  • Note that a ClinVar accession may have multiple variants associated with it (possibly at different locations).
  • VariantId is extracted from the MeasureSet attributes.
  • VariantType is extracted from the Measure attributes.
    unsupported variant types

    We currently don't support the following variant types:

    • Microsatellite
    • protein only
    • fusion
    • Complex
    • Variation
    • Translocation

MedGen, OMIM, Orphanet IDs

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="175">
<Trait ID="3036" Type="Disease">
<XRef ID="C0086651" DB="MedGen"/>
<XRef ID="309297" DB="Orphanet"/>
<XRef ID="582" DB="Orphanet"/>
<XRef Type="MIM" ID="253000" DB="OMIM"/>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

AlleleOrigins

<ClinVarAssertion>
<Origin>germline</Origin>
</ClinVarAssertion>

We extract all allele origins, but only from submission (SCV) entries.

PubMedIds

<ClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<Citation Type="general">
<ID Source="PubMed">12114475</ID>
</Citation>
</ClinicalSignificance>
<AttributeSet>
<Attribute Type="AssertionMethod">LMM Criteria</Attribute>
<Citation>
<ID Source="PubMed">24033266</ID>
</Citation>
</AttributeSet>
<ObservedIn>
<ObservedData ID="9727445">
<Citation Type="general">
<ID Source="PubMed">9113933</ID>
</Citation>
</ObservedData>
</ObservedIn>
<Citation Type="general">
<ID Source="PubMed">23757202</ID>
</Citation>
</ClinVarAssertion>

We extract all PubMed IDs, but only from submission (SCV) entries.
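
Because citations appear at several nesting levels inside a ClinVarAssertion, one way to collect them is to walk every ID element regardless of depth. The Python sketch below does that; the function name `pubmed_ids` is hypothetical, and de-duplicating and sorting the IDs numerically is an assumption made for a stable result.

```python
import xml.etree.ElementTree as ET


def pubmed_ids(xml_text: str) -> list:
    """Collect every PubMed ID found anywhere under a <ClinVarAssertion>."""
    root = ET.fromstring(xml_text)
    found = set()
    for node in root.iter("ID"):
        text = (node.text or "").strip()
        # Keep only numeric IDs whose Source attribute is PubMed.
        if node.get("Source") == "PubMed" and text.isdigit():
            found.add(text)
    # Assumption: duplicates are dropped and IDs are sorted numerically.
    return sorted(found, key=int)


# Shortened copy of the XML example above, with citations at two depths.
SAMPLE = """<ClinVarAssertion>
  <ClinicalSignificance DateLastEvaluated="1996-04-01">
    <Citation Type="general"><ID Source="PubMed">12114475</ID></Citation>
  </ClinicalSignificance>
  <ObservedIn>
    <ObservedData ID="9727445">
      <Citation Type="general"><ID Source="PubMed">9113933</ID></Citation>
    </ObservedData>
  </ObservedIn>
  <Citation Type="general"><ID Source="PubMed">12114475</ID></Citation>
</ClinVarAssertion>"""
```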

Parsing Significance

Extracting significance(s) may involve parsing multiple fields. Take the following snippets into consideration.

<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2016-10-13">
<ReviewStatus>criteria provided, multiple submitters, no conflicts</ReviewStatus>
<Description>Pathogenic/Likely pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2012-06-07">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Conflicting interpretations of pathogenicity</Description>
<Explanation DataSource="ClinVar" Type="public">Pathogenic(1);Uncertain significance(1)</Explanation>
</ClinicalSignificance>

Given these variations, we convert the significance field into an array of strings, parsed from the Description or Explanation field.

Varying Delimiters

The delimiters in each field may vary. Currently, the delimiters for Description are , and /. The delimiters for Explanation are ; and /.
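
A minimal Python sketch of this parsing, using the delimiters listed above. The function name `parse_significance` is hypothetical. Two details are assumptions not stated in the text: the Explanation field is preferred when it is present, and the submitter counts in parentheses, such as "(1)", are dropped.

```python
import re
from typing import List, Optional


def parse_significance(description: str, explanation: Optional[str] = None) -> List[str]:
    """Split a significance value into a list of individual significances."""
    if explanation:
        # Explanation delimiters: ';' and '/'.
        parts = re.split(r"[;/]", explanation)
        # Assumption: drop the trailing "(n)" counts, e.g. "Pathogenic(1)".
        parts = [re.sub(r"\(\d+\)\s*$", "", part) for part in parts]
    else:
        # Description delimiters: ',' and '/'.
        parts = re.split(r"[,/]", description)
    return [part.strip() for part in parts if part.strip()]
```

Applied to the three snippets above, this gives one, two, and two significances respectively.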

VCV File

Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ClinVarVariationRelease xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public/clinvar_variation/variation_archive_1.4.xsd" ReleaseDate="2019-12-31">
<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">
<RecordStatus>current</RecordStatus>
<Species>Homo sapiens</Species>
<IncludedRecord>
<SimpleAllele AlleleID="425239" VariationID="431749">
<GeneList>
<Gene Symbol="KCNAB2" FullName="potassium voltage-gated channel subfamily A regulatory beta subunit 2" GeneID="8514" HGNC_ID="HGNC:6229" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5992639" stop="6101186" display_start="5992639" display_stop="6101186" Strand="+"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6052357" stop="6161252" display_start="6052357" display_stop="6161252" Strand="+"/>
</Location>
<OMIM>601142</OMIM>
</Gene>
<Gene Symbol="NPHP4" FullName="nephrocystin 4" GeneID="261734" HGNC_ID="HGNC:19104" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5862810" stop="5992425" display_start="5862810" display_stop="5992425" Strand="-"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="5922869" stop="6052532" display_start="5922869" display_stop="6052532" Strand="-"/>
</Location>
<OMIM>607215</OMIM>
</Gene>
</GeneList>
<Name>GRCh37/hg19 1p36.31(chr1:6051187-6158763)</Name>
<VariantType>copy number gain</VariantType>
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" forDisplay="true" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6051187" stop="6158763" display_start="6051187" display_stop="6158763"/> </Location>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<XRefList>
<XRef Type="Interpreted" ID="431733" DB="ClinVar"/>
</XRefList>
</SimpleAllele>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<SubmittedInterpretationList>
<SCV Title="SUB1895145" Accession="SCV000296057" Version="1"/>
</SubmittedInterpretationList>
<InterpretedVariationList>
<InterpretedVariation VariationID="431733" Accession="VCV000431733" Version="1"/>
</InterpretedVariationList>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Parsing

The following sections describe which XML fields are used to populate each field of the JSON output.

id

<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">

The Accession and Version attributes are merged to form the ID (for the element above, VCV000431749.1).
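As a minimal sketch of the merge described above, the following Python snippet reads the Accession and Version attributes of a VariationArchive element and joins them with a period. The function name build_clinvar_id and the trimmed XML string are illustrative assumptions and are not part of SAUtils.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the VariationArchive element shown above (illustrative only).
SAMPLE_XML = (
    '<VariationArchive VariationID="431749" Accession="VCV000431749" '
    'Version="1" RecordType="included"/>'
)


def build_clinvar_id(element: ET.Element) -> str:
    """Join the Accession and Version attributes as <Accession>.<Version>."""
    return f"{element.attrib['Accession']}.{element.attrib['Version']}"


archive = ET.fromstring(SAMPLE_XML)
print(build_clinvar_id(archive))
```

The same helper would apply to any element that carries both attributes; how SAUtils itself performs the merge is not shown in this document.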

significance

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<SimpleAllele>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
</SimpleAllele>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

A record may list multiple significances.

reviewStatus

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Known Issues

  • The XML file contains ~1K more entries (out of 162K) than the VCF file.
  • The XML file does not have a field indicating that a record is associated with the reference base, something that was present in the VCF file.
  • The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H", etc.) as their alternate allele.

Download URLs

ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz

https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz

JSON Output

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field             Type          Notes
id                string        ClinVar ID
variationId       string        ClinVar VCV ID
variantType       string        variant type
reviewStatus      string        see possible values below
alleleOrigins     string array  see possible values below
refAllele         string
altAllele         string
phenotypes        string array
medGenIds         string array  MedGen IDs
omimIds           string array  OMIM IDs
orphanetIds       string array  Orphanet IDs
significance      string array  see possible values below
lastUpdatedDate   string        yyyy-MM-dd
pubMedIds         string array  PubMed IDs
isAlleleSpecific  bool          true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity

Building the supplementary files

There are two ways of building your own ClinVar supplementary files using SAUtils.

The first way is to use the SAUtils clinvar subcommand, which builds the ClinVar .nsa and .nsi files for Illumina Connected Annotations.

The second way is to use the SAUtils AutoDownloadGenerate subcommand. To use AutoDownloadGenerate, read more in the SAUtils section.

Using the clinvar subcommand and source data files

Two input .xml files and a .version file are required to build the .nsa and .nsi files. You should have the following files:

ClinVarFullRelease_00-latest.xml.gz     ClinVarVariationRelease_00-latest.xml.gz
ClinVarFullRelease_00-latest.xml.gz.version

The version file is a JSON file with the following format:

{
"name": "ClinVar",
"version": "20231230",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate": "2024-01-10"
}

Adjust the version and releaseDate values to match the ClinVar release that you downloaded.
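One way to produce the version file is sketched below in Python. The helper name build_version_metadata is hypothetical; the keys and the description text are copied from the example above, and the two arguments are the values you need to change for each release.

```python
import json


def build_version_metadata(version: str, release_date: str) -> dict:
    """Return the version-file content for a given ClinVar release."""
    return {
        "name": "ClinVar",
        "version": version,
        "description": (
            "A freely accessible, public archive of reports of the relationships "
            "among human variations and phenotypes, with supporting evidence"
        ),
        "releaseDate": release_date,
    }


# Values copied from the example above; replace them with those of your download.
print(json.dumps(build_version_metadata("20231230", "2024-01-10"), indent=2))
```

Redirecting the printed output to ClinVarFullRelease_00-latest.xml.gz.version gives a file with the layout shown above.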

The help menu for the utility is as follows:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2022 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll clinvar

Here is a sample execution:

dotnet SAUtils.dll clinvar \
--ref ~/development/References/7/Homo_sapiens.GRCh38.Nirvana.dat --rcv ClinVarFullRelease_00-latest.xml.gz \
--vcv ClinVarVariationRelease_00-latest.xml.gz --out ~/development/SupplementaryDatabase/63/GRCh38
---------------------------------------------------------------------------
SAUtils (c) 2022 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
---------------------------------------------------------------------------

Found 1535677 VCV records
Unknown vcv id:225946 found in RCV000211201.2
Unknown vcv id:225946 found in RCV000211253.2
Unknown vcv id:225946 found in RCV000211375.2
Unknown vcv id:976117 found in RCV001253316.1
Unknown vcv id:1321016 found in RCV001776995.2
3 unknown VCVs found in RCVs.
225946,976117,1321016
0 unknown VCVs found in RCVs.
Chromosome 1 completed in 00:00:15.1
Chromosome 2 completed in 00:00:20.0
Chromosome 3 completed in 00:00:09.7
Chromosome 4 completed in 00:00:05.9
Chromosome 5 completed in 00:00:09.8
Chromosome 6 completed in 00:00:08.3
Chromosome 7 completed in 00:00:08.7
Chromosome 8 completed in 00:00:06.2
Chromosome 9 completed in 00:00:08.6
Chromosome 10 completed in 00:00:07.0
Chromosome 11 completed in 00:00:11.7
Chromosome 12 completed in 00:00:08.0
Chromosome 13 completed in 00:00:06.3
Chromosome 14 completed in 00:00:06.0
Chromosome 15 completed in 00:00:06.6
Chromosome 16 completed in 00:00:10.8
Chromosome 17 completed in 00:00:13.8
Chromosome 18 completed in 00:00:02.9
Chromosome 19 completed in 00:00:08.7
Chromosome 20 completed in 00:00:03.6
Chromosome 21 completed in 00:00:02.4
Chromosome 22 completed in 00:00:03.6
Chromosome MT completed in 00:00:00.2
Chromosome X completed in 00:00:07.5
Chromosome Y completed in 00:00:00.0
Maximum bp shifted for any variant:2
Writing 37097 intervals to database...

Time: 00:13:26.9

+ + + + \ No newline at end of file diff --git a/3.24/data-sources/cosmic-cancer-gene-census/index.html b/3.24/data-sources/cosmic-cancer-gene-census/index.html new file mode 100644 index 00000000..8a21bc04 --- /dev/null +++ b/3.24/data-sources/cosmic-cancer-gene-census/index.html @@ -0,0 +1,18 @@ + + + + + + + +cosmic-cancer-gene-census | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/cosmic-gene-fusion-json/index.html b/3.24/data-sources/cosmic-gene-fusion-json/index.html new file mode 100644 index 00000000..79923f13 --- /dev/null +++ b/3.24/data-sources/cosmic-gene-fusion-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +cosmic-gene-fusion-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/cosmic-json/index.html b/3.24/data-sources/cosmic-json/index.html new file mode 100644 index 00000000..c5b88a1a --- /dev/null +++ b/3.24/data-sources/cosmic-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +cosmic-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type         Notes
id                string       COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array  phenotypic descriptions
sites             count array  tissue types
pubMedIds         int array    PubMed IDs
confirmedSomatic  bool         true when the variant is a confirmed somatic variant
drugResistance    bool         true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/cosmic/index.html b/3.24/data-sources/cosmic/index.html new file mode 100644 index 00000000..058d9b89 --- /dev/null +++ b/3.24/data-sources/cosmic/index.html @@ -0,0 +1,28 @@ + + + + + + + +COSMIC | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human cancers.

Publication

John G Tate, Sally Bamford, Harry C Jubb, Zbyslaw Sondka, David M Beare, Nidhi Bindal, Harry Boutselakis, Charlotte G Cole, Celestino Creatore, Elisabeth Dawson, Peter Fish, Bhavana Harsha, Charlie Hathaway, Steve C Jupe, Chai Yin Kok, Kate Noble, Laura Ponting, Christopher C Ramshaw, Claire E Rye, Helen E Speedy, Ray Stefancsik, Sam L Thompson, Shicai Wang, Sari Ward, Peter J Campbell, Simon A Forbes. (2019) COSMIC: the Catalogue Of Somatic Mutations In Cancer, Nucleic Acids Research, Volume 47, Issue D1

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

Small Variants

Our main COSMIC deliverable provides annotations for both coding and non-coding variants throughout the genome. As of COSMIC v96, this includes 28.7M variants spanning the human genome. Illumina Connected Annotations currently parses four files to extract the relevant content:

  • CosmicCodingMuts.vcf.gz
  • CosmicNonCodingVariants.vcf.gz
  • CosmicMutantExport.tsv.gz
  • CosmicNCV.tsv.gz

VCF extraction

Example

#CHROM  POS ID  REF ALT QUAL  FILTER  INFO
1 65797 COSV58737189 T C . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN23957695;CDS=c.9+224T>C;AA=p.?;HGVSC=ENST00000641515.2:c.9+224T>C;HGVSG=1:g.65797T>C;CNT=1

Parsing

From the VCF files, we're mainly interested in the following columns:

  • CHROM
  • POS
  • ID
  • REF
  • ALT

TSV extraction

Example

Gene name Accession Number  Gene CDS length HGNC ID Sample name ID_sample ID_tumour Primary site  Site subtype 1  Site subtype 2  Site subtype 3  Primary histology Histology subtype 1 Histology subtype 2 Histology subtype 3 Genome-wide screen  GENOMIC_MUTATION_ID LEGACY_MUTATION_ID  MUTATION_ID Mutation CDS  Mutation AA Mutation Description  Mutation zygosity LOH GRCh  Mutation genome position  Mutation strand Resistance Mutation Mutation somatic status Pubmed_PMID ID_STUDY  Sample Type Tumour origin Age HGVSP HGVSC HGVSG
MCF2L_ENST00000375604 ENST00000375604.6 3372 14576 RK091_C01 1918867 1806188 liver NS NS NS carcinoma NS NS NS y COSV65049364 COSN1601909 113108365 c.73+3096A>G p.? Unknown het 38 13:113005079-113005079 + - Variant of unknown origin 322 fresh/frozen - NOS primary ENST00000375604.6:c.73+3096A>G 13:g.113005079A>G

Parsing

From the TSV file, we're mainly interested in the following columns:

  • GENOMIC_MUTATION_ID
  • ID_sample
  • Primary site
  • Site subtype 1
  • Primary histology
  • Histology subtype 1
  • Pubmed_PMID
  • Resistance Mutation
  • Mutation somatic status
info

For all the histologies and sites, we replace underscores with spaces. For example, salivary_gland becomes salivary gland.

Parsing

To aggregate the data in Illumina Connected Annotations, we perform the following:

  • Parse the coding and non-coding TSV files to retrieve the histologies, sites, PubMed IDs, somatic status, and resistance mutation status. Histologies and sites are tracked with respect to sample IDs.
  • Parse the coding and non-coding VCF files to retrieve the genomic variant for each entry.

Aggregating Histologies & Sites

For sites and histologies, we observe that the subtype provides additional description but is still dependent on the primary site value. For example, the primary site might be skin, but the subtype is foot. Therefore, we combine the values in the following manner: skin (foot).

COSMIC uses NS to show that a value is empty. If the subtype is NS, we use the primary histology on its own, e.g. carcinoma.
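The combination rule above can be sketched in a few lines of Python. The function name combine_label is hypothetical, and applying the same rule to sites and histologies alike is an assumption based on the JSON examples in this page.

```python
def combine_label(primary: str, subtype: str) -> str:
    """Combine a primary value with its subtype, e.g. 'skin (foot)'.

    Underscores are replaced with spaces, and the COSMIC placeholder 'NS'
    (empty value) means that only the primary value is used.
    """
    primary = primary.replace("_", " ")
    if subtype == "NS":
        return primary
    return f"{primary} ({subtype.replace('_', ' ')})"


print(combine_label("skin", "foot"))
print(combine_label("carcinoma", "NS"))
print(combine_label("salivary_gland", "parotid"))
```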

Download URL

GRCh37

GRCh38

JSON Output

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type         Notes
id                string       COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array  phenotypic descriptions
sites             count array  tissue types
pubMedIds         int array    PubMed IDs
confirmedSomatic  bool         true when the variant is a confirmed somatic variant
drugResistance    bool         true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int

Gene Fusions

Gene fusions are manually curated from peer-reviewed publications by expert COSMIC curators. A comprehensive literature curation is completed for each fusion pair when it is released in the database. Currently COSMIC includes information on fusions involved in solid tumours and leukaemias.

TSV extraction

Example

SAMPLE_ID SAMPLE_NAME PRIMARY_SITE  SITE_SUBTYPE_1  SITE_SUBTYPE_2  SITE_SUBTYPE_3  PRIMARY_HISTOLOGY HISTOLOGY_SUBTYPE_1 HISTOLOGY_SUBTYPE_2 HISTOLOGY_SUBTYPE_3 FUSION_ID TRANSLOCATION_NAME  5'_CHROMOSOME 5'_STRAND 5'_GENE_ID  5'_GENE_NAME  5'_LAST_OBSERVED_EXON 5'_GENOME_START_FROM  5'_GENOME_START_TO  5'_GENOME_STOP_FROM 5'_GENOME_STOP_TO 3'_CHROMOSOME 3'_STRAND 3'_GENE_ID  3'_GENE_NAME  3'_FIRST_OBSERVED_EXON  3'_GENOME_START_FROM  3'_GENOME_START_TO  3'_GENOME_STOP_FROM 3'_GENOME_STOP_TO FUSION_TYPE PUBMED_PMID
749711 HCC1187 breast NS NS NS carcinoma ductal_carcinoma NS NS 665 ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452 8 - 197199 RGS22 22 99981937 99981937 100106116 100106116 1 + 212470 SYCP1_ENST00000369518 24 114944339 114944339 114995367 114995367 Inferred Breakpoint 20033038

Parsing

From the TSV file, we're mainly interested in the following columns:

  • SAMPLE_ID
  • PRIMARY_SITE
  • PRIMARY_HISTOLOGY
  • HISTOLOGY_SUBTYPE_1
  • FUSION_ID
  • TRANSLOCATION_NAME
  • PUBMED_PMID
info

For all the histologies and sites, we replace underscores with spaces. For example, salivary_gland becomes salivary gland.

Parsing

To create the gene fusion entries in Illumina Connected Annotations, we perform the following on each row in the TSV file:

  • Group all entries by FUSION_ID
  • Using all the entries related to this FUSION_ID:
    • Collect all the PubMed IDs
    • Tally the number of observed sample IDs
    • Grab the HGVS r. notation (should not change throughout the FUSION_ID)
    • Tally the number of samples observed for each histology
    • Tally the number of samples observed for each site
  • Extract the transcript IDs from the HGVS notation and look up the associated gene symbols
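The grouping steps above can be sketched as follows. This is an illustration only: aggregate_fusions, label, and tally are hypothetical names, rows are assumed to be dictionaries keyed by the TSV column names, and the gene symbol lookup is left out because it depends on transcript data that is not part of this file.

```python
import re
from collections import defaultdict

# Matches transcript IDs such as ENST00000360863.10 that precede "(GENE)".
TRANSCRIPT_ID = re.compile(r"(ENST\d+(?:\.\d+)?)\(")


def label(primary, subtype="NS"):
    """Combine a primary value and subtype; underscores become spaces."""
    primary = primary.replace("_", " ")
    if subtype == "NS":
        return primary
    return f"{primary} ({subtype.replace('_', ' ')})"


def tally(samples_by_name):
    """Convert {name: set of sample IDs} into a list of count objects."""
    return [
        {"name": name, "numSamples": len(samples)}
        for name, samples in samples_by_name.items()
    ]


def aggregate_fusions(rows):
    """Group rows by FUSION_ID and summarise each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["FUSION_ID"]].append(row)

    fusions = []
    for fusion_id, entries in grouped.items():
        histologies = defaultdict(set)
        sites = defaultdict(set)
        for entry in entries:
            sample_id = entry["SAMPLE_ID"]
            histology = label(entry["PRIMARY_HISTOLOGY"], entry["HISTOLOGY_SUBTYPE_1"])
            histologies[histology].add(sample_id)
            sites[label(entry["PRIMARY_SITE"])].add(sample_id)
        # The HGVS r. notation should be identical for every row of a fusion.
        hgvs = entries[0]["TRANSLOCATION_NAME"]
        fusions.append({
            "id": fusion_id,
            "numSamples": len({e["SAMPLE_ID"] for e in entries}),
            "hgvsr": hgvs,
            "transcriptIds": TRANSCRIPT_ID.findall(hgvs),
            "histologies": tally(histologies),
            "sites": tally(sites),
            "pubMedIds": sorted({int(e["PUBMED_PMID"]) for e in entries if e["PUBMED_PMID"]}),
        })
    return fusions


# Columns of interest taken from the example row shown earlier.
EXAMPLE_ROW = {
    "SAMPLE_ID": "749711",
    "PRIMARY_SITE": "breast",
    "PRIMARY_HISTOLOGY": "carcinoma",
    "HISTOLOGY_SUBTYPE_1": "ductal_carcinoma",
    "FUSION_ID": "665",
    "TRANSLOCATION_NAME": "ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452",
    "PUBMED_PMID": "20033038",
}
print(aggregate_fusions([EXAMPLE_ROW]))
```

Sample IDs are kept in sets so that a sample reported on several rows is counted once per histology and once per site.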

Aggregating Histologies & Sites

Aggregating Histologies & Sites was previously described in the small variants section.

Known Issues

There are some issues with the HGVS RNA notation:

  • For coding transcripts, HGVS numbering should use CDS coordinates. Right now COSMIC is using cDNA coordinates for all their fusions.

Download URL

GRCh37

GRCh38

JSON Output

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int

Cancer Gene Census

TSV Extraction

Example

GENE_NAME       CELL_TYPE       PUBMED_PMID     HALLMARK        IMPACT  DESCRIPTION     CELL_LINE
PRDM16 18496560 role in cancer oncogene oncogene
PRDM16 16015645 role in cancer fusion fusion

Parsing

To extract information about tumor suppressor genes (TSGs) and oncogenes, the data is filtered on the "role in cancer" attribute. Rows with the value "TSG" are kept for tumor suppressor genes, and rows with the value "oncogene" are kept for oncogenes. Some genes are listed with both roles ("TSG" and "oncogene"), which indicates that they can act as either.

Columns

Only the following columns are needed to gather the roles in cancer:

  • GENE_NAME
  • IMPACT
  • HALLMARK
Possible Roles in Cancer

The file contained the following number of instances for each role type:

Role in cancer  Total Instances
fusion          149
TSG             195
oncogene        181
Total           525

CSV Extraction

COSMIC tiers are extracted from the cancer_gene_census.csv file:

Gene Symbol,Name,Entrez GeneId,Genome Location,Tier,Hallmark,Chr Band,Somatic,Germline,Tumour Types(Somatic),Tumour Types(Germline),Cancer Syndrome,Tissue Type,Molecular Genetics,Role in Cancer,Mutation Types,Translocation Partner,Other Germline Mut,Other Syndrome,COSMIC ID,cosmic gene name,Synonyms
"AR","Androgen Receptor ","367","X:67544036-67730619","1","Yes","Xq12","yes","yes","prostate","","","E","Dom","oncogene","Mis","","yes ","Androgen insensitivity, Hypospadias 1, X-linked, Spinal and bulbar muscular atrophy of Kennedy ","COSG292497","AR","367,AIS,AR,DHTR,ENSG00000169083.16,HUMARA,NR3C4,P10275,SBMA,SMAX1"
"FH","fumarate hydratase","2271","1:241497603-241519761","1","","1q43","","yes","","leiomyomatosis, renal","hereditary leiomyomatosis and renal cell cancer","E, M","Rec","TSG","Mis, N, F","","","","COSG255037","FH","2271,ENSG00000091483.6,FH,P07954"
"ALK","anaplastic lymphoma kinase (Ki-1)","238","2:29192774-29921566","1","Yes","2p23.2","yes","yes","ALCL, NSCLC, neuroblastoma, inflammatory myofibroblastic tumour, Spitzoid tumour","neuroblastoma","familial neuroblastoma","L, E, M","Dom","oncogene, fusion","T, Mis, A","NPM1, TPM3, TFG, TPM4, ATIC, CLTC, MSN, RNF213, CARS, EML4, KIF5B, C2orf22, DCTN1, HIP1, TPR, RANBP2, PPFIBP1, SEC31A, STRN, VCL, C2orf44, KLC1","","","COSG383409","ALK","238,ALK,CD246,ENSG00000171094.17,Q9UM73"
"APC","adenomatous polyposis of the colon gene","324","5:112737888-112846239","1","Yes","5q22.2","yes","yes","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","adenomatous polyposis coli; Turcot syndrome","E, M, O","Rec","TSG","D, Mis, N, F, S","","","","COSG208824","APC","324,APC,DP2,DP2.5,DP3,ENSG00000134982.16,P25054,PPP1R46"
Columns

Only the following columns are needed to gather the tiers:

  • Gene Symbol
  • Tier

The tiers are first read from the CSV file. While parsing the TSV file, the tier information is then added to each gene by matching gene symbols.
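A minimal Python sketch of this two-pass approach is shown below. The function names are hypothetical, the sample data is trimmed to the columns of interest, and the assumption that HALLMARK holds "role in cancer" while IMPACT holds the role itself is inferred from the example rows above.

```python
import csv
import io


def read_tiers(csv_text):
    """Map gene symbol -> tier from cancer_gene_census.csv content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Gene Symbol"]: int(row["Tier"]) for row in reader}


def read_roles(tsv_text):
    """Map gene symbol -> list of roles from the hallmark TSV content."""
    roles = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["HALLMARK"] != "role in cancer":
            continue
        gene_roles = roles.setdefault(row["GENE_NAME"], [])
        if row["IMPACT"] not in gene_roles:
            gene_roles.append(row["IMPACT"])
    return roles


def build_census(tiers, roles):
    """Attach the tier (when known) to each gene's roles."""
    census = {}
    for gene, gene_roles in roles.items():
        entry = {"roleInCancer": gene_roles}
        if gene in tiers:
            entry["tier"] = tiers[gene]
        census[gene] = entry
    return census


# Trimmed sample inputs (illustrative; real files contain more columns and rows).
CSV_TEXT = 'Gene Symbol,Tier\n"PRDM16","1"\n"AR","1"\n'
TSV_HEADER = ["GENE_NAME", "CELL_TYPE", "PUBMED_PMID", "HALLMARK", "IMPACT", "DESCRIPTION", "CELL_LINE"]
TSV_ROWS = [
    ["PRDM16", "", "18496560", "role in cancer", "oncogene", "oncogene", ""],
    ["PRDM16", "", "16015645", "role in cancer", "fusion", "fusion", ""],
]
TSV_TEXT = "\n".join("\t".join(fields) for fields in [TSV_HEADER] + TSV_ROWS) + "\n"

print(build_census(read_tiers(CSV_TEXT), read_roles(TSV_TEXT)))
```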

Known Issues

None

Download URL

JSON output

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]

Building the supplementary files

You can generate COSMIC supplementary annotation files if you have COSMIC account credentials. Please refer to the SAUtils section for more details.

+ + + + \ No newline at end of file diff --git a/3.24/data-sources/dann-json/index.html b/3.24/data-sources/dann-json/index.html new file mode 100644 index 00000000..6dc700cd --- /dev/null +++ b/3.24/data-sources/dann-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +dann-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/dann/index.html b/3.24/data-sources/dann/index.html new file mode 100644 index 00000000..b35e0749 --- /dev/null +++ b/3.24/data-sources/dann/index.html @@ -0,0 +1,21 @@ + + + + + + + +DANN | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). CADD is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. DANN improves on CADD, which uses support vector machines (SVMs), by using a deep neural network to capture non-linear relationships. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodology.

Publication

Quang, Daniel, Yifei Chen, and Xiaohui Xie. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31.5 761-763 (2015). https://doi.org/10.1093/bioinformatics/btu703

TSV File

Example

chr     grch37_pos  ref     alt     DANN
1 10001 T A 0.16461391399220135
1 10001 T C 0.4396994049749739
1 10001 T G 0.38108629377072734
1 10002 A C 0.36182020272810128
1 10002 A G 0.44413258111779291
1 10002 A T 0.16812846819989813

Parsing

From the TSV file, we are interested in all columns:

  • chr
  • grch37_pos
  • ref
  • alt
  • DANN

GRCh38 liftover

The data is not available for GRCh38 on the DANN website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Known Issues

None

Download URL

https://cbcl.ics.uci.edu/public_data/DANN/

JSON Output

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/dbsnp-json/index.html b/3.24/data-sources/dbsnp-json/index.html new file mode 100644 index 00000000..20670130 --- /dev/null +++ b/3.24/data-sources/dbsnp-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +dbsnp-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/dbsnp/index.html b/3.24/data-sources/dbsnp/index.html new file mode 100644 index 00000000..2a155c86 --- /dev/null +++ b/3.24/data-sources/dbsnp/index.html @@ -0,0 +1,18 @@ + + + + + + + +dbSNP | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the allele frequencies provided in the CAF field. The global minor allele frequency is the second-highest value in the comma-delimited CAF field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele and either C or T is chosen (arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T are tied for the highest frequency, so one of them is arbitrarily assigned as the global major allele and the other as the global minor allele.

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.
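The tie-breaking rules and the four examples above can be sketched in Python as follows. The function name global_alleles is hypothetical, and where the text says a choice is arbitrary, this sketch simply takes the first allele in VCF order.

```python
def global_alleles(ref, alts, caf):
    """Return (global major allele, global minor allele) from a CAF value.

    ref  -- reference allele, e.g. "A"
    alts -- list of alternate alleles, e.g. ["C", "T"]
    caf  -- comma-delimited CAF value, e.g. "0.2,0.4,0.4"
    """
    alleles = [ref] + alts
    # Pair each allele with its frequency, ignoring '.' values.
    candidates = [
        (float(value), allele)
        for allele, value in zip(alleles, caf.split(","))
        if value != "."
    ]
    if not candidates:
        return None, None

    # Major: highest frequency; the reference allele wins ties.
    major = max(candidates, key=lambda c: (c[0], c[1] == ref))
    remaining = [c for c in candidates if c is not major]
    if not remaining:
        return major[1], None

    # Minor: next-highest frequency; the reference allele loses ties.
    minor = max(remaining, key=lambda c: (c[0], c[1] != ref))
    return major[1], minor[1]


print(global_alleles("A", ["C"], "0.5,0.5"))
print(global_alleles("A", ["C", "T"], "0.2,0.2,0.6"))
```

Python's max returns the first maximal element, which is what makes the arbitrary cases deterministic in this sketch.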

Known Issues

If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs

Building the supplementary files

You can generate dbSNP supplementary annotation files yourself. Please refer to the SAUtils section for more details.

+ + + + \ No newline at end of file diff --git a/3.24/data-sources/decipher-json/index.html b/3.24/data-sources/decipher-json/index.html new file mode 100644 index 00000000..01f20acc --- /dev/null +++ b/3.24/data-sources/decipher-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +decipher-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/decipher/index.html b/3.24/data-sources/decipher/index.html new file mode 100644 index 00000000..99f114dc --- /dev/null +++ b/3.24/data-sources/decipher/index.html @@ -0,0 +1,19 @@ + + + + + + + +DECIPHER | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER TSV file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size
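
The column extraction listed above can be sketched as follows. This is a minimal illustration, not the actual Illumina Connected Annotations parser: the function name `parse_decipher` and the choice of Python types are assumptions, and only the column names come from the TSV header shown earlier.

```python
import csv

def parse_decipher(lines):
    """Yield one dict per data row, keeping only the columns listed above.

    `lines` is any iterable of tab-separated strings whose first item is
    the header line (it starts with '#', e.g. '#population_cnv_id').
    """
    reader = csv.reader(lines, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    for row in reader:
        if not row:
            continue  # skip blank lines
        rec = dict(zip(header, row))
        yield {
            "chr": rec["chr"],
            "start": int(rec["start"]),
            "end": int(rec["end"]),
            "deletion_observations": int(rec["deletion_observations"]),
            "deletion_frequency": float(rec["deletion_frequency"]),
            "duplication_observations": int(rec["duplication_observations"]),
            "duplication_frequency": float(rec["duplication_frequency"]),
            "sample_size": int(rec["sample_size"]),
        }
```

Looking columns up by header name rather than by position keeps the sketch working if the column order of the download ever changes.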

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz
https://www.deciphergenomics.org/files/downloads/population_cnv_grch37.txt.gz

JSON output

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
| Field | Type | Notes |
|---|---|---|
| chromosome | string | Ensembl-style chromosome names |
| begin | int | 1-based position |
| end | int | 1-based position |
| numDeletions | int | # of observed deletions |
| deletionFrequency | float | deletion frequency |
| numDuplications | int | # of observed duplications |
| duplicationFrequency | float | duplication frequency |
| sampleSize | int | total # of samples |
| reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap |
| annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/fusioncatcher-json/index.html b/3.24/data-sources/fusioncatcher-json/index.html new file mode 100644 index 00000000..6ff1c35e --- /dev/null +++ b/3.24/data-sources/fusioncatcher-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +fusioncatcher-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
| Field | Type | Notes |
|---|---|---|
| genes | genes object | 5' gene & 3' gene |
| germlineSources | string array | matches in known germline data sources |
| somaticSources | string array | matches in known somatic data sources |

genes

| Field | Type | Notes |
|---|---|---|
| first | gene object | 5' gene |
| second | gene object | 3' gene |
| isParalogPair | bool | true when both genes are paralogs for each other |
| isPseudogenePair | bool | true when both genes are pseudogenes for each other |
| isReadthrough | bool | true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them) |

gene

| Field | Type | Notes |
|---|---|---|
| hgnc | string | gene symbol. e.g. MSH6 |
| isOncogene | bool | true when this gene is an oncogene |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/fusioncatcher/index.html b/3.24/data-sources/fusioncatcher/index.html new file mode 100644 index 00000000..7ee41c6c --- /dev/null +++ b/3.24/data-sources/fusioncatcher/index.html @@ -0,0 +1,18 @@ + + + + + + + +FusionCatcher | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of their genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

| Description | Reference | Data | FusionCatcher filename |
|---|---|---|---|
| Bushman | | bushmanlab.org | cancer_genes.txt |
| ONGENE | JGG | bioinfo-minzhao.org | oncogenes_more.txt |
| UniProt tumor genes | NAR | uniprot.org | tumor_genes.txt |
Germline

| Illumina Connected Annotations label | Reference | Data | FusionCatcher filename |
|---|---|---|---|
| 1000 Genomes Project | PLOS ONE | | 1000genomes.txt |
| Healthy (strong support) | | | banned.txt |
| Illumina Body Map 2.0 | | EBI | bodymap2.txt |
| CACG | Genomics | | cacg.txt |
| ConjoinG | PLOS ONE | | conjoing.txt |
| Healthy prefrontal cortex | BMC Medical Genomics | NCBI GEO | cortex.txt |
| Duplicated Genes Database | PLOS ONE | genouest.org | dgd.txt |
| GTEx healthy tissues | | gtexportal.org | gtex.txt |
| Healthy | | | healthy.txt |
| Human Protein Atlas | MCP | EBI | hpa.txt |
| Babiceanu non-cancer tissues | NAR | NAR | non-cancer_tissues.txt |
| non-tumor cell lines | | | non-tumor_cells.txt |
| TumorFusions normal | NAR | NAR | tcga-normal.txt |

Somatic

| Illumina Connected Annotations label | Reference | Data | FusionCatcher filename |
|---|---|---|---|
| Alaei-Mahabadi 18 cancers | PNAS | | 18cancers.txt |
| DepMap CCLE | | depmap.org | ccle.txt |
| CCLE Klijn | Nature Biotechnology | Nature Biotechnology | ccle2.txt |
| CCLE Vellichirammal | Molecular Therapy Nucleic Acids | | ccle3.txt |
| Cancer Genome Project | | COSMIC | cgp.txt |
| ChimerKB 4.0 | NAR | kobic.re.kr | chimerdb4kb.txt |
| ChimerPub 4.0 | NAR | kobic.re.kr | chimerdb4pub.txt |
| ChimerSeq 4.0 | NAR | kobic.re.kr | chimerdb4seq.txt |
| COSMIC | NAR | COSMIC | cosmic.txt |
| Bao gliomas | Genome Research | | gliomas.txt |
| Known | | | known.txt |
| Mitelman DB | ISB-CGC | Google Cloud | mitelman.txt |
| TCGA oesophageal carcinomas | Nature | | oesophagus.txt |
| Bailey pancreatic cancers | Nature | Nature | pancreases.txt |
| PCAWG | Cell | ICGC | pcawg.txt |
| Robinson prostate cancers | Cell | Cell | prostate_cancer.txt |
| TCGA | | cancer.gov | tcga.txt |
| TumorFusions tumor | NAR | NAR | tcga-cancer.txt |
| TCGA Gao | Cell | Cell | tcga2.txt |
| TCGA Vellichirammal | Molecular Therapy Nucleic Acids | | tcga3.txt |
| TICdb | BMC Genomics | unav.edu | ticdb.txt |

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.
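
The import rule above can be sketched roughly as follows. This is an illustrative sketch only: `load_gene_pairs` and `known_gene_ids` are hypothetical names, and `known_gene_ids` stands in for the set of Ensembl gene IDs found in the GRCh37 or GRCh38 cache files, whose loading is not shown here.

```python
def load_gene_pairs(lines, known_gene_ids):
    """Return the (first, second) gene ID pairs where both IDs are recognized.

    lines          -- iterable of two-column, whitespace/tab-separated strings
    known_gene_ids -- set of Ensembl gene IDs (hypothetical stand-in for the
                      union of IDs in the GRCh37 and GRCh38 caches)
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # not a gene pair line
        first, second = fields
        # Import the pair only when both gene IDs are recognized.
        if first in known_gene_ids and second in known_gene_ids:
            pairs.append((first, second))
    return pairs
```

With a filter like this, any pair that contains an ID missing from the caches is skipped as a whole rather than partially imported.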

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This is commonly used in the data files representing oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl gene IDs (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations ignores these entries, since we only include gene IDs that it currently recognizes.

I suspect that these were originally RefSeq genes and if so, we can support those directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
| Field | Type | Notes |
|---|---|---|
| genes | genes object | 5' gene & 3' gene |
| germlineSources | string array | matches in known germline data sources |
| somaticSources | string array | matches in known somatic data sources |

genes

| Field | Type | Notes |
|---|---|---|
| first | gene object | 5' gene |
| second | gene object | 3' gene |
| isParalogPair | bool | true when both genes are paralogs for each other |
| isPseudogenePair | bool | true when both genes are pseudogenes for each other |
| isReadthrough | bool | true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them) |

gene

| Field | Type | Notes |
|---|---|---|
| hgnc | string | gene symbol. e.g. MSH6 |
| isOncogene | bool | true when this gene is an oncogene |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gerp-json/index.html b/3.24/data-sources/gerp-json/index.html new file mode 100644 index 00000000..e2678d7b --- /dev/null +++ b/3.24/data-sources/gerp-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gerp-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gerp-json

"gerpScore": 1.27
| Field | Type | Notes |
|---|---|---|
| gerpScore | float | Range: -∞ to +∞ |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gerp/index.html b/3.24/data-sources/gerp/index.html new file mode 100644 index 00000000..ffa28906 --- /dev/null +++ b/3.24/data-sources/gerp/index.html @@ -0,0 +1,20 @@ + + + + + + + +GERP | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint (Rejected Substitutions). Illumina Connected Annotations uses GERP++, which is based on a significantly faster and more statistically robust maximum likelihood estimation procedure to compute expected rates of evolution.

Publication

Davydov, Eugene V., et al. "Identifying a high fraction of the human genome to be under selective constraint using GERP++." PLoS computational biology 6.12 e1001025 (2010). https://doi.org/10.1371/journal.pcbi.1001025

Source Files

Example GRCh37

The GRCh37 file is in TSV format.

chr     position    GERP
1 12177 0.83
1 12178 -0.206
1 12179 -0.492
1 12180 -1.66
1 12181 0.83
1 12182 0.83
1 12183 -0.417
1 12184 0.83

Example GRCh38

The GRCh38 file is a lifted-over file in BED format.

chr     pos_start   pos_end     GERP
1 12646 12647 0.298
1 12647 12648 2.63
1 12648 12649 1.87
1 12649 12650 0.252
1 12650 12651 -2.06
1 12651 12652 2.61
1 12652 12653 3.97

Parsing

From the TSV file, we are interested in the following columns:

  • chr
  • position
  • GERP
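
As a rough sketch, the two source layouts shown above could be read as follows. The function names are hypothetical, and the BED handling assumes standard BED coordinates (0-based start, exclusive end), which the source does not state explicitly; under that assumption a one-base interval maps to the 1-based position `pos_end`.

```python
def parse_gerp_tsv(line):
    """Parse one GRCh37 line: chr, position, GERP."""
    chrom, position, score = line.split()
    return chrom, int(position), float(score)

def parse_gerp_bed(line):
    """Parse one GRCh38 line: chr, pos_start, pos_end, GERP.

    Assumption: standard BED semantics, so the single base covered by
    [pos_start, pos_end) is the 1-based position pos_end.
    """
    chrom, _start, end, score = line.split()
    return chrom, int(end), float(score)
```

Both functions return the same `(chromosome, position, score)` tuple, so downstream code would not need to know which assembly the record came from.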

Known Issues

None

Download URL

GRCh37

http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

GRCh38

GRCh38 data is not available on the GERP++ website; it was obtained from https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/

JSON Output

"gerpScore": 1.27
| Field | Type | Notes |
|---|---|---|
| gerpScore | float | Range: -∞ to +∞ |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gme-json/index.html b/3.24/data-sources/gme-json/index.html new file mode 100644 index 00000000..60f08567 --- /dev/null +++ b/3.24/data-sources/gme-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gme-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gme-json

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| allAc | int | GME allele count |
| allAn | int | GME allele number |
| allAf | float | GME allele frequency |
| failedFilter | bool | True if this variant failed any filters |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gme/index.html b/3.24/data-sources/gme/index.html new file mode 100644 index 00000000..cdb38620 --- /dev/null +++ b/3.24/data-sources/gme/index.html @@ -0,0 +1,18 @@ + + + + + + + +GME Variome | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME TSV file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF

GRCh37 liftover

GRCh38 data is not available on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| allAc | int | GME allele count |
| allAn | int | GME allele number |
| allAf | float | GME allele frequency |
| failedFilter | bool | True if this variant failed any filters |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad-lof-json/index.html b/3.24/data-sources/gnomad-lof-json/index.html new file mode 100644 index 00000000..b5cf2189 --- /dev/null +++ b/3.24/data-sources/gnomad-lof-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gnomad-lof-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad-lof-json

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
| Field | Type | Notes |
|---|---|---|
| pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | float | probability of being completely tolerant of loss of function variation (observed = expected) |
| pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | float | corrected synonymous Z score |
| misZ | float | corrected missense Z score |
| loeuf | float | loss of function observed/expected upper bound fraction (LOEUF) |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad-small-variants-json/index.html b/3.24/data-sources/gnomad-small-variants-json/index.html new file mode 100644 index 00000000..89620ad9 --- /dev/null +++ b/3.24/data-sources/gnomad-small-variants-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gnomad-small-variants-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad-small-variants-json

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| coverage | int | average coverage (non-negative integer values) |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| maleAf | float | allele frequency for male population. Range: 0 - 1.0 |
| femaleAf | float | allele frequency for female population. Range: 0 - 1.0 |
| controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| maleAc | int | allele count for male population. Integer. |
| femaleAc | int | allele count for female population. Integer. |
| controlsAllAc | int | allele count for the controls subset. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| maleAn | int | allele number for male population. Non-zero integer. |
| femaleAn | int | allele number for female population. Non-zero integer. |
| controlsAllAn | int | allele number for the controls subset. Non-zero integer. |
| allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
| maleHc | int | count of homozygous individuals for male population. Non-negative integer. |
| femaleHc | int | count of homozygous individuals for female population. Non-negative integer. |
| afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African / African American population. Integer. |
| afrAn | int | allele number for the African / African American population. Non-zero integer. |
| afrHc | int | count of homozygous individuals for African / African American population. Non-negative integer. |
| amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Latino population. Integer. |
| amrAn | int | allele number for the Latino population. Non-zero integer. |
| amrHc | int | count of homozygous individuals for Latino population. Non-negative integer. |
| easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian population. Integer. |
| easAn | int | allele number for the East Asian population. Non-zero integer. |
| easHc | int | count of homozygous individuals for East Asian population. Non-negative integer. |
| finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
| finAc | int | allele count for the Finnish population. Integer. |
| finAn | int | allele number for the Finnish population. Non-zero integer. |
| finHc | int | count of homozygous individuals for Finnish population. Non-negative integer. |
| nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
| nfeAc | int | allele count for the Non-Finnish European population. Integer. |
| nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
| nfeHc | int | count of homozygous individuals for Non-Finnish European population. Non-negative integer. |
| othAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
| othAc | int | allele count for the Other population. Integer. |
| othAn | int | allele number for the Other population. Non-zero integer. |
| othHc | int | count of homozygous individuals for Other population. Non-negative integer. |
| asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
| asjAc | int | allele count for the Ashkenazi Jewish population. Integer. |
| asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
| asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
| sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian population. Integer. |
| sasAn | int | allele number for the South Asian population. Non-zero integer. |
| sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
| failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters) |
| lowComplexityRegion | bool | True if this variant is located in a low complexity region. |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad-structural-variants-data_description/index.html b/3.24/data-sources/gnomad-structural-variants-data_description/index.html new file mode 100644 index 00000000..2a166293 --- /dev/null +++ b/3.24/data-sources/gnomad-structural-variants-data_description/index.html @@ -0,0 +1,19 @@ + + + + + + + +gnomad-structural-variants-data_description | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    ENDEVIDENCE HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

Structural Variant Type Mapping

The source files represent structural variants with keys that follow various naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows.

| Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key |
|---|---|
| copy_number_variation | |
| deletion | DEL, CN=0 |
| duplication | DUP |
| insertion | INS |
| inversion | INV |
| mobile_element_insertion | INS:ME |
| mobile_element_insertion | INS:ME:ALU |
| mobile_element_insertion | INS:ME:LINE1 |
| mobile_element_insertion | INS:ME:SVA |
| structural alteration | |
| complex_structural_alteration | CPX |
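
The mapping above can be expressed as a simple lookup table. The sketch below is illustrative only: the names `SV_TYPE_MAP` and `to_json_sv_type` are hypothetical, and returning `None` for a source key that is not listed (for example `BND` in the BED example) is an assumption rather than documented behavior. The two JSON keys that have no GRCh37 source key in the table are omitted.

```python
# GRCh37 source SV type key -> JSON SV type key, taken from the table above.
SV_TYPE_MAP = {
    "DEL": "deletion",
    "CN=0": "deletion",
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}

def to_json_sv_type(source_key):
    """Return the JSON SV type key, or None when the source key is not listed."""
    return SV_TYPE_MAP.get(source_key)
```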
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad-structural-variants-json/index.html b/3.24/data-sources/gnomad-structural-variants-json/index.html new file mode 100644 index 00000000..6355d038 --- /dev/null +++ b/3.24/data-sources/gnomad-structural-variants-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gnomad-structural-variants-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

| Field | Type | Notes |
|---|---|---|
| chromosome | string | chromosome number |
| begin | integer | position interval start |
| end | integer | position interval end |
| variantType | string | structural variant type |
| variantId | string | gnomAD ID |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
| femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
| maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
| allAc | integer | allele count for all populations. |
| afrAc | integer | allele count for the African super population. |
| amrAc | integer | allele count for the Admixed American super population. |
| easAc | integer | allele count for the East Asian super population. |
| eurAc | integer | allele count for the European super population. |
| othAc | integer | allele count for all other populations. |
| maleAc | integer | allele count for male population. |
| femaleAc | integer | allele count for female population. |
| allAn | integer | allele number for all populations. |
| afrAn | integer | allele number for the African super population. |
| amrAn | integer | allele number for the Admixed American super population. |
| easAn | integer | allele number for the East Asian super population. |
| eurAn | integer | allele number for the European super population. |
| othAn | integer | allele number for all other populations. |
| femaleAn | integer | allele number for female population. |
| maleAn | integer | allele number for male population. |
| allHc | integer | count of homozygous individuals for all populations. |
| afrHc | integer | count of homozygous individuals for the African / African American population. |
| amrHc | integer | count of homozygous individuals for the Latino population. |
| easHc | integer | count of homozygous individuals for the East Asian population. |
| eurHc | integer | count of homozygous individuals for the European super population. |
| othHc | integer | count of homozygous individuals for all other populations. |
| maleHc | integer | count of homozygous individuals for male population. |
| femaleHc | integer | count of homozygous individuals for female population. |
| failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
| reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
| annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0 |

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad/index.html b/3.24/data-sources/gnomad/index.html new file mode 100644 index 00000000..19201f7b --- /dev/null +++ b/3.24/data-sources/gnomad/index.html @@ -0,0 +1,33 @@ + + + + + + + +gnomAD | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Illumina Connected Annotations supports gnomAD v4.0 for the GRCh38 assembly and gnomAD v2.1 for GRCh37.

gnomAD v4.0 (GRCh38)

Small Variants

gnomAD v4.0, like gnomAD v2.1, contains both genome and exome data. Unlike gnomAD v2.1, where the genome and exome data are merged, Illumina Connected Annotations reports gnomAD v4.0 genome and exome data in separate JSON output fields. For gnomAD genome, the field name is gnomad. For gnomAD exome, the field name is gnomad-exome. Despite the different field names, the JSON data format is identical for both genome and exome.
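
Because both fields share one format, a consumer can read them with the same code. The sketch below is a hedged illustration: the helper name `gnomad_allele_frequencies` is hypothetical, it assumes the `gnomad` and `gnomad-exome` objects appear as keys of an already-parsed variant object, and it assumes each carries an `allAf` value as in the gnomAD small variants JSON layout.

```python
def gnomad_allele_frequencies(variant):
    """Return the overall allele frequency per gnomAD field that is present.

    variant -- dict for one annotated variant (assumed layout; see lead-in).
    """
    result = {}
    for key in ("gnomad", "gnomad-exome"):
        block = variant.get(key)
        # Either field may be absent, e.g. when the variant is not in that
        # data set, so only report the ones that exist.
        if block is not None and "allAf" in block:
            result[key] = block["allAf"]
    return result
```

A variant that is annotated only from the genome data would then yield a dictionary with the single key `gnomad`.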

VCF extraction

We currently extract the following info fields from both gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">
##INFO=<ID=AC_XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples">
##INFO=<ID=AN_XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples">
##INFO=<ID=nhomalt_XX,Number=A,Type=Integer,Description="Count of homozygous individuals in XX samples">
##INFO=<ID=AC_XY,Number=A,Type=Integer,Description="Alternate allele count for XY samples">
##INFO=<ID=AN_XY,Number=1,Type=Integer,Description="Total number of alleles in XY samples">
##INFO=<ID=nhomalt_XY,Number=A,Type=Integer,Description="Count of homozygous individuals in XY samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African/African-American ancestry">
##INFO=<ID=AN_afr,Number=1,Type=Integer,Description="Total number of alleles in samples of African/African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African/African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=1,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=1,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=1,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=1,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_mid,Number=A,Type=Integer,Description="Alternate allele count for samples of Middle Eastern ancestry">
##INFO=<ID=AN_mid,Number=1,Type=Integer,Description="Total number of alleles in samples of Middle Eastern ancestry">
##INFO=<ID=nhomalt_mid,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Middle Eastern ancestry">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of Non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=1,Type=Integer,Description="Total number of alleles in samples of Non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Non-Finnish European ancestry">
##INFO=<ID=AC_remaining,Number=A,Type=Integer,Description="Alternate allele count for samples of Remaining individuals ancestry">
##INFO=<ID=AN_remaining,Number=1,Type=Integer,Description="Total number of alleles in samples of Remaining individuals ancestry">
##INFO=<ID=nhomalt_remaining,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Remaining individuals ancestry">
##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=1,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">

JSON output

"gnomad": {
"coverage": 154,
"failedFilter": true,
"allAf": 0.5,
"allAn": 152428,
"allAc": 76214,
"allHc": 0,
"afrAf": 0.5,
"afrAn": 41608,
"afrAc": 20804,
"afrHc": 0,
"amiAf": 0.5,
"amiAn": 912,
"amiAc": 456,
"amiHc": 0,
"amrAf": 0.5,
"amrAn": 15314,
"amrAc": 7657,
"amrHc": 0,
"easAf": 0.5,
"easAn": 5196,
"easAc": 2598,
"easHc": 0,
"finAf": 0.5,
"finAn": 10632,
"finAc": 5316,
"finHc": 0,
"nfeAf": 0.5,
"nfeAn": 68050,
"nfeAc": 34025,
"nfeHc": 0,
"asjAf": 0.5,
"asjAn": 3472,
"asjAc": 1736,
"asjHc": 0,
"sasAf": 0.5,
"sasAn": 4834,
"sasAc": 2417,
"sasHc": 0,
"midAf": 0.5,
"midAn": 294,
"midAc": 147,
"midHc": 0,
"remainingAf": 0.5,
"remainingAn": 2116,
"remainingAc": 1058,
"remainingHc": 0,
"maleAf": 0.5,
"maleAn": 74544,
"maleAc": 37272,
"maleHc": 0,
"femaleAf": 0.5,
"femaleAn": 77884,
"femaleAc": 38942,
"femaleHc": 0
},
"gnomad-exome": {
"coverage": 53,
"allAf": 0.495074,
"allAn": 4060,
"allAc": 2010,
"allHc": 11,
"afrAf": 0.5,
"afrAn": 86,
"afrAc": 43,
"afrHc": 0,
"amrAf": 0.5,
"amrAn": 46,
"amrAc": 23,
"amrHc": 0,
"easAf": 0.491071,
"easAn": 112,
"easAc": 55,
"easHc": 0,
"finAf": 0.5,
"finAn": 306,
"finAc": 153,
"finHc": 0,
"nfeAf": 0.49503,
"nfeAn": 3018,
"nfeAc": 1494,
"nfeHc": 11,
"asjAf": 0.461538,
"asjAn": 26,
"asjAc": 12,
"asjHc": 0,
"sasAf": 0.486111,
"sasAn": 72,
"sasAc": 35,
"sasHc": 0,
"midAf": 0.5,
"midAn": 68,
"midAc": 34,
"midHc": 0,
"remainingAf": 0.493865,
"remainingAn": 326,
"remainingAc": 161,
"remainingHc": 0,
"maleAf": 0.495212,
"maleAn": 2924,
"maleAc": 1448,
"maleHc": 9,
"femaleAf": 0.494718,
"femaleAn": 1136,
"femaleAc": 562,
"femaleHc": 2
}
Field          Type   Notes
coverage       int    average coverage (non-negative integer values)
maleAf         float  allele frequency for male population. Range: 0 - 1.0
maleAn         int    allele number for male population. Non-zero integer.
maleAc         int    allele count for male population. Integer.
maleHc         int    count of homozygous individuals for male population. Non-negative integer.
femaleAf       float  allele frequency for female population. Range: 0 - 1.0
femaleAn       int    allele number for female population. Non-zero integer.
femaleAc       int    allele count for female population. Integer.
femaleHc       int    count of homozygous individuals for female population. Non-negative integer.
remainingAf    float  allele frequency for the Other population. Range: 0 - 1.0
remainingAc    int    allele count for the Other population. Integer.
remainingAn    int    allele number for the Other population. Non-zero integer.
remainingHc    int    count of homozygous individuals for the Other population. Non-negative integer.
allAf          float  allele frequency for all populations. Range: 0 - 1.0
allAn          int    allele number for all populations. Non-zero integer.
allAc          int    allele count for all populations. Integer.
allHc          int    count of homozygous individuals for all populations. Non-negative integer.
afrAf          float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc          int    allele count for the African / African American population. Integer.
afrAn          int    allele number for the African / African American population. Non-zero integer.
afrHc          int    count of homozygous individuals for the African / African American population. Non-negative integer.
amiAf          float  allele frequency for the Amish population. Range: 0 - 1.0
amiAn          int    allele number for the Amish population. Non-zero integer.
amiAc          int    allele count for the Amish population. Integer.
amiHc          int    count of homozygous individuals for the Amish population. Non-negative integer.
amrAf          float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc          int    allele count for the Latino population. Integer.
amrAn          int    allele number for the Latino population. Non-zero integer.
amrHc          int    count of homozygous individuals for the Latino population. Non-negative integer.
easAf          float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc          int    allele count for the East Asian population. Integer.
easAn          int    allele number for the East Asian population. Non-zero integer.
easHc          int    count of homozygous individuals for the East Asian population. Non-negative integer.
finAf          float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc          int    allele count for the Finnish population. Integer.
finAn          int    allele number for the Finnish population. Non-zero integer.
finHc          int    count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf          float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc          int    allele count for the Non-Finnish European population. Integer.
nfeAn          int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc          int    count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
asjAf          float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc          int    allele count for the Ashkenazi Jewish population. Integer.
asjAn          int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc          int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf          float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc          int    allele count for the South Asian population. Integer.
sasAn          int    allele number for the South Asian population. Non-zero integer.
sasHc          int    count of homozygous individuals for the South Asian population. Non-negative integer.
midAf          float  allele frequency for the Middle Eastern population. Range: 0 - 1.0
midAc          int    allele count for the Middle Eastern population. Integer.
midAn          int    allele number for the Middle Eastern population. Non-zero integer.
midHc          int    count of homozygous individuals for the Middle Eastern population. Non-negative integer.
failedFilter   bool   True if this variant failed any filters (Note: we do not list the failed filters)

Calculation

To calculate the allele frequency for each group, we divide the group's allele count by its allele number.
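
The calculation can be sketched as follows. This is a minimal illustration, not the annotator's actual code: the function name, the rounding to six decimal places, and the handling of a zero allele number are assumptions made for the example.

```python
from typing import Optional


def allele_frequency(allele_count: int, allele_number: int) -> Optional[float]:
    """Return AC / AN, or None when AN is zero and the frequency is undefined."""
    if allele_number <= 0:
        return None
    # Rounding to six decimals is an assumption made for readability.
    return round(allele_count / allele_number, 6)


# Counts taken from the "gnomad-exome" JSON example above.
all_af = allele_frequency(2010, 4060)   # allAc / allAn
eas_af = allele_frequency(55, 112)      # easAc / easAn
print(all_af, eas_af)
```

With the counts from the exome example, the results agree with the allAf and easAf values shown in the JSON output.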

LoF Gene Metrics

In gnomAD 4.0, the LoF gene score data is given per transcript. Because Illumina Connected Annotations reports it as gene-level data, one transcript must be chosen for each gene and its values reported. The ID of the selected transcript is included in the output. Transcripts are prioritized (from highest to lowest) as follows:

  1. Ensembl Transcript has mane_select column true from source (gnomAD).
  2. Transcript is marked as Ensembl canonical in Illumina Connected Annotation cache data.
  3. RefSeq transcript has mane_select column true.
  4. Transcript is marked as RefSeq canonical in Illumina Connected Annotation cache data.
  5. Transcript has the lowest lof.oe_ci.upper value compared to the other transcripts of the same gene.
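
The prioritization above can be sketched as a sort key. This is an illustrative sketch only: the TranscriptScore class, its field names, and the transcript IDs in the test data are hypothetical, and breaking ties within a tier by the lowest lof.oe_ci.upper value is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TranscriptScore:
    transcript_id: str
    source: str            # "Ensembl" or "RefSeq" (hypothetical field)
    mane_select: bool      # mane_select column from the gnomAD source file
    cache_canonical: bool  # canonical flag from the annotator's cache data
    loeuf: float           # lof.oe_ci.upper


def priority_key(t: TranscriptScore) -> Tuple[int, float]:
    """Lower tuples win; tiers 0-4 follow rules 1-5 in the list above."""
    if t.source == "Ensembl" and t.mane_select:
        tier = 0
    elif t.source == "Ensembl" and t.cache_canonical:
        tier = 1
    elif t.source == "RefSeq" and t.mane_select:
        tier = 2
    elif t.source == "RefSeq" and t.cache_canonical:
        tier = 3
    else:
        tier = 4
    # Assumption: ties within a tier go to the lowest LOEUF.
    return (tier, t.loeuf)


def select_transcript(transcripts: List[TranscriptScore]) -> TranscriptScore:
    return min(transcripts, key=priority_key)


candidates = [
    TranscriptScore("ENST_A", "Ensembl", False, False, 0.20),
    TranscriptScore("ENST_B", "Ensembl", False, True, 0.90),
    TranscriptScore("NM_C", "RefSeq", True, False, 0.10),
]
print(select_transcript(candidates).transcript_id)
```

Here the Ensembl canonical transcript is selected even though another transcript has a lower LOEUF, because the LOEUF rule only applies when none of the higher-priority rules match.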
Differences with gnomAD browser

Because Illumina Connected Annotations and gnomAD use different Ensembl versions, several transcript IDs are marked as canonical in the gnomAD browser but not in Illumina Connected Annotations. In these cases, the gene score reported by Illumina Connected Annotations differs from the gene score shown in the gnomAD browser. The transcriptId field in the JSON output reports which transcript Illumina Connected Annotations used.

Tab delimited file example

gene    transcript  mane_select lof_hc_lc.obs   lof_hc_lc.exp   lof_hc_lc.possible  lof_hc_lc.oe    lof_hc_lc.mu    lof_hc_lc.pLI   lof_hc_lc.pNull lof_hc_lc.pRec  lof.obs lof.exp lof.possible    lof.oe  lof.mu  lof.pLI lof.pNull   lof.pRec    lof.oe_ci.lower lof.oe_ci.upper lof.z_raw   lof.z_score mis.obs mis.exp mis.possible    mis.oe  mis.mu  mis.oe_ci.lower mis.oe_ci.upper mis.z_raw   mis.z_score mis_pphen.obs   mis_pphen.exp   mis_pphen.possible  mis_pphen.oe    syn.obs syn.exp syn.possible    syn.oe  syn.mu  syn.oe_ci.lower syn.oe_ci.upper syn.z_raw   syn.z_score constraint_flags
SCHIP1 ENST00000445224 false 8 3.0392e+01 157 2.6323e-01 3.5111e-07 9.9024e-01 5.8227e-06 9.7579e-03 8 3.0392e+01 157 2.6323e-01 3.5111e-07 9.9066e-01 5.3097e-06 9.3341e-03 1.5300e-01 4.7500e-01 4.0617e+00 3.4377e+00 193 3.0914e+02 1659 6.2431e-01 1.5780e-06 5.5400e-01 7.0300e-01 6.6055e+00 2.4115e+00 87 1.4959e+02 813 5.8160e-01 76 1.0011e+02 393 7.5914e-01 7.9269e-07 6.3000e-01 9.1900e-01 2.4099e+00 1.3142e+00 []

JSON key to TSV column mapping

JSON key       TSV column        Description
pLi            lof.pLI           probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull          lof.pNull         probability of being completely tolerant of loss of function variation (observed = expected)
pRec           lof.pRec          probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ           syn.z_score       corrected synonymous Z score
misZ           mis.z_score       corrected missense Z score
loeuf          lof.oe_ci.upper   loss of function observed/expected upper bound fraction (LOEUF)
transcriptId   transcript        ID of the transcript whose values were selected
"gnomAD": {
"pLi": 0.00000122,
"pRec": 0.32,
"pNull": 0.68,
"synZ": 0.0117,
"misZ": 0.162,
"loeuf": 1.94,
"transcriptId": "ENST00000360525"
}
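
As an illustration of the mapping above, the sketch below converts one row of the tab-delimited file into the JSON keys. It is not the annotator's implementation: the function name is hypothetical, and the sample row keeps only the columns used by the mapping, with values copied from the SCHIP1 example row.

```python
import csv
import io

# JSON key -> TSV column, as listed in the mapping table above.
JSON_KEY_TO_TSV_COLUMN = {
    "pLi": "lof.pLI",
    "pNull": "lof.pNull",
    "pRec": "lof.pRec",
    "synZ": "syn.z_score",
    "misZ": "mis.z_score",
    "loeuf": "lof.oe_ci.upper",
    "transcriptId": "transcript",
}


def row_to_json(row: dict) -> dict:
    """Build the gene-level entry from one parsed TSV row."""
    entry = {}
    for json_key, column in JSON_KEY_TO_TSV_COLUMN.items():
        value = row[column]
        # Every mapped column is numeric except the transcript ID.
        entry[json_key] = value if json_key == "transcriptId" else float(value)
    return entry


# Reduced version of the SCHIP1 example row (only the mapped columns).
columns = ["gene", "transcript", "lof.pLI", "lof.pNull", "lof.pRec",
           "lof.oe_ci.upper", "mis.z_score", "syn.z_score"]
values = ["SCHIP1", "ENST00000445224", "9.9066e-01", "5.3097e-06",
          "9.3341e-03", "4.7500e-01", "2.4115e+00", "1.3142e+00"]
tsv_text = "\t".join(columns) + "\n" + "\t".join(values) + "\n"

reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
entries = [row_to_json(row) for row in reader]
print(entries[0])
```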

Structural Variants

Structural variants in gnomAD 4.0 are available in VCF format and have the same population data as small variants.

Structural Variant Type Mapping

The source files represent structural variant types with keys that follow various naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows.

Illumina Connected Annotations JSON SV Type Key   GRCh37 Source SV Type Key
deletion                                          DEL, CN=0
duplication                                       DUP
insertion                                         INS
inversion                                         INV
mobile_element_insertion                          INS:ME
mobile_element_insertion                          INS:ME:ALU
mobile_element_insertion                          INS:ME:LINE1
mobile_element_insertion                          INS:ME:SVA
complex_structural_alteration                     CPX
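
The table above amounts to a lookup from source key to JSON key. The sketch below is illustrative only: the function name is hypothetical, and two behaviours are assumptions, namely reading "DEL, CN=0" as two separate source keys and returning None for keys that are not in the table.

```python
from typing import Optional

# Source SV type key -> JSON SV type key, following the table above.
SV_TYPE_MAP = {
    "DEL": "deletion",
    "CN=0": "deletion",
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}


def map_sv_type(source_key: str) -> Optional[str]:
    """Map a source SV type key, with or without angle brackets, to the JSON key."""
    key = source_key.strip().strip("<>")
    # Assumption: unmapped keys yield None instead of raising an error.
    return SV_TYPE_MAP.get(key)


print(map_sv_type("<INS:ME:ALU>"))
print(map_sv_type("CPX"))
```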
gnomAD Copy Number Variation

The gnomAD 4.0 structural variant VCF file contains CNV data. Since this data is not shown in the gnomAD browser, we do not include CNVs in our output. We will evaluate in the future whether to include copy number variation from the structural variant data together with the new rare CNV data that is available in gnomAD 4.0.

gnomAD duplication variant type

In the gnomAD 4.0 structural variant VCF, DUP is the only symbolic allele used for the duplication variant type. Based on the information in the gnomAD browser, a duplication that is supported by split-read or paired-end read evidence can be inferred to be a tandem duplication. We therefore check the evidence data in each DUP entry to decide whether to assign tandem duplication as the variant type or to report it as a plain duplication.
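
A minimal sketch of this check is shown below. The evidence tokens "SR" (split read) and "PE" (paired end), the output key tandem_duplication, and the function name are all assumptions made for illustration; the source does not specify how the evidence data is encoded or how the result is named in the JSON output.

```python
from typing import Iterable

# Assumed evidence tokens: SR = split reads, PE = paired-end reads.
TANDEM_EVIDENCE = {"SR", "PE"}


def duplication_type(evidence: Iterable[str]) -> str:
    """Classify a DUP entry from its evidence classes."""
    if TANDEM_EVIDENCE & set(evidence):
        # Split-read or paired-end support: infer a tandem duplication.
        return "tandem_duplication"  # hypothetical JSON key
    return "duplication"


print(duplication_type(["RD", "SR"]))
print(duplication_type(["RD"]))
```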

JSON output

"gnomad": [
{
"chromosome": "1",
"begin": 1769047,
"end": 78686496,
"variantId": "gnomAD-SV_v3_CPX_chr1_4787cfba",
"variantType": "complex_structural_alteration",
"failedFilter": true,
"allAf": 0.51192,
"afrAf": 0.491986,
"amiAf": 0.559382,
"amrAf": 0.499444,
"asjAf": 0.505975,
"easAf": 0.51924,
"midAf": 0.53125,
"finAf": 0.542619,
"nfeAf": 0.521916,
"othAf": 0.492366,
"sasAf": 0.516568,
"femaleAf": 0.509225,
"maleAf": 0.514861,
"allAc": 64549,
"afrAc": 16637,
"amiAc": 471,
"amrAc": 6290,
"asjAc": 1609,
"easAc": 2105,
"midAc": 34,
"finAc": 3514,
"nfeAc": 30839,
"othAc": 774,
"sasAc": 2276,
"femaleAc": 33507,
"maleAc": 31042,
"allAn": 126092,
"afrAn": 33816,
"amiAn": 842,
"amrAn": 12594,
"asjAn": 3180,
"easAn": 4054,
"midAn": 64,
"finAn": 6476,
"nfeAn": 59088,
"othAn": 1572,
"sasAn": 4406,
"femaleAn": 65800,
"maleAn": 60292,
"allHc": 3167,
"afrHc": 413,
"amiHc": 54,
"amrHc": 238,
"asjHc": 49,
"easHc": 97,
"midHc": 2,
"finHc": 368,
"nfeHc": 1807,
"othHc": 23,
"sasHc": 116,
"femaleHc": 1407,
"maleHc": 1760
}
]

gnomAD v2.1 (GRCh37)

Small Variants

VCF extraction

We currently extract the following info fields from gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following extra fields from gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number, and allele frequencies for:
      • Global population
      • African/African Americans
      • Admixed Americans
      • Ashkenazi Jews
      • East Asians
      • Finnish
      • Non-Finnish Europeans
      • South Asian
      • Others (population not assigned)
      • Male
      • Female
      • Controls
Note
  • Coverage = DP / AN. Frequencies are computed as AC / AN for each population.
  • gnomAD currently has no genome sequencing data for the South Asian (SAS) population.
  • Allele count, homozygous count, allele number, and allele frequencies for the controls subset are also provided for the global population.

Merging genomes and exomes

When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets.

info
  • For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.

Filters

The following strategy will be used when there's a conflict in filter status:

                  Genomes PASS           Genomes Filtered
Exomes PASS       PASS                   Only use exome data
Exomes Filtered   Only use genome data   Filtered
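
The merging and filter rules can be sketched together as follows. This is an illustration under stated assumptions, not the actual implementation: the AlleleData class and function name are hypothetical, and summing the counts when both data sets are filtered is an assumption (the table only states that the result is marked as filtered).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleData:
    ac: int        # alternate allele count (AC)
    an: int        # total allele number (AN)
    passed: bool   # True when the data set's filter status is PASS


def merge_genome_exome(genome: Optional[AlleleData],
                       exome: Optional[AlleleData]) -> Optional[dict]:
    """Merge genome and exome counts using the filter table above."""
    sources = [s for s in (genome, exome) if s is not None]
    if not sources:
        return None
    passing = [s for s in sources if s.passed]
    # If any data set passed, use only the passing ones.
    # Assumption: if all are filtered, sum them and flag the result.
    used = passing if passing else sources
    ac = sum(s.ac for s in used)
    an = sum(s.an for s in used)
    result = {"allAc": ac, "allAn": an}
    if an > 0:
        result["allAf"] = round(ac / an, 6)
    if not passing:
        result["failedFilter"] = True
    return result


both_pass = merge_genome_exome(AlleleData(10, 100, True), AlleleData(30, 300, True))
exome_filtered = merge_genome_exome(AlleleData(10, 100, True), AlleleData(30, 300, False))
print(both_pass)
print(exome_filtered)
```

When both data sets pass, the counts are summed before the frequency is computed; when only one passes, the filtered data set is ignored entirely.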

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field                 Type   Notes
coverage              int    average coverage (non-negative integer values)
allAf                 float  allele frequency for all populations. Range: 0 - 1.0
maleAf                float  allele frequency for male population. Range: 0 - 1.0
femaleAf              float  allele frequency for female population. Range: 0 - 1.0
controlsAllAf         float  allele frequency for the controls subset. Range: 0 - 1.0
allAc                 int    allele count for all populations. Integer.
maleAc                int    allele count for male population. Integer.
femaleAc              int    allele count for female population. Integer.
controlsAllAc         int    allele count for the controls subset. Integer.
allAn                 int    allele number for all populations. Non-zero integer.
maleAn                int    allele number for male population. Non-zero integer.
femaleAn              int    allele number for female population. Non-zero integer.
controlsAllAn         int    allele number for the controls subset. Non-zero integer.
allHc                 int    count of homozygous individuals for all populations. Non-negative integer.
maleHc                int    count of homozygous individuals for male population. Non-negative integer.
femaleHc              int    count of homozygous individuals for female population. Non-negative integer.
afrAf                 float  allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                 int    allele count for the African / African American population. Integer.
afrAn                 int    allele number for the African / African American population. Non-zero integer.
afrHc                 int    count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf                 float  allele frequency for the Latino population. Range: 0 - 1.0
amrAc                 int    allele count for the Latino population. Integer.
amrAn                 int    allele number for the Latino population. Non-zero integer.
amrHc                 int    count of homozygous individuals for the Latino population. Non-negative integer.
easAf                 float  allele frequency for the East Asian population. Range: 0 - 1.0
easAc                 int    allele count for the East Asian population. Integer.
easAn                 int    allele number for the East Asian population. Non-zero integer.
easHc                 int    count of homozygous individuals for the East Asian population. Non-negative integer.
finAf                 float  allele frequency for the Finnish population. Range: 0 - 1.0
finAc                 int    allele count for the Finnish population. Integer.
finAn                 int    allele number for the Finnish population. Non-zero integer.
finHc                 int    count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf                 float  allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                 int    allele count for the Non-Finnish European population. Integer.
nfeAn                 int    allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                 int    count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf                 float  allele frequency for the Other population. Range: 0 - 1.0
othAc                 int    allele count for the Other population. Integer.
othAn                 int    allele number for the Other population. Non-zero integer.
othHc                 int    count of homozygous individuals for the Other population. Non-negative integer.
asjAf                 float  allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                 int    allele count for the Ashkenazi Jewish population. Integer.
asjAn                 int    allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                 int    count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                 float  allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                 int    allele count for the South Asian population. Integer.
sasAn                 int    allele number for the South Asian population. Non-zero integer.
sasHc                 int    count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter          bool   True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion   bool   True if this variant is located in a low complexity region.

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Input VCF files (one per chromosome) and a .version file are required in a folder to build the .nsa file. For example, my directory contains:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz chr2.vcf.bgz
chr12.vcf.bgz chr3.vcf.bgz
chr13.vcf.bgz chr4.vcf.bgz
chr14.vcf.bgz chr5.vcf.bgz
chr15.vcf.bgz chr6.vcf.bgz
chr16.vcf.bgz chr7.vcf.bgz
chr17.vcf.bgz chr8.vcf.bgz
chr18.vcf.bgz chr9.vcf.bgz
chr19.vcf.bgz chrM.vcf.bgz
chr1.vcf.bgz chrX.vcf.bgz
chr20.vcf.bgz chrY.vcf.bgz
chr21.vcf.bgz gnomad.r3.1.version

The version file is a text file with the following content.

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
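
Before running the build, it can be useful to sanity-check the version file. The sketch below parses the KEY=VALUE lines shown above. The function name is hypothetical, and treating all four keys as required is an assumption based only on the example; SAUtils itself may enforce different rules.

```python
# Assumption: these four keys, taken from the example, are all required.
REQUIRED_KEYS = {"NAME", "VERSION", "DATE", "DESCRIPTION"}


def parse_version_file(text: str) -> dict:
    """Parse KEY=VALUE lines and check that the expected keys are present."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - entries.keys()
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))
    return entries


version_text = """NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
"""
info = parse_version_file(version_text)
print(info["NAME"], info["VERSION"], info["DATE"])
```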

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--genome, -g <VALUE> input directory containing VCF (and .version)
files with genomic frequencies
--exome, -e <VALUE> input directory containing VCF (and .version)
files with exomic frequencies
--temp, -t <VALUE> output temp directory for intermediate (per chrom)
NSA files
--out, -o <VALUE> output directory for NSA file
--help, -h displays the help menu
--version, -v displays the version

Here is a sample execution:

dotnet SAUtils.dll Gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key   TSV column     Description
pLi        pLI            probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull      pNull          probability of being completely tolerant of loss of function variation (observed = expected)
pRec       pRec           probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ       syn_z          corrected synonymous Z score
misZ       mis_z          corrected missense Z score
loeuf      oe_lof_upper   loss of function observed/expected upper bound fraction (LOEUF)

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to look up the gene symbols in the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.
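
The lookup described above can be sketched as a dictionary access keyed by Ensembl gene ID. The function name and the cache dictionary are hypothetical, and keeping the source symbol when the gene ID is absent from the cache is an assumption; the text does not say what happens in that case.

```python
from typing import Dict


def resolve_gene_symbol(gene_id: str, source_symbol: str,
                        cache_symbols: Dict[str, str]) -> str:
    """Prefer the symbol the transcript cache assigns to this Ensembl gene ID."""
    # Assumption: keep the source symbol when the cache has no entry.
    return cache_symbols.get(gene_id, source_symbol)


# Hypothetical cache content mirroring the example in the text.
cache = {"ENSG0001": "GENE2"}
print(resolve_gene_symbol("ENSG0001", "GENE1", cache))
```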

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in a conflict.

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

In such cases, Illumina Connected Annotations chooses the entry with the smallest "LOEUF" value. The reason for choosing this value can be highlighted by the following table:

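| LOEUF decile | Haplo-insufficient | Autosomal Dominant | Autosomal Recessive | Olfactory Genes |
|---|---|---|---|---|
| 0-10% | 104 | 140 | 36 | 0 |
| 10-20% | 47 | 128 | 72 | 1 |
| 20-30% | 17 | 86 | 112 | 0 |
| 30-40% | 8 | 80 | 173 | 4 |
| 40-50% | 7 | 65 | 206 | 8 |
| 50-60% | 4 | 54 | 207 | 6 |
| 60-70% | 0 | 46 | 154 | 18 |
| 70-80% | 2 | 49 | 120 | 49 |
| 80-90% | 0 | 34 | 58 | 96 |
| 90-100% | 0 | 26 | 40 | 174 |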
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score
  • If the same, pick the lowest pLI
  • Otherwise pick the entry with the max absolute value of synZ + misZ
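As a rough sketch, the three rules can be applied to a pair of conflicting entries as shown below. Several details are assumptions rather than documented behavior: the function name `pick_entry` is hypothetical, "max absolute value of synZ + misZ" is read as |synZ + misZ|, and an entry that has a LOEUF value is preferred over one that lacks it.

```python
def pick_entry(a, b):
    """Choose between two conflicting gene entries (dicts keyed like the list above)."""
    loeuf_a, loeuf_b = a.get("loeuf"), b.get("loeuf")
    if loeuf_a is not None and loeuf_b is not None:
        # Rule 1: lowest LOEUF wins.
        if loeuf_a != loeuf_b:
            return a if loeuf_a < loeuf_b else b
        # Rule 2: on a LOEUF tie, lowest pLI wins.
        pli_a, pli_b = a.get("pLI"), b.get("pLI")
        if pli_a is not None and pli_b is not None and pli_a != pli_b:
            return a if pli_a < pli_b else b
    elif loeuf_a is not None or loeuf_b is not None:
        # Assumption: an entry with LOEUF beats an entry without one.
        return a if loeuf_a is not None else b

    # Rule 3: largest |synZ + misZ| (one reading of the rule; an assumption).
    def z_magnitude(entry):
        return abs(entry.get("synZ", 0.0) + entry.get("misZ", 0.0))

    return a if z_magnitude(a) >= z_magnitude(b) else b


mdga2 = [
    {"pLI": 9.99e-1, "pRec": 7.81e-4, "pNull": 8.65e-12, "synZ": 8.30e-1, "misZ": 1.68e0, "loeuf": 2.39e-1},
    {"pLI": 6.65e-1, "pRec": 3.35e-1, "pNull": 5.58e-10, "synZ": 8.39e-1, "misZ": 1.74e0, "loeuf": 3.51e-1},
]
print(pick_entry(*mdga2)["loeuf"])
```

For more than two conflicting entries, the same function could be folded over the list with `functools.reduce`.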

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
| Field | Type | Notes |
|---|---|---|
| pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | float | probability of being completely tolerant of loss of function variation (observed = expected) |
| pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | float | corrected synonymous Z score |
| misZ | float | corrected missense Z score |
| loeuf | float | loss of function observed/expected upper bound fraction (LOEUF) |

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note: The gnomAD structural variant annotations are currently in a preview stage. The annotations do not yet include translocation breakends. Future updates will include a better way of annotating the structural variants.

Source Files

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    ENDEVIDENCE HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

Structural Variant Type Mapping

The source files represent structural variant types with keys that follow various naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows.

| Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key |
|---|---|
| copy_number_variation | |
| deletion | DEL, CN=0 |
| duplication | DUP |
| insertion | INS |
| inversion | INV |
| mobile_element_insertion | INS:ME |
| mobile_element_insertion | INS:ME:ALU |
| mobile_element_insertion | INS:ME:LINE1 |
| mobile_element_insertion | INS:ME:SVA |
| structural alteration | |
| complex_structural_alteration | CPX |
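A lookup table is one simple way to express this mapping. The sketch below only encodes the rows that list a source key; `SV_TYPE_MAP` and `map_sv_type` are hypothetical names, and treating "DEL, CN=0" as two separate source keys is an assumption.

```python
# Source SV type key -> JSON SV type key, taken from the mapping above.
SV_TYPE_MAP = {
    "DEL": "deletion",
    "CN=0": "deletion",  # assumption: "DEL, CN=0" lists two separate keys
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}


def map_sv_type(source_key):
    """Return the JSON SV type for a source key, or None if it is not mapped."""
    return SV_TYPE_MAP.get(source_key)


print(map_sv_type("INS:ME:ALU"))
print(map_sv_type("BND"))  # breakends are not in the mapping, so this is None
```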

Download URLs

GRCh37

The GRCh37 file was downloaded from the original source. The following table gives some essential data metrics:

https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz

JSON output

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

| Field | Type | Notes |
|---|---|---|
| chromosome | string | chromosome number |
| begin | integer | position interval start |
| end | integer | position interval end |
| variantType | string | structural variant type |
| variantId | string | gnomAD ID |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
| femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
| maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
| allAc | integer | allele count for all populations. |
| afrAc | integer | allele count for the African super population. |
| amrAc | integer | allele count for the Admixed American super population. |
| easAc | integer | allele count for the East Asian super population. |
| eurAc | integer | allele count for the European super population. |
| othAc | integer | allele count for all other populations. |
| maleAc | integer | allele count for male population. |
| femaleAc | integer | allele count for female population. |
| allAn | integer | allele number for all populations. |
| afrAn | integer | allele number for the African super population. |
| amrAn | integer | allele number for the Admixed American super population. |
| easAn | integer | allele number for the East Asian super population. |
| eurAn | integer | allele number for the European super population. |
| othAn | integer | allele number for all other populations. |
| femaleAn | integer | allele number for female population. |
| maleAn | integer | allele number for male population. |
| allHc | integer | count of homozygous individuals for all populations. |
| afrHc | integer | count of homozygous individuals for the African / African American population. |
| amrHc | integer | count of homozygous individuals for the Latino population. |
| easHc | integer | count of homozygous individuals for the East Asian population. |
| eurHc | integer | count of homozygous individuals for the European super population. |
| othHc | integer | count of homozygous individuals for all other populations. |
| maleHc | integer | count of homozygous individuals for male population. |
| femaleHc | integer | count of homozygous individuals for female population. |
| failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
| reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
| annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0 |

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad4.0-lof-json/index.html b/3.24/data-sources/gnomad4.0-lof-json/index.html new file mode 100644 index 00000000..73f5b8cc --- /dev/null +++ b/3.24/data-sources/gnomad4.0-lof-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gnomad4.0-lof-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad4.0-lof-json

"gnomAD": {
"pLi": 0.00000122,
"pRec": 0.32,
"pNull": 0.68,
"synZ": 0.0117,
"misZ": 0.162,
"loeuf": 1.94,
"transcriptId": "ENST00000360525"
}
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad4.0-small-variants-json/index.html b/3.24/data-sources/gnomad4.0-small-variants-json/index.html new file mode 100644 index 00000000..5cb2482e --- /dev/null +++ b/3.24/data-sources/gnomad4.0-small-variants-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gnomad4.0-small-variants-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad4.0-small-variants-json

"gnomad": {
"coverage": 154,
"failedFilter": true,
"allAf": 0.5,
"allAn": 152428,
"allAc": 76214,
"allHc": 0,
"afrAf": 0.5,
"afrAn": 41608,
"afrAc": 20804,
"afrHc": 0,
"amiAf": 0.5,
"amiAn": 912,
"amiAc": 456,
"amiHc": 0,
"amrAf": 0.5,
"amrAn": 15314,
"amrAc": 7657,
"amrHc": 0,
"easAf": 0.5,
"easAn": 5196,
"easAc": 2598,
"easHc": 0,
"finAf": 0.5,
"finAn": 10632,
"finAc": 5316,
"finHc": 0,
"nfeAf": 0.5,
"nfeAn": 68050,
"nfeAc": 34025,
"nfeHc": 0,
"asjAf": 0.5,
"asjAn": 3472,
"asjAc": 1736,
"asjHc": 0,
"sasAf": 0.5,
"sasAn": 4834,
"sasAc": 2417,
"sasHc": 0,
"midAf": 0.5,
"midAn": 294,
"midAc": 147,
"midHc": 0,
"remainingAf": 0.5,
"remainingAn": 2116,
"remainingAc": 1058,
"remainingHc": 0,
"maleAf": 0.5,
"maleAn": 74544,
"maleAc": 37272,
"maleHc": 0,
"femaleAf": 0.5,
"femaleAn": 77884,
"femaleAc": 38942,
"femaleHc": 0
},
"gnomad-exome": {
"coverage": 53,
"allAf": 0.495074,
"allAn": 4060,
"allAc": 2010,
"allHc": 11,
"afrAf": 0.5,
"afrAn": 86,
"afrAc": 43,
"afrHc": 0,
"amrAf": 0.5,
"amrAn": 46,
"amrAc": 23,
"amrHc": 0,
"easAf": 0.491071,
"easAn": 112,
"easAc": 55,
"easHc": 0,
"finAf": 0.5,
"finAn": 306,
"finAc": 153,
"finHc": 0,
"nfeAf": 0.49503,
"nfeAn": 3018,
"nfeAc": 1494,
"nfeHc": 11,
"asjAf": 0.461538,
"asjAn": 26,
"asjAc": 12,
"asjHc": 0,
"sasAf": 0.486111,
"sasAn": 72,
"sasAc": 35,
"sasHc": 0,
"midAf": 0.5,
"midAn": 68,
"midAc": 34,
"midHc": 0,
"remainingAf": 0.493865,
"remainingAn": 326,
"remainingAc": 161,
"remainingHc": 0,
"maleAf": 0.495212,
"maleAn": 2924,
"maleAc": 1448,
"maleHc": 9,
"femaleAf": 0.494718,
"femaleAn": 1136,
"femaleAc": 562,
"femaleHc": 2
}
| Field | Type | Notes |
|---|---|---|
| coverage | int | average coverage (non-negative integer values) |
| maleAf | float | allele frequency for male population. Range: 0 - 1.0 |
| maleAn | int | allele number for male population. Non-zero integer. |
| maleAc | int | allele count for male population. Integer. |
| maleHc | int | count of homozygous individuals for male population. Non-negative integer. |
| femaleAf | float | allele frequency for female population. Range: 0 - 1.0 |
| femaleAn | int | allele number for female population. Non-zero integer. |
| femaleAc | int | allele count for female population. Integer. |
| femaleHc | int | count of homozygous individuals for female population. Non-negative integer. |
| remainingAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
| remainingAc | int | allele count for the Other population. Integer. |
| remainingAn | int | allele number for the Other population. Non-zero integer. |
| remainingHc | int | count of homozygous individuals for Other population. Non-negative integer. |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAn | int | allele number for all populations. Non-zero integer. |
| allAc | int | allele count for all populations. Integer. |
| allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
| afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African / African American population. Integer. |
| afrAn | int | allele number for the African / African American population. Non-zero integer. |
| afrHc | int | count of homozygous individuals for African / African American population. Non-negative integer. |
| amiAf | float | allele frequency for Amish populations. Range: 0 - 1.0 |
| amiAn | int | allele number for Amish populations. Non-zero integer. |
| amiAc | int | allele count for Amish populations. Integer. |
| amiHc | int | count of homozygous individuals for Amish populations. Non-negative integer. |
| amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Latino population. Integer. |
| amrAn | int | allele number for the Latino population. Non-zero integer. |
| amrHc | int | count of homozygous individuals for Latino population. Non-negative integer. |
| easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian population. Integer. |
| easAn | int | allele number for the East Asian population. Non-zero integer. |
| easHc | int | count of homozygous individuals for East Asian population. Non-negative integer. |
| finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
| finAc | int | allele count for the Finnish population. Integer. |
| finAn | int | allele number for the Finnish population. Non-zero integer. |
| finHc | int | count of homozygous individuals for Finnish population. Non-negative integer. |
| nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
| nfeAc | int | allele count for the Non-Finnish European population. Integer. |
| nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
| nfeHc | int | count of homozygous individuals for Non-Finnish European population. Non-negative integer. |
| asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
| asjAc | int | allele count for the Ashkenazi Jewish population. Integer. |
| asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
| asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
| sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian population. Integer. |
| sasAn | int | allele number for the South Asian population. Non-zero integer. |
| sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
| midAf | float | allele frequency for the Middle Eastern population. Range: 0 - 1.0 |
| midAc | int | allele count for the Middle Eastern population. Integer. |
| midAn | int | allele number for the Middle Eastern population. Non-zero integer. |
| midHc | int | count of homozygous individuals for the Middle Eastern population. Non-negative integer. |
| failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters) |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/gnomad40-structural-variants-json/index.html b/3.24/data-sources/gnomad40-structural-variants-json/index.html new file mode 100644 index 00000000..454b6eb7 --- /dev/null +++ b/3.24/data-sources/gnomad40-structural-variants-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +gnomad40-structural-variants-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

gnomad40-structural-variants-json

"gnomad": [
{
"chromosome": "1",
"begin": 1769047,
"end": 78686496,
"variantId": "gnomAD-SV_v3_CPX_chr1_4787cfba",
"variantType": "complex_structural_alteration",
"failedFilter": true,
"allAf": 0.51192,
"afrAf": 0.491986,
"amiAf": 0.559382,
"amrAf": 0.499444,
"asjAf": 0.505975,
"easAf": 0.51924,
"midAf": 0.53125,
"finAf": 0.542619,
"nfeAf": 0.521916,
"othAf": 0.492366,
"sasAf": 0.516568,
"femaleAf": 0.509225,
"maleAf": 0.514861,
"allAc": 64549,
"afrAc": 16637,
"amiAc": 471,
"amrAc": 6290,
"asjAc": 1609,
"easAc": 2105,
"midAc": 34,
"finAc": 3514,
"nfeAc": 30839,
"othAc": 774,
"sasAc": 2276,
"femaleAc": 33507,
"maleAc": 31042,
"allAn": 126092,
"afrAn": 33816,
"amiAn": 842,
"amrAn": 12594,
"asjAn": 3180,
"easAn": 4054,
"midAn": 64,
"finAn": 6476,
"nfeAn": 59088,
"othAn": 1572,
"sasAn": 4406,
"femaleAn": 65800,
"maleAn": 60292,
"allHc": 3167,
"afrHc": 413,
"amiHc": 54,
"amrHc": 238,
"asjHc": 49,
"easHc": 97,
"midHc": 2,
"finHc": 368,
"nfeHc": 1807,
"othHc": 23,
"sasHc": 116,
"femaleHc": 1407,
"maleHc": 1760
}
]
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/mito-heteroplasmy/index.html b/3.24/data-sources/mito-heteroplasmy/index.html new file mode 100644 index 00000000..d79e761a --- /dev/null +++ b/3.24/data-sources/mito-heteroplasmy/index.html @@ -0,0 +1,18 @@ + + + + + + + +Mitochondrial Heteroplasmy | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (i.e. T:C)
  • ad
  • vrf
  • nobs (number of observations)
Adjusting for null observations

The nobs value indicates how many observations were made. The ad and vrf arrays do not list all of them (only 6 of the 246 observations appear in the example above), so the null observations have to be accounted for when the distribution is built.
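One way to make that adjustment is to pad the vrf array up to nobs entries. The sketch below assumes that every observation missing from the array has a VRF of 0; that assumption is not stated in the source, although it does reproduce the mean reported in vrf_stats for the example above. The function name is hypothetical.

```python
from statistics import fmean


def pad_with_null_observations(vrf, nobs):
    """Extend vrf to nobs entries, treating unlisted observations as VRF 0 (assumption)."""
    missing = nobs - len(vrf)
    if missing < 0:
        raise ValueError("nobs is smaller than the number of listed observations")
    return list(vrf) + [0.0] * missing


# vrf array and nobs taken from the T:C example above.
vrf = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
padded = pad_with_null_observations(vrf, 246)
print(len(padded), fmean(padded))
```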

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object carry a very large number of significant digits, so almost every observation is distinct and the array will grow as the number of samples increases. To make this more future-proof, Illumina Connected Annotations bins all VRF values in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.
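The binning step can be sketched as follows. This is only an illustration: rounding to three decimal places is one plausible way to obtain 0.1% bins (the actual rule, for example rounding versus truncation, is not stated here), and `bin_vrfs` is a hypothetical name.

```python
from collections import Counter


def bin_vrfs(vrfs, decimals=3):
    """Group VRF values into 0.1% bins and count the observations per bin.

    Rounding to `decimals` places is an assumption about how bins are formed.
    """
    counts = Counter(round(v, decimals) for v in vrfs)
    bins = sorted(counts)
    return bins, [counts[b] for b in bins]


# The six VRF values from the T:C example collapse into two bins.
example_vrfs = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
bins, counts = bin_vrfs(example_vrfs)
print(bins, counts)
```

The two returned lists correspond to the VRF_BINS and VRF_COUNTS columns of the TSV file described in the next section.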

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).
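Given the binned distribution, the percentile lookup can be sketched as the share of observations that fall strictly below the sample's VRF. The function name is hypothetical, and reporting the result on a 0 - 100 scale rounded to two decimals is an assumption made to resemble the JSON output example rather than a documented rule.

```python
def heteroplasmy_percentile(sample_vrf, vrf_bins, vrf_counts):
    """Percentage of observations in the distribution that lie below sample_vrf."""
    total = sum(vrf_counts)
    if total == 0:
        return None  # no distribution available for this variant
    below = sum(c for b, c in zip(vrf_bins, vrf_counts) if b < sample_vrf)
    # Assumption: scale to 0-100 and round to two decimals.
    return round(100.0 * below / total, 2)


# Toy distribution: four observations spread over three bins.
toy_bins = [0.001, 0.002, 0.003]
toy_counts = [1, 2, 1]
print(heteroplasmy_percentile(0.002, toy_bins, toy_counts))
```

Because the comparison is strict, the result only reaches the top of the range when the sample's VRF exceeds every bin in the distribution, which matches the behavior described above.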

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
| Field | Type | Notes |
|---|---|---|
| heteroplasmyPercentile | float array | one percentile for each variant frequency (each alternate allele) |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/mitomap-small-variants-json/index.html b/3.24/data-sources/mitomap-small-variants-json/index.html new file mode 100644 index 00000000..8a4916dd --- /dev/null +++ b/3.24/data-sources/mitomap-small-variants-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +mitomap-small-variants-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
| Field | Type | Notes |
|---|---|---|
| refAllele | string | |
| altAllele | string | |
| diseases | string array | associated diseases |
| hasHomoplasmy | boolean | |
| hasHeteroplasmy | boolean | |
| status | string | record status |
| clinicalSignificance | string | predicted pathogenicity |
| scorePercentile | float | MitoTIP score |
| numGenBankFullLengthSeqs | integer | number of GenBank full-length sequences |
| pubMedIds | string array | |
| isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/mitomap-structural-variants-json/index.html b/3.24/data-sources/mitomap-structural-variants-json/index.html new file mode 100644 index 00000000..ccd6261b --- /dev/null +++ b/3.24/data-sources/mitomap-structural-variants-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +mitomap-structural-variants-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
| Field | Type | Notes |
|---|---|---|
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string array | |
| reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
| annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/mitomap/index.html b/3.24/data-sources/mitomap/index.html new file mode 100644 index 00000000..2a2fb10a --- /dev/null +++ b/3.24/data-sources/mitomap/index.html @@ -0,0 +1,18 @@ + + + + + + + +MITOMAP | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],
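Each row above happens to be a valid JSON array of strings, so one way to pull values out of it is to decode the row and strip the embedded HTML tags. The sketch below is an assumption-laden illustration, not the actual scraper: the function names are hypothetical, and the column positions (0 = position, 2 = disease, 3 = allele, 7 = status, 8 = MitoTIP) are inferred from the example row only.

```python
import json
import re

# First example row from above, copied verbatim (including the trailing comma).
ROW = r"""["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],"""


def strip_tags(cell):
    """Remove HTML tags from a cell and trim surrounding whitespace."""
    return re.sub(r"<[^>]+>", "", cell).strip()


def parse_row(row):
    """Decode one row and extract a few fields (column indices are assumptions)."""
    cells = [strip_tags(c) for c in json.loads(row.strip().rstrip(","))]
    score = re.search(r"([\d.]+)%", cells[8])
    return {
        "position": int(cells[0]),
        "disease": cells[2],
        "allele": cells[3],
        "status": cells[7],
        "mitotip_percentile": float(score.group(1)) if score else None,
    }


print(parse_row(ROW))
```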

We're mainly interested in the following columns (numbers indicate the HTML page above):

  • Position1,2,3,4
  • Disease3,4
  • Nucleotide Change1,2
  • Allele3,4
  • Homoplasmy3,4
  • Heteroplasmy3,4
  • Status3,4
  • MitoTIP3,4
  • GB Seqs FL(CR)1,2,3,4
  • Deletion Junction5
  • Insert (nt)6
  • Insert Point (nt)6
  • References/Curated References1,2,3,4
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.
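Left alignment of a pure insertion or deletion can be sketched as repeatedly shifting the variant one base to the left while the shifted representation still describes the same change. This is a generic illustration of the technique under simplifying assumptions (0-based positions, alleles already trimmed so that one of them is empty, a linear rather than circular reference); it is not the actual import code, and `left_align` is a hypothetical name.

```python
def left_align(reference, pos, ref, alt):
    """Left-align a pure insertion (ref == "") or deletion (alt == "").

    reference: reference sequence as a string
    pos: 0-based position where the inserted/deleted bases start
    Returns (pos, ref, alt) after shifting as far left as possible.
    """
    if bool(ref) == bool(alt):
        return pos, ref, alt  # not a pure indel; leave it untouched
    seq = ref or alt
    # Shifting left by one base is valid while the preceding reference base
    # equals the last base of the indel; the indel sequence rotates by one.
    while pos > 0 and reference[pos - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        pos -= 1
    return (pos, seq, "") if ref else (pos, "", seq)


# Deleting the last "CA" of the repeat in GCACACAT is moved to the first "CA".
print(left_align("GCACACAT", 5, "CA", ""))
```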

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.
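The enumeration can be sketched as below. The reading of the two notations is an assumption: "C-C(2-8)" is taken to mean the alternate allele is C repeated 2 to 8 times, and "A-AC or ACC" to mean two alternate alleles, AC and ACC. The IUPAC table itself is the standard nucleotide ambiguity code set; the function names are hypothetical.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(allele):
    """Return every concrete sequence an allele with ambiguity codes can represent."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in allele))]


def enumerate_alleles(spec):
    """Split a 'REF-ALT...' description into (ref, list of concrete alt alleles)."""
    ref, _, alt_spec = spec.partition("-")
    repeat = re.fullmatch(r"([A-Z]+)\((\d+)-(\d+)\)", alt_spec)
    if repeat:
        # Assumed meaning of C(2-8): the unit repeated 2, 3, ..., 8 times.
        unit, low, high = repeat.group(1), int(repeat.group(2)), int(repeat.group(3))
        alts = [unit * n for n in range(low, high + 1)]
    else:
        alts = [a.strip() for a in alt_spec.split(" or ")]
    expanded = []
    for alt in alts:
        expanded.extend(expand_iupac(alt))
    return ref, expanded


print(enumerate_alleles("A-AC or ACC"))
print(enumerate_alleles("C-C(2-8)"))
```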

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT
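
The conventions above can be recognized with a handful of regular expressions. The sketch below is only an illustration: the category names and the `classify_allele` function are hypothetical, and it returns the raw captured pieces without interpreting coordinates.

```python
import re

# One pattern per supported notation; order matters for the two "del" forms.
PATTERNS = [
    ("snv", re.compile(r"([ACGT])(\d+)([ACGT])")),              # C123T
    ("deletion_range", re.compile(r"(\d+)_(\d+)del")),          # 16021_16022del
    ("deletion_length", re.compile(r"(\d+)del(\d+)")),          # 8042del2
    ("deletion_bases", re.compile(r"(\d+)del([ACGT]+)")),       # 8042delAT
    ("insertion", re.compile(r"([ACGT])(\d+)ins([ACGT]+)")),    # C9537insC
    ("inversion", re.compile(r"(\d+)_(\d+)inv([ACGT]+)")),      # 3902_3908invACCTTGC
    ("ref_alt_list", re.compile(r"([ACGT]+)-(.+)")),            # A-AC or ACC, C-C(2-8)
]

def classify_allele(allele):
    """Return (category, captured groups) for a MITOMAP allele string."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(allele)
        if match:
            return kind, match.groups()
    raise ValueError(f"unsupported allele notation: {allele!r}")
```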

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
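
A minimal Python sketch of extracting that mapping is shown below. It assumes the standard pg_dump COPY text layout (tab-separated fields, \N for NULL, a terminating \. line); the function name `parse_reference_ids` is hypothetical.

```python
def parse_reference_ids(lines):
    """Map mitomap.reference id -> nlmid from the lines of a pg_dump file."""
    mapping = {}
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if columns is None:
            # Look for the COPY header and read the column names from it.
            if line.startswith("COPY mitomap.reference "):
                inside = line[line.index("(") + 1 : line.index(")")]
                columns = [name.strip() for name in inside.split(",")]
            continue
        if line == "\\.":  # end-of-data marker of the COPY block
            break
        record = dict(zip(columns, line.split("\t")))
        nlmid = record.get("nlmid")
        if nlmid not in (None, "", "\\N"):  # skip missing or NULL values
            mapping[record["id"]] = nlmid
    return mapping
```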
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full length GenBank sequences, we take the largest number from each of the duplicated records since it provides the strongest evidence for this variant.
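
The merge rules can be sketched in Python as follows. This is an illustration only: it works on dictionaries that use the JSON output keys, checks just a subset of the conflict fields listed above, and sorts the merged lists, none of which is confirmed to match the real implementation.

```python
# Fields that must agree between duplicated records (subset, for illustration).
CONFLICT_FIELDS = (
    "hasHomoplasmy", "hasHeteroplasmy", "status",
    "clinicalSignificance", "scorePercentile",
)

def merge_records(first, second):
    """Merge two records that describe the same nucleotide change."""
    for field in CONFLICT_FIELDS:
        if first.get(field) != second.get(field):
            raise ValueError(
                f"conflicting {field}: {first.get(field)!r} vs {second.get(field)!r}"
            )
    merged = dict(first)
    # Diseases and PubMed IDs: union of both records.
    for field in ("diseases", "pubMedIds"):
        merged[field] = sorted(set(first.get(field, [])) | set(second.get(field, [])))
    # Full-length GenBank sequences: keep the largest count.
    merged["numGenBankFullLengthSeqs"] = max(
        first.get("numGenBankFullLengthSeqs", 0),
        second.get("numGenBankFullLengthSeqs", 0),
    )
    return merged
```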
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (e.g. T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/omim-json/index.html b/3.24/data-sources/omim-json/index.html new file mode 100644 index 00000000..6b3cc722 --- /dev/null +++ b/3.24/data-sources/omim-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +omim-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

omim-json

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/omim/index.html b/3.24/data-sources/omim/index.html new file mode 100644 index 00000000..097df17d --- /dev/null +++ b/3.24/data-sources/omim/index.html @@ -0,0 +1,23 @@ + + + + + + + +OMIM | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

OMIM

Overview

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.

Publications

Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.

Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.

Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

Parse OMIM data

Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.

mim2gene.txt

The mim2gene.txt file (http://omim.org/static/omim/data/mim2gene.txt) provides the mapping between MIM numbers and gene symbols. An example of this file is given below:

# MIM Number    MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq)   Entrez Gene ID (NCBI)   Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437

The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.
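
Reading this file could look like the Python sketch below. It assumes the file is tab-delimited, and keeping only "gene" and "gene/phenotype" entries is an assumption made for the example; the function name and the dictionary keys are hypothetical.

```python
def parse_mim2gene(lines):
    """Return {MIM number: identifiers} for gene entries in mim2gene.txt."""
    genes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        fields += [""] * (5 - len(fields))  # pad rows with missing columns
        mim, entry_type, entrez, symbol, ensembl = fields[:5]
        # Keep 'gene' and 'gene/phenotype' rows only.
        if "gene" not in entry_type.split("/"):
            continue
        genes[int(mim)] = {
            "entrezGeneId": entrez,
            "hgncSymbol": symbol,
            "ensemblGeneId": ensembl,
        }
    return genes
```

The three identifiers are kept together because, as described above, all of them feed into resolving the gene symbol.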

OMIM API

Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below.

{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false
}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}

Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON output section.

Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:

Illumina Connected Annotations JSON key chain | OMIM API JSON key chain
omim:mimNumber | omim:entryList:entry:mimNumber
omim:geneName | omim:entryList:entry:geneMap:geneName
omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber
omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype
omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent
omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below)
omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance
omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below)

Mapping key to content

1 to disorder was positioned by mapping of the wild type gene
2 to disease phenotype itself was mapped
3 to molecular basis of the disorder is known
4 to disorder is a chromosome deletion or duplication syndrome

Phenotype character to comment

? to unconfirmed or possibly spurious mapping
[/] to nondiseases
{/} to contribute to susceptibility to multifactorial disorders or to susceptibility to infection
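
The two lookups above can be expressed as a short Python sketch. The dictionaries restate the mappings listed here; the `annotate_phenotype` function, and the assumption that the marker characters appear at the start of the OMIM phenotype string (with matching closing brackets at the end), are illustrative only.

```python
MAPPING_KEY_TO_TEXT = {
    1: "disorder was positioned by mapping of the wild type gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "disorder is a chromosome deletion or duplication syndrome",
}

PHENOTYPE_CHAR_TO_COMMENT = {
    "?": "unconfirmed or possibly spurious mapping",
    "[": "nondiseases",
    "{": "contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
}

def annotate_phenotype(phenotype_map):
    """Translate one OMIM phenotypeMap object into mapping text and comments."""
    name = phenotype_map["phenotype"]
    comments = []
    # Assumption: marker characters are a prefix of the phenotype string.
    while name and name[0] in PHENOTYPE_CHAR_TO_COMMENT:
        comments.append(PHENOTYPE_CHAR_TO_COMMENT[name[0]])
        name = name[1:]
    result = {
        "phenotype": name.rstrip("]}"),  # drop the matching closing brackets
        "mapping": MAPPING_KEY_TO_TEXT[phenotype_map["phenotypeMappingKey"]],
    }
    if comments:
        result["comments"] = comments
    return result
```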

There are different types of links in the OMIM description section. For example, the JSON response above contains the description of MIM entry 100640:

The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985}).

As the descriptions will be shown as plain text, we remove the curly brackets surrounding links and try to keep the text readable with minimal modifications. Briefly:

  • Links referring to another MIM entry (e.g. {100650}) will be removed. Any word(s) specifically associated with the removed link will also be removed. For example, "(ADH, see {103700})" will become "(ADH)" after the process.
  • Links referring to a literature reference will be processed to remove the internal index and curly brackets. For example, "{4:Hsu et al., 1985}" becomes "Hsu et al., 1985".
  • All other links will simply have their curly brackets removed. For example, "{EC 1.2.1.3}" becomes "EC 1.2.1.3".
  • If the content within a pair of parentheses becomes empty after processing, the parentheses are removed as well and the surrounding white space is adjusted. For example, "ALDH2 ({100650})," will become "ALDH2,".

Here are some examples of how the description section is processed:

Original text | Processed text
({516030}, {516040}, and {516050}) | (empty)
(e.g., D1, {168461}; D2, {123833}; D3, {123834}) | (e.g., D1; D2; D3)
(desmocollins; see DSC2, {125645}) | (desmocollins; see DSC2)
(e.g., see {102700}, {300755}) | (empty)
(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}) | (ADH). See also liver mitochondrial ALDH2
(see, e.g., CACNA1A; {601011}) | (see, e.g., CACNA1A)
(e.g., GSTA1; {138359}), mu (e.g., {138350}) | (e.g., GSTA1), mu
(NFKB; see {164011}) | (NFKB)
(see ISGF3G, {147574}) | (see ISGF3G)
(DCK; {EC 2.7.1.74}; {125450}) | (DCK; EC 2.7.1.74)
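
One way to approximate these rules is with a few regular expressions, as in the Python sketch below. It reproduces the examples listed in this section, but it is a simplified illustration rather than the actual implementation: the pattern names, the `clean_description` function, and the short list of filler words ("and", "see", "e.g.,") are assumptions.

```python
import re

# A MIM link such as {100650}, plus a preceding separator and filler words.
MIM_LINK = re.compile(r"[,;]?\s*(?:\b(?:and|see|e\.g\.,)\s+)*\{\d+\}")
# A literature reference such as {4:Hsu et al., 1985}.
REFERENCE_LINK = re.compile(r"\{\d+:([^{}]*)\}")
# Any remaining link such as {EC 1.2.1.3}.
OTHER_LINK = re.compile(r"\{([^{}]*)\}")
# Parentheses left empty by the removals, with the white space before them.
EMPTY_PARENS = re.compile(r"\s*\(\s*\)")

def clean_description(text):
    """Convert OMIM description markup to plain text."""
    text = MIM_LINK.sub("", text)
    text = REFERENCE_LINK.sub(r"\1", text)
    text = OTHER_LINK.sub(r"\1", text)
    text = EMPTY_PARENS.sub("", text)
    return text
```

The order matters in this sketch: MIM links are removed first so that the generic bracket-stripping step never sees them, and empty parentheses are cleaned up last.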

JSON output

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

Building the supplementary files

There are two ways of building your own OMIM supplementary files using SAUtils.

The first is to use the SAUtils subcommands downloadOMIM and omim.

The second is to use the SAUtils subcommand AutoDownloadGenerate. For details on AutoDownloadGenerate, see the SAUtils section.

Using subcommands downloadOMIM and omim

The first step in building the OMIM .nga files is to download the necessary data with the SAUtils subcommand downloadOMIM. To download the data, the user must possess an API key obtained from OMIM. This key has to be set as the environment variable OmimApiKey.

export OmimApiKey=<users-omim-api-key>
dotnet SAUtils.dll downloadOMIM
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll downloadomim [options]
Download the OMIM gene annotation data

OPTIONS:
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--in, -i <path> input configuration JSON path (optional)
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll downloadOMIM --ref References/7/Homo_sapiens.GRCh38.Nirvana.dat --uga Cache/ --out ExternalDataSources/OMIM/2021-06-14

---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

Gene Symbol Update Statistics
============================================
{
"NumGeneSymbolsUpToDate": 16978,
"NumGeneSymbolsUpdated": 60,
"NumGenesWhereBothIdsAreNull": 0,
"NumGeneSymbolsNotInCache": 105,
"NumUnresolvedGeneSymbolConflicts": 0
}

Once the download has succeeded, the .nga files can be produced using the SAUtils subcommand omim.

dotnet SAUtils.dll omim
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll omim [options]
Creates a gene annotation database from OMIM data

OPTIONS:
--m2g, -m <VALUE> MimToGeneSymbol tsv file
--json, -j <VALUE> OMIM entry json file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version


dotnet SAUtils.dll omim --m2g ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv --json ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz --out SupplementaryDatabase/63/
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:04.5
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/phylop-json/index.html b/3.24/data-sources/phylop-json/index.html new file mode 100644 index 00000000..c56c7bd4 --- /dev/null +++ b/3.24/data-sources/phylop-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +phylop-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/phylop/index.html b/3.24/data-sources/phylop/index.html new file mode 100644 index 00000000..e1da0c72 --- /dev/null +++ b/3.24/data-sources/phylop/index.html @@ -0,0 +1,21 @@ + + + + + + + +PhyloP | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

PhyloP

Overview

Publication

Kuderna, L.F.K., Ulirsch, J.C., Rashid, S. et al. Identification of constrained sequence elements across 239 primate genomes. Nature 2023. (https://doi.org/10.1038/s41586-023-06798-8)

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

PhyloP Primate

PhyloP Primate is based on an analysis of 239 primate species that identified 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites constrained specifically in primates. These elements are enriched for human genetic variants that influence gene expression and impact complex traits and diseases.

PhyloP Primate is only available for the GRCh38 assembly.

BigWig File

The original file is primates_msa.phylop.conacc.lrt.bw, a bigWig file. It was converted to a wig file using the UCSC tools described at https://genome.ucsc.edu/goldenPath/help/bigWig.html. After conversion, the wig file provides the scores in the following format:

0.14
0.074
-2.487
0.073
0.052
0.073
fixedStep chrom=chr1 start=10558 step=1 span=1
-1.991
0.052
-2.047
0.052
0.052
0.074
-1.992
0.074
0.052
0.073
0.074
0.052
0.074
-2.05
-2.059
0.074
0.074
0.074

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]
Field | Type | Notes
phyloPPrimateScore | float | range: -20 to 1.951

PhyloP

PhyloP (phylogenetic p-values) conservation scores are obtained from the PHAST package (http://compgen.bscb.cornell.edu/phast/) for multiple alignments of vertebrate genomes to the human genome. For GRCh38, the multiple alignment is against 19 mammals; for GRCh37, it is against 45 vertebrate genomes.

WigFix File

The data are provided as WigFix files, which are text files that provide conservation scores for contiguous intervals in the following format:

fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
0.064
0.058
0.064
0.064
fixedStep chrom=chr1 start=34045 step=1
0.111
0.100
0.111
0.111
0.100
0.111
0.111
0.111
0.100
0.111
-1.636

We convert them to binary files with indexes for fast query. Note that these are scores for genomic positions and are reported only for SNVs.
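
To make the WigFix layout concrete, the following Python sketch walks through a file and assigns a genomic position to every score. It follows the fixedStep convention (1-based start, one score per step) and is only an illustration; the function name `parse_wigfix` is hypothetical and the binary conversion and indexing are not shown.

```python
def parse_wigfix(lines):
    """Yield (chromosome, 1-based position, score) for each score line."""
    chrom, pos, step = None, 0, 1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # Header, e.g. "fixedStep chrom=chr1 start=10918 step=1"
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        if chrom is None:
            continue  # scores before the first header cannot be placed
        yield chrom, pos, float(line)
        pos += step
```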

Download URL

GRCh37: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/

GRCh38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]
Field | Type | Notes
phylopScore | float | range: -14.08 to 6.424
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/phylopprimate-json/index.html b/3.24/data-sources/phylopprimate-json/index.html new file mode 100644 index 00000000..1008ecbf --- /dev/null +++ b/3.24/data-sources/phylopprimate-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +phylopprimate-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

phylopprimate-json

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]
Field | Type | Notes
phyloPPrimateScore | float | range: -20 to 1.951
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/primate-ai-json/index.html b/3.24/data-sources/primate-ai-json/index.html new file mode 100644 index 00000000..f644d21e --- /dev/null +++ b/3.24/data-sources/primate-ai-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +primate-ai-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

primate-ai-json

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/primate-ai/index.html b/3.24/data-sources/primate-ai/index.html new file mode 100644 index 00000000..74d604d5 --- /dev/null +++ b/3.24/data-sources/primate-ai/index.html @@ -0,0 +1,24 @@ + + + + + + + +Primate AI-3D | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Primate AI-3D

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information. The model's innovative use of primate sequencing and structural data offers promising insights into variant interpretation and disease gene identification. The predicted score ranges between 0 and 1, with 0 being benign and 1 being most pathogenic.

For more details, refer to these publications:

Publication
  1. Hong Gao et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023). https://doi.org/10.1126/science.abn8197
  2. Sundaram, L., Gao, H., Padigepati, S.R. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50, 1161–1170 (2018). https://doi.org/10.1038/s41588-018-0167-z
Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

Parsing

TSV File

chr pos non_flipped_ref non_flipped_alt gene_name   change_position_1based  ref_aa  alt_aa  score_PAI3D percentile_PAI3D    refseq  prediction
chr1 69094 G A ENST00000335137.4 2 V M 0.6169436463713646 0.5200308441794135 NM_001005484.1 pathogenic
chr1 69094 G C ENST00000335137.4 2 V L 0.5557043975591658 0.4271457250214688 NM_001005484.1 benign
chr1 69094 G T ENST00000335137.4 2 V L 0.5557043975591658 0.4271457391722522 NM_001005484.1 benign
chr1 69095 T A ENST00000335137.4 2 V E 0.8063537482917307 0.8032228720356267 NM_001005484.1 pathogenic
chr1 69095 T C ENST00000335137.4 2 V A 0.5795628190040587 0.4631329075815453 NM_001005484.1 benign
chr1 69095 T G ENST00000335137.4 2 V G 0.7922330142557621 0.7834049546930125 NM_001005484.1 pathogenic

From the TSV file, all columns are parsed:

  • chr
  • pos
  • non_flipped_ref
  • non_flipped_alt
  • gene_name
  • change_position_1based
  • ref_aa
  • alt_aa
  • score_PAI3D
  • percentile_PAI3D
  • refseq
  • prediction

The fields gene_name and refseq define the Ensembl and RefSeq transcript IDs respectively. +These transcripts are passed as-is and some of them might be unrecognized/deprecated by RefSeq/Ensembl.
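The column-to-field mapping can be sketched in Python. This is only an illustration, not the SAUtils implementation: the function name `row_to_annotation` is hypothetical, and the rounding precision (6 decimals for the score, 2 for the percentile) is an assumption inferred from the JSON example on this page.

```python
import json

# Column order taken from the TSV header shown above.
COLUMNS = [
    "chr", "pos", "non_flipped_ref", "non_flipped_alt", "gene_name",
    "change_position_1based", "ref_aa", "alt_aa", "score_PAI3D",
    "percentile_PAI3D", "refseq", "prediction",
]


def row_to_annotation(line: str) -> dict:
    """Convert one whitespace-separated data row into a JSON-like dict."""
    values = dict(zip(COLUMNS, line.split()))
    return {
        "aminoAcidPosition": int(values["change_position_1based"]),
        "refAminoAcid": values["ref_aa"],
        "altAminoAcid": values["alt_aa"],
        # Assumption: precision inferred from the documented JSON example.
        "score": round(float(values["score_PAI3D"]), 6),
        "scorePercentile": round(float(values["percentile_PAI3D"]), 2),
        "classification": values["prediction"],
        # gene_name holds the Ensembl transcript ID, refseq the RefSeq one.
        "ensemblTranscriptId": values["gene_name"],
        "refSeqTranscriptId": values["refseq"],
    }


row = ("chr1 69094 G A ENST00000335137.4 2 V M 0.6169436463713646 "
       "0.5200308441794135 NM_001005484.1 pathogenic")
print(json.dumps(row_to_annotation(row), indent=2))
```

The transcript IDs are copied through unchanged, which mirrors the pass-through behavior described above.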

GRCh37

Note that for GRCh37, a lifted-over file is provided. +The file is not sorted, so it must be sorted before use. +Also note that certain RefSeq transcripts appear not to have been mapped during the liftover process.

Pre-processing

Sorting

gzcat PrimateAI-3D.hg19.txt.gz | sort -t $'\t'  -k1,1 -k2,2n | gzip > PrimateAI-3D.hg19_sorted.tsv.gz

SA Generation

dotnet SAUtils.dll \
PrimateAi \
--r "${References}/Homo_sapiens.GRCh38.Nirvana.dat" \
--i "${ExternalDataSources}/PrimateAI/3D/PrimateAI-3D.hg38.txt.gz" \
--o "${SaUtilsOutput}"

Known Issues

Some transcript IDs defined in the data file are obsolete, retired, or updated. +They are not removed or modified by Illumina Connected Annotations, and are passed as-is from the PrimateAI-3D data source.

Example:

ENST00000643905.1 transcript is retired according to Ensembl

NM_182838.2 transcript is removed because it is a pseudo-gene according to RefSeq

Download URL

https://primad.basespace.illumina.com/

JSON Output

"primateAI-3D": [
  {
    "aminoAcidPosition": 2,
    "refAminoAcid": "V",
    "altAminoAcid": "M",
    "score": 0.616944,
    "scorePercentile": 0.52,
    "classification": "pathogenic",
    "ensemblTranscriptId": "ENST00000335137.4",
    "refSeqTranscriptId": "NM_001005484.1"
  }
]

| Field | Type | Notes |
|---|---|---|
| aminoAcidPosition | int | Amino Acid Position (1-based) |
| refAminoAcid | string | Reference Amino Acid |
| altAminoAcid | string | Alternate Amino Acid |
| ensemblTranscriptId | string | Transcript ID (Ensembl) |
| refSeqTranscriptId | string | Transcript ID (RefSeq) |
| scorePercentile | float | range: 0 - 1.0 |
| score | float | range: 0 - 1.0 |
| classification | string | pathogenic or benign classification |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/revel-json/index.html b/3.24/data-sources/revel-json/index.html new file mode 100644 index 00000000..b15372ba --- /dev/null +++ b/3.24/data-sources/revel-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +revel-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

revel-json

"revel":{
  "score":0.027
}

| Field | Type | Notes |
|---|---|---|
| score | float | Range: 0 - 1.0 |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/revel/index.html b/3.24/data-sources/revel/index.html new file mode 100644 index 00000000..1e3b77ce --- /dev/null +++ b/3.24/data-sources/revel/index.html @@ -0,0 +1,18 @@ + + + + + + + +REVEL | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files with identical format, one per assembly, for better readability. The positions were sorted for GRCh37 but not for GRCh38, so we re-sort the variants by position in the GRCh38 file.

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.
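The rule above can be sketched as follows. This is a minimal illustration rather than the actual pre-processing code: `highest_scores` is a hypothetical helper, and the second data row in the example is a made-up duplicate added only to show the tie-break.

```python
import csv
import io


def highest_scores(csv_text: str, pos_column: str = "grch38_pos") -> dict:
    """Return {(chr, pos, alt): highest REVEL score} for one assembly."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row[pos_column].isdigit():
            continue  # no usable coordinate for this assembly
        key = (row["chr"], int(row[pos_column]), row["alt"])
        score = float(row["REVEL"])
        # Keep the larger of the stored score and the current one.
        best[key] = max(score, best.get(key, score))
    return best


example = """chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,A,T,M,0.031
1,35142,35142,G,C,T,R,0.035
"""
# The second data row is hypothetical; it is not taken from the REVEL file.
print(highest_scores(example))
```

Passing `pos_column="hg19_pos"` would apply the same rule to the GRCh37 coordinates.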

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{
  "score":0.027
}

| Field | Type | Notes |
|---|---|---|
| score | float | Range: 0 - 1.0 |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/splice-ai-json/index.html b/3.24/data-sources/splice-ai-json/index.html new file mode 100644 index 00000000..b7de3a9b --- /dev/null +++ b/3.24/data-sources/splice-ai-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +splice-ai-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

splice-ai-json

"spliceAI":[
  {
    "hgnc":"BLCAP",
    "acceptorGainDistance":-3,
    "acceptorGainScore":0.3,
    "donorLossDistance":7,
    "donorLossScore":0.9
  },
  {
    "hgnc":"NNAT",
    "acceptorGainDistance":-1,
    "acceptorGainScore":0.2,
    "donorGainDistance":-2,
    "donorGainScore":0.3
  }
]

| Field | Type | Notes |
|---|---|---|
| hgnc | string | HGNC gene symbol |
| acceptorGainDistance | int | ± bp from current position |
| acceptorGainScore | float | range: 0 - 1.0. 1 decimal place |
| acceptorLossDistance | int | ± bp from current position |
| acceptorLossScore | float | range: 0 - 1.0. 1 decimal place |
| donorGainDistance | int | ± bp from current position |
| donorGainScore | float | range: 0 - 1.0. 1 decimal place |
| donorLossDistance | int | ± bp from current position |
| donorLossScore | float | range: 0 - 1.0. 1 decimal place |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/splice-ai/index.html b/3.24/data-sources/splice-ai/index.html new file mode 100644 index 00000000..9355b191 --- /dev/null +++ b/3.24/data-sources/splice-ai/index.html @@ -0,0 +1,18 @@ + + + + + + + +Splice AI | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

| Range | Confidence | Pathogenicity |
|---|---|---|
| 0 ≤ x < 0.1 | low | likely benign |
| 0.1 ≤ x ≤ 0.5 | medium | likely pathogenic |
| x > 0.5 | high | pathogenic |
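The interpretation table translates directly into a small lookup. The function name `interpret_delta_score` is hypothetical; the thresholds are the ones listed in the table.

```python
from typing import Tuple


def interpret_delta_score(score: float) -> Tuple[str, str]:
    """Map a Splice AI delta score to (confidence, pathogenicity)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("delta scores are expected to be in [0, 1]")
    if score < 0.1:
        return ("low", "likely benign")
    if score <= 0.5:
        return ("medium", "likely pathogenic")
    return ("high", "pathogenic")


for value in (0.0, 0.3, 0.9):
    print(value, interpret_delta_score(value))
```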

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of the entries have little value (e.g. low splice scores in intergenic regions). Not only do these extra entries require more storage, but the unused content also has a negative impact on annotation speed.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.
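A rough sketch of such a filter is shown below. It is not the actual filtering code: the helper names are hypothetical, and using the `DIST` INFO field as the measure of proximity to a splice site is an assumption made for illustration.

```python
SCORE_KEYS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")


def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'A=1;B=2' into a dict of strings."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def keep_entry(info: dict, min_score: float = 0.1, max_distance: int = 15) -> bool:
    """Keep entries above the low-confidence tier or close to a splice site."""
    best = max(float(info[key]) for key in SCORE_KEYS)
    # Assumption: DIST approximates the distance to the nearest splice site.
    near_splice_site = abs(int(info["DIST"])) <= max_distance
    return best >= min_score or near_splice_site


info_text = ("SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;"
             "DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1")
print(keep_entry(parse_info(info_text)))
```

The entry above comes from the example VCF; all of its scores are in the low tier and it lies 53 bp from the closest splice site, so this sketch would drop it.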

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[
  {
    "hgnc":"BLCAP",
    "acceptorGainDistance":-3,
    "acceptorGainScore":0.3,
    "donorLossDistance":7,
    "donorLossScore":0.9
  },
  {
    "hgnc":"NNAT",
    "acceptorGainDistance":-1,
    "acceptorGainScore":0.2,
    "donorGainDistance":-2,
    "donorGainScore":0.3
  }
]

| Field | Type | Notes |
|---|---|---|
| hgnc | string | HGNC gene symbol |
| acceptorGainDistance | int | ± bp from current position |
| acceptorGainScore | float | range: 0 - 1.0. 1 decimal place |
| acceptorLossDistance | int | ± bp from current position |
| acceptorLossScore | float | range: 0 - 1.0. 1 decimal place |
| donorGainDistance | int | ± bp from current position |
| donorGainScore | float | range: 0 - 1.0. 1 decimal place |
| donorLossDistance | int | ± bp from current position |
| donorLossScore | float | range: 0 - 1.0. 1 decimal place |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/topmed-json/index.html b/3.24/data-sources/topmed-json/index.html new file mode 100644 index 00000000..dadec671 --- /dev/null +++ b/3.24/data-sources/topmed-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +topmed-json | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

topmed-json

"topmed":{
  "allAc":20,
  "allAn":125568,
  "allAf":0.000159,
  "allHc":0,
  "failedFilter":true
}

| Field | Type | Notes |
|---|---|---|
| allAc | int | TOPMed allele count |
| allAn | int | TOPMed allele number. Non-zero integer. |
| allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations) |
| allHc | int | TOPMed homozygous count |
| failedFilter | bool | True if this variant failed any filters |
+ + + + \ No newline at end of file diff --git a/3.24/data-sources/topmed/index.html b/3.24/data-sources/topmed/index.html new file mode 100644 index 00000000..3d7bddf5 --- /dev/null +++ b/3.24/data-sources/topmed/index.html @@ -0,0 +1,18 @@ + + + + + + + +TOPMed | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842
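As an illustration, the example line can be turned into the JSON fields like this. The sketch is not the actual extraction code: `topmed_annotation` is a hypothetical name, it assumes a single alternate allele, it assumes `failedFilter` reflects any FILTER value other than `PASS`, and the 6-decimal rounding is inferred from the JSON example.

```python
def topmed_annotation(vcf_line: str) -> dict:
    """Build a topmed-style dict from one single-allele TOPMed VCF line."""
    fields = vcf_line.split()
    filter_value, info_text = fields[6], fields[7]
    info = dict(item.split("=", 1) for item in info_text.split(";"))
    allele_count = int(info["AC"])
    allele_number = int(info["AN"])
    return {
        "allAc": allele_count,
        "allAn": allele_number,
        # The frequency is recomputed from AC and AN instead of reading AF.
        "allAf": round(allele_count / allele_number, 6),
        "allHc": int(info["Hom"]),
        # Assumption: anything other than PASS counts as a failed filter.
        "failedFilter": filter_value != "PASS",
    }


line = ("chr1 10132 TOPMed_freeze_5?chr1:10,132 T C 255 SVM "
        "VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0 "
        "NA:FRQ 125568:0.000254842")
print(topmed_annotation(line))
```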

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{
  "allAc":20,
  "allAn":125568,
  "allAf":0.000159,
  "allHc":0,
  "failedFilter":true
}

| Field | Type | Notes |
|---|---|---|
| allAc | int | TOPMed allele count |
| allAn | int | TOPMed allele number. Non-zero integer. |
| allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations) |
| allHc | int | TOPMed homozygous count |
| failedFilter | bool | True if this variant failed any filters |
+ + + + \ No newline at end of file diff --git a/3.24/file-formats/custom-annotations/index.html b/3.24/file-formats/custom-annotations/index.html new file mode 100644 index 00000000..5652688c --- /dev/null +++ b/3.24/file-formats/custom-annotations/index.html @@ -0,0 +1,40 @@ + + + + + + + +Custom Annotations | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporating new annotations ahead of our update cycle. Another +common use case involves protected health information (PHI). Custom annotations are a mechanism that enables both use cases.

Here are some examples of how our collaborators use custom annotations:

  • associating context from both a sample-level and a sample cohort level with the variant annotations
  • adding content that is licensed (e.g. HGMD) to the variant annotations

At the moment, we have two different custom annotation file formats. One provides additional annotations to variants (both small variants and SVs) +while the other caters to gene annotations.

In both cases, the custom annotation file format is a tab-delimited file that is separated into two parts: the header & the data.

The header is where you can customize how you want the data to appear in the JSON file and provide context about the genome assembly and how +Illumina Connected Annotations should match the variants.

At Illumina, there are usually many components downstream of Illumina Connected Annotations that have to parse our annotations. If a customer provides a custom +annotation, those downstream tools need to understand more about the data such as:

  • data type (e.g. number, boolean, or a string)
  • data category (e.g. is this an allele count, allele number, allele frequency, etc.)
  • associated population (i.e. if this is an allele frequency)

For each custom annotation, Illumina Connected Annotations uses this context to create a JSON schema that can be sent to downstream tools. If +a tool knows that this is an allele frequency, it can validate user input to ensure that it's in the range of [0, 1].

Variant File Format

File Format

Illumina Connected Annotations expects plain text (or gzipped text) files. Using tools like Excel can add extra characters that can break parsing. We highly recommend creating and modifying these files with a plain text editor such as Notepad, Notepad++, or Atom.

Basic Allele Frequency Example

Create the Custom Annotation TSV

Imagine that you want to create a basic allele frequency custom annotation for small variants. If we visualized the tab-delimited file +(TSV), it would look something like this:

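| Col 1 | Col 2 | Col 3 | Col 4 | Col 5 |
|---|---|---|---|---|
| #title=MyDataSource | | | | |
| #assembly=GRCh38 | | | | |
| #matchVariantsBy=allele | | | | |
| #CHROM | POS | REF | ALT | allAf |
| #categories | . | . | . | AlleleFrequency |
| #descriptions | . | . | . | ALL |
| #type | . | . | . | number |
| chr16 | 23603511 | TGA | T | 0.000006579 |
| chr16 | 68801894 | G | A | 0.000006569 |
| chr19 | 11107436 | G | A | 0.00003291 |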

Here's the full TSV file.

Let's go over the header and discuss the contents:

  • title indicates the name of the JSON key
  • assembly indicates that this data is only valid for GRCh38.
  • matchVariantsBy indicates how annotations should be matched and reported. In this case annotations will be matched and reported by allele.
  • categories provides hints to downstream tools on how they might want to treat the data. In this case, we indicate that it's an allele frequency.
  • descriptions are used in special circumstances to provide more context. Even though column 5 is called allAf, it might not be clear to a +downstream tool that this means a global allele frequency using all sub-populations. In this case, ALL indicates the intended population.
  • type indicates to downstream tools the data type. Since allele frequencies are numbers, we'll write number in this column.
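If you prefer to generate the file programmatically instead of editing it by hand, a short script avoids stray characters. The following is a minimal sketch; `build_tsv` is a hypothetical helper, and leaving the `#title`, `#assembly`, and `#matchVariantsBy` lines without trailing tabs is an assumption.

```python
HEADER = [
    ["#title=MyDataSource"],
    ["#assembly=GRCh38"],
    ["#matchVariantsBy=allele"],
    ["#CHROM", "POS", "REF", "ALT", "allAf"],
    ["#categories", ".", ".", ".", "AlleleFrequency"],
    ["#descriptions", ".", ".", ".", "ALL"],
    ["#type", ".", ".", ".", "number"],
]

ROWS = [
    ["chr16", "23603511", "TGA", "T", "0.000006579"],
    ["chr16", "68801894", "G", "A", "0.000006569"],
    ["chr19", "11107436", "G", "A", "0.00003291"],
]


def build_tsv(header, rows) -> str:
    """Join every line's fields with tabs and terminate each line with \\n."""
    return "".join("\t".join(fields) + "\n" for fields in header + rows)


tsv_text = build_tsv(HEADER, ROWS)
print(tsv_text, end="")
```

Writing `tsv_text` to a file such as `MyDataSource.tsv` in text mode yields a plain tab-delimited file.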
Reference Base Checking

Illumina Connected Annotations validates all the reference bases in a custom annotation. If a variant or genomic region is specified that has the wrong reference base, an error will be produced.

Sorting

The variants within each chromosome must be sorted by genomic position.
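Before converting a file, it can help to check the ordering yourself. The sketch below is a hypothetical pre-flight check, not part of SAUtils; it only verifies that positions never decrease within a chromosome.

```python
def first_unsorted_row(rows):
    """Return the index of the first row that breaks position order, or None.

    rows is an iterable of (chromosome, position) pairs in file order.
    """
    last_position = {}
    for index, (chromosome, position) in enumerate(rows):
        if position < last_position.get(chromosome, 0):
            return index
        last_position[chromosome] = position
    return None


rows = [("chr16", 23603511), ("chr16", 68801894), ("chr19", 11107436)]
print(first_unsorted_row(rows))
```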

Convert to Illumina Connected Annotations Format

First, convert the TSV file to Illumina Connected Annotations' native file format. We'll put the resulting file in a new directory called CA:

$ mkdir CA
$ dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \
-r Data/References/Homo_sapiens.GRCh38.Nirvana.dat -i MyDataSource.tsv -o CA
---------------------------------------------------------------------------
SAUtils (c) 2020 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.12.0
---------------------------------------------------------------------------

Chromosome 16 completed in 00:00:00.1
Chromosome 19 completed in 00:00:00.0

Time: 00:00:00.2

Annotate with Illumina Connected Annotations

Let's annotate the following VCF (notice that it's one of the variants that we have in our custom annotation):

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
16 68801894 . G A . . .

Here's the full VCF file.

Since Illumina Connected Annotations can handle multiple directories with external annotations, all we need to do is specify our new CA directory in addition to +the normal Illumina Connected Annotations command-line.

$ dotnet Annotator.dll -c Data/Cache/GRCh38/Both \
-r Data/References/Homo_sapiens.GRCh38.Nirvana.dat \
--sd Data/SupplementaryAnnotation/GRCh38 --sd CA -i TestCA.vcf -o TestCA
---------------------------------------------------------------------------
IlluminaConnectedAnnotations (c) 2020 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.12.0
---------------------------------------------------------------------------

Initialization Time Positions/s
---------------------------------------------------------------------------
Cache 00:00:01.8
SA Position Scan 00:00:00.0 19

Reference Preload Annotation Variants/s
---------------------------------------------------------------------------
chr16 00:00:00.2 00:00:01.3 1

Summary Time Percent
---------------------------------------------------------------------------
Initialization 00:00:01.9 25.5 %
Preload 00:00:00.2 3.3 %
Annotation 00:00:01.3 18.2 %

Time: 00:00:06.3

Investigate the Results

We would expect the following data to show up in our JSON output file:

      "variants": [
{
"vid": "16-68801894-G-A",
"chromosome": "16",
"begin": 68801894,
"end": 68801894,
"refAllele": "G",
"altAllele": "A",
"variantType": "SNV",
"hgvsg": "NC_000016.10:g.68801894G>A",
"phylopScore": 1,
"MyDataSource": {
"refAllele": "G",
"altAllele": "A",
"allAf": 7e-06
},
"clinvar": [

Here's the full JSON file.

Illumina Connected Annotations preserves up to 6 decimal places for allele frequency data.
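This explains why the input value 0.000006569 appears as 7e-06 in the output. The effect can be reproduced with ordinary rounding; the snippet only illustrates the arithmetic and is not the serializer used by the annotator.

```python
import json

input_frequency = 0.000006569  # value from the custom annotation TSV

# Rounding to 6 decimal places leaves 0.000007, which JSON prints as 7e-06.
rounded = round(input_frequency, 6)
print(json.dumps({"allAf": rounded}))
```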

Categories & Descriptions Example

Create the Custom Annotation TSV

Building on the previous example, we can add other types of annotations like predictions and general notes.

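| Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 |
|---|---|---|---|---|---|---|
| #title=MyDataSource | | | | | | |
| #assembly=GRCh38 | | | | | | |
| #matchVariantsBy=allele | | | | | | |
| #CHROM | POS | REF | ALT | allAf | pathogenicity | notes |
| #categories | . | . | . | AlleleFrequency | Prediction | . |
| #descriptions | . | . | . | ALL | . | . |
| #type | . | . | . | number | string | string |
| chr16 | 23603511 | TGA | T | 0.000006579 | P | . |
| chr16 | 68801894 | G | A | 0.000006569 | LP | Seen in case 123 |
| chr19 | 11107436 | G | A | 0.00003291 | . | . |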

Here's the full TSV file.

Placeholders

You can use a period to denote an empty value (much in the same way as periods are used in VCF files to signify missing values). While +Illumina Connected Annotations also accepts empty columns in the TSV file, we use periods in these examples to promote readability.

Let's go over what's new in this example:

  • Column 6 adds a field called pathogenicity which uses the Prediction category. When using this category, Illumina Connected Annotations will +validate that the field contains either the abbreviations (B, LB, VUS, LP, and P) or the long-form equivalents (e.g. benign or pathogenic).
  • Column 7 adds a field called notes and it doesn't have a category or description. We're just going to use it to add some internal +notes.
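A check along the lines of the Prediction validation might look like the sketch below. It is an illustration only: the abbreviations come from the text above, but apart from "benign" and "pathogenic" the long-form spellings are assumptions, and `is_valid_prediction` is a hypothetical name.

```python
ABBREVIATIONS = {"B", "LB", "VUS", "LP", "P"}

# Assumption: only "benign" and "pathogenic" are confirmed by the text above;
# the remaining long forms are illustrative guesses.
LONG_FORMS = {
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
}


def is_valid_prediction(value: str) -> bool:
    """Accept placeholders, known abbreviations, or known long forms."""
    if value in ("", "."):
        return True  # empty column or period placeholder
    return value in ABBREVIATIONS or value.lower() in LONG_FORMS


for value in ("LP", "pathogenic", ".", "maybe"):
    print(value, is_valid_prediction(value))
```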

Annotate with Illumina Connected Annotations

Let's use a new VCF file. It includes all the same positions as our custom annotation file, but only the middle variant also matches the +alternate allele (allele-specific match):

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
16 23603511 . TG T . . .
16 68801894 . G A . . .
19 11107436 . G C . . .

Here's the full VCF file.

Investigate the Results

Because we specified #matchVariantsBy=allele in our custom annotation file, only the middle variant will get an annotation:

      "variants": [
{
"vid": "16-68801894-G-A",
"chromosome": "16",
"begin": 68801894,
"end": 68801894,
"refAllele": "G",
"altAllele": "A",
"variantType": "SNV",
"hgvsg": "NC_000016.10:g.68801894G>A",
"phylopScore": 1,
"MyDataSource": {
"refAllele": "G",
"altAllele": "A",
"allAf": 7e-06,
"pathogenicity": "LP",
"notes": "Seen in case 123"
},
"clinvar": [

Here's the full JSON file.

Using Positional Matches

What would happen if we changed to #matchVariantsBy=position? Two things will happen. First, our positional variants will now match:

      "variants": [
{
"vid": "16-23603511-TG-T",
"chromosome": "16",
"begin": 23603512,
"end": 23603512,
"refAllele": "G",
"altAllele": "-",
"variantType": "deletion",
"hgvsg": "NC_000016.10:g.23603512delG",
"MyDataSource": [
{
"refAllele": "GA",
"altAllele": "-",
"allAf": 7e-06,
"pathogenicity": "P"
}
],
"clinvar": [

In addition, you will now see an extra flag for our allele-specific variant:

      "variants": [
{
"vid": "16-68801894-G-A",
"chromosome": "16",
"begin": 68801894,
"end": 68801894,
"refAllele": "G",
"altAllele": "A",
"variantType": "SNV",
"hgvsg": "NC_000016.10:g.68801894G>A",
"phylopScore": 1,
"MyDataSource": [
{
"refAllele": "G",
"altAllele": "A",
"allAf": 7e-06,
"pathogenicity": "LP",
"notes": "Seen in case 123",
"isAlleleSpecific": true
}
],
"clinvar": [
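One way to picture the difference between the two modes is to trim the shared padding base and then compare. The sketch below is a simplified mental model, not the matching logic used by Illumina Connected Annotations; both function names are hypothetical.

```python
def trim_padding(position: int, ref: str, alt: str):
    """Drop leading bases shared by REF and ALT; return (begin, ref, alt)."""
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    return position + shared, ref[shared:], alt[shared:]


def compare(variant, annotation) -> dict:
    """Report whether two (pos, ref, alt) records share a position or allele."""
    trimmed_variant = trim_padding(*variant)
    trimmed_annotation = trim_padding(*annotation)
    return {
        "samePosition": trimmed_variant[0] == trimmed_annotation[0],
        "isAlleleSpecific": trimmed_variant == trimmed_annotation,
    }


# VCF deletion TG>T versus the custom annotation TGA>T at the same position.
print(compare((23603511, "TG", "T"), (23603511, "TGA", "T")))
# VCF SNV G>A versus the identical custom annotation entry.
print(compare((68801894, "G", "A"), (68801894, "G", "A")))
```

In this model, the first pair matches only by position (both deletions begin at 23603512) while the second pair is also allele-specific.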

Genomic Region Example

Create the Custom Annotation TSV

In the previous example, we added a note for the middle variant, but sometimes it's handy to annotate a genomic region. Consider the following example:

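| Col 1 | Col 2 | Col 3 | Col 4 | Col 5 |
|---|---|---|---|---|
| #title=MyDataSource | | | | |
| #assembly=GRCh38 | | | | |
| #matchVariantsBy=allele | | | | |
| #CHROM | POS | REF | END | notes |
| #categories | . | . | . | . |
| #descriptions | . | . | . | . |
| #type | . | . | . | string |
| chr16 | 20000000 | T | 70000000 | Lots of false positives in this region |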

Here's the full TSV file.

Let's go over what's new in this example:

  • Column 5 now has a field called notes. In essence, it looks exactly like column 7 from our previous example.
  • The main difference is that now one of our custom annotation entries is actually a genomic region. Any variant that overlaps with that region will get a custom annotation.

In the previous example we learned about positional matching vs allele-specific matching. For genomic regions, #matchVariantsBy=allele and #matchVariantsBy=position produce +the same result.

Annotate with Illumina Connected Annotations

Let's use the same VCF file as our previous example.

Investigate the Results

    {
"chromosome": "16",
"position": 23603511,
"refAllele": "TG",
"altAlleles": [
"T"
],
"cytogeneticBand": "16p12.2",
"MyDataSource": [
{
"start": 20000000,
"end": 70000000,
"notes": "Lots of false positives in this region",
"reciprocalOverlap": 0,
"annotationOverlap": 0
}
],
"variants": [

Here's the full JSON file.

Reciprocal & Annotation Overlap

For all intervals, Illumina Connected Annotations internally calculates two overlaps: a variant overlap and an annotation overlap. Variant overlap is the percentage of the variant's length that is +overlapped. Annotation overlap is the percentage of the annotation's length that is overlapped.

Reciprocal overlap is the minimum of those two overlaps. Given that the annotation is 50 Mbp and the deletion is 1 bp, both the reciprocal overlap and the annotation overlap will be very close to 0.
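The definitions above can be written out as a short calculation. This is a sketch under stated assumptions, not the annotator's code: coordinates are treated as inclusive, results are rounded to 5 decimal places (inferred from the JSON examples), and `overlaps` is a hypothetical name.

```python
def overlaps(variant, annotation):
    """Return reciprocal and annotation overlap for two inclusive intervals."""
    variant_start, variant_end = variant
    annotation_start, annotation_end = annotation
    shared = min(variant_end, annotation_end) - max(variant_start, annotation_start) + 1
    if shared <= 0:
        return None  # the intervals do not overlap
    variant_overlap = shared / (variant_end - variant_start + 1)
    annotation_overlap = shared / (annotation_end - annotation_start + 1)
    return {
        "reciprocalOverlap": round(min(variant_overlap, annotation_overlap), 5),
        "annotationOverlap": round(annotation_overlap, 5),
    }


region = (20000000, 70000000)
# The 1 bp deletion from the example (the deleted base is at 23603512).
print(overlaps((23603512, 23603512), region))
```

For the 1 bp deletion the variant overlap is 100%, but the annotation overlap is about 1 in 50 million, so the reported values round to 0.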

We will also see this annotation for the other variant on chr16:

    {
"chromosome": "16",
"position": 68801894,
"refAllele": "G",
"altAlleles": [
"A"
],
"cytogeneticBand": "16q22.1",
"MyDataSource": [
{
"start": 20000000,
"end": 70000000,
"notes": "Lots of false positives in this region",
"reciprocalOverlap": 0,
"annotationOverlap": 0
}
],
"variants": [

Genomic Regions for Structural Variants Example

Create the Custom Annotation TSV

Often we use genomic regions to represent other known CNVs and SVs in the genome. In this use case, we usually don't want to match these regions to other small variants. To force Illumina Connected Annotations to match regions only to other SVs, use the #matchVariantsBy=sv option in the header. Here is an example:

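| Col 1 | Col 2 | Col 3 | Col 4 | Col 5 |
|---|---|---|---|---|
| #title=MyDataSource | | | | |
| #assembly=GRCh38 | | | | |
| #matchVariantsBy=sv | | | | |
| #CHROM | POS | REF | END | notes |
| #categories | . | . | . | . |
| #descriptions | . | . | . | . |
| #type | . | . | . | string |
| chr16 | 20000000 | T | 70000000 | Lots of false positives in this region |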

Here's the full TSV file.

Let's go over what's new in this example:

  • The main difference is the header field #matchVariantsBy=sv which indicates that only structural variants that overlap these genomic regions will receive annotations.

Annotate with Illumina Connected Annotations

Let's use a new VCF file. It contains the first variant from the previous file and a structural variant deletion, both of which overlap the given genomic region.

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
16 23603511 . TG T . . .
16 68801894 . G <DEL> . . END=73683789;SVTYPE=DEL

Here's the full VCF file.

Investigate the Results

Note that this time, MyDataSource only showed up for the <DEL> and not the deletion 16-23603511-TG-T.

    {
"chromosome": "16",
"position": 23603511,
"refAllele": "TG",
"altAlleles": [
"T"
],
"cytogeneticBand": "16p12.2",
"variants": [
...
...
{
"chromosome": "16",
"position": 68801894,
"svEnd": 73683789,
"refAllele": "G",
"altAlleles": [
"<DEL>"
],
"cytogeneticBand": "16q22.1-q22.3",
"MyDataSource": [
{
"start": 20000000,
"end": 70000000,
"notes": "Lots of false positives in this region",
"reciprocalOverlap": 0.02396,
"annotationOverlap": 0.02396
}
],
"variants": [

Mixing Small Variants and Genomic Regions

Create the Custom Annotation TSV

Previously we looked at examples that either had small variants or genomic regions. Let's create a file that contains both:

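| Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 |
|---|---|---|---|---|---|
| #title=MyDataSource | | | | | |
| #assembly=GRCh38 | | | | | |
| #matchVariantsBy=allele | | | | | |
| #CHROM | POS | REF | ALT | END | notes |
| #categories | . | . | . | . | . |
| #descriptions | . | . | . | . | . |
| #type | . | . | . | . | string |
| chr16 | 23603511 | TGA | T | . | . |
| chr16 | 68801894 | G | A | . | . |
| chr19 | 11107436 | G | A | . | . |
| chr21 | 10510818 | C | . | 10699435 | Interval #1 |
| chr21 | 10510818 | C | <DEL> | 10699435 | Interval #2 |
| chr22 | 12370388 | T | T[chr22:12370729[ | . | Known false-positive |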

Here's the full TSV file.

Let's go over what's new in this example:

  • Column 4 now has the ALT field. Except for the case listed below, this is only used by small variants or translocation breakends.
  • Column 5 now has the END field. This is only used by genomic regions.
  • There are two custom annotations on chr21 and the start and end coordinates look the same, so what's different? Interval #2 has a symbolic allele in the ALT column. When this is used in custom annotation, the start position is treated as the padding base (using VCF conventions). When Illumina Connected Annotations matches a variant to interval #2, it will ignore the padding base and consider the start position to be at position 10510819.
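The padding-base rule for symbolic alleles can be sketched as a one-line adjustment. The helper name `interval_start` is hypothetical, and detecting a symbolic allele by its angle brackets is an assumption based on VCF conventions.

```python
def interval_start(position: int, alt: str) -> int:
    """Return the effective start of a custom annotation interval."""
    # Symbolic alleles such as <DEL> or <DUP> are written in angle brackets.
    is_symbolic = alt.startswith("<") and alt.endswith(">")
    # With a symbolic allele, POS is the padding base and is skipped.
    return position + 1 if is_symbolic else position


print(interval_start(10510818, "."))      # interval #1
print(interval_start(10510818, "<DEL>"))  # interval #2
```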

Annotate with Illumina Connected Annotations

Let's use a new VCF file to study how matching works for intervals #1 and #2:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
21 10510818 . C <DUP> . . END=10699435;SVTYPE=DUP
22 12370388 . T T[chr22:12370729[ . . SVTYPE=BND

Here's the full VCF file.

The first variant is similar to the custom annotation labelled "interval #2". Position 10510818 is the padding base, so it effectively starts at position 10510819.

Investigate the Results

  "positions": [
{
"chromosome": "21",
"position": 10510818,
"svEnd": 10699435,
"refAllele": "C",
"altAlleles": [
"<DUP>"
],
"cytogeneticBand": "21p11.2",
"MyDataSource": [
{
"start": 10510818,
"end": 10699435,
"notes": "Interval #1",
"reciprocalOverlap": 0.99999,
"annotationOverlap": 0.99999
},
{
"start": 10510819,
"end": 10699435,
"notes": "Interval #2",
"reciprocalOverlap": 1,
"annotationOverlap": 1
}
],

Here's the full JSON file.

As expected, the variant and interval #2 have matching endpoints, so there is 100% overlap. Interval #1 technically starts 1 bp earlier, so its overlap is slightly lower (99.999%).

Further down the JSON file, we find the annotated translocation breakend:

      "variants": [
{
"vid": "22-12370388-T-T[chr22:12370729[",
"chromosome": "22",
"begin": 12370388,
"end": 12370388,
"isStructuralVariant": true,
"refAllele": "T",
"altAllele": "T[chr22:12370729[",
"variantType": "translocation_breakend",
"MyDataSource": {
"refAllele": "T",
"altAllele": "T[chr22:12370729[",
"notes": "Known false-positive"
}
}

Gene File Format

Basic Gene Example

Create the Custom Annotation TSV

Previously we looked at examples that had either small variants or genomic regions. Sometimes, however, we would like to add custom gene annotations. The gene custom annotation file format +looks slightly different:

| Col 1 | Col 2 | Col 3 | Col 4 |
|---|---|---|---|
| #title=MyDataSource | | | |
| #geneSymbol | geneId | phenotype | notes |
| #categories | . | . | . |
| #descriptions | . | . | . |
| #type | . | string | string |
| TP53 | 7157 | Colorectal cancer, hereditary nonpolyposis, type 5 | . |
| KRAS | ENSG00000133703 | Mismatch repair cancer syndrome | Seen in cohort 123 |

Here's the full TSV file.

Let's go over what's in this example:

  • Column 2 has the geneId field. This can be either an Entrez Gene ID or an Ensembl ID.
Gene Symbols

Gene symbols are always in flux and are being updated on a daily basis at the NCBI and at HGNC. Due to this, Illumina Connected Annotations uses the geneId to match genes rather than the gene symbol. However, to +make the custom annotation files easier to read, we've included the geneSymbol column as well.

Unknown Gene IDs

When Illumina Connected Annotations parses the gene custom annotation file, it will note any gene IDs that it does not currently recognize. In such a case, it will display an error listing all the unrecognized gene IDs.
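Since the geneId column accepts either an Entrez Gene ID or an Ensembl ID, a quick shape check before conversion can catch obvious mistakes such as a gene symbol pasted into the wrong column. The sketch below is only a pre-check: it cannot tell whether the annotator actually recognizes a given ID. The helper names `classify_gene_id` and `find_suspect_rows` are hypothetical, and the check assumes Entrez IDs are purely numeric and human Ensembl gene IDs start with `ENSG`.

```python
import re

# Assumption: human Ensembl gene IDs look like ENSG followed by digits.
ENSEMBL_GENE = re.compile(r"^ENSG\d+$")


def classify_gene_id(gene_id):
    """Classify an ID by shape only: 'entrez', 'ensembl', or 'unknown'."""
    if gene_id.isdigit():
        return "entrez"
    if ENSEMBL_GENE.match(gene_id):
        return "ensembl"
    return "unknown"


def find_suspect_rows(tsv_text):
    """Return data rows whose second column does not look like a gene ID."""
    suspects = []
    for line in tsv_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        columns = line.split("\t")
        if len(columns) < 2 or classify_gene_id(columns[1]) == "unknown":
            suspects.append(line)
    return suspects


example_tsv = (
    "#title=MyDataSource\n"
    "#geneSymbol\tgeneId\tphenotype\tnotes\n"
    "TP53\t7157\tColorectal cancer, hereditary nonpolyposis, type 5\t.\n"
    "KRAS\tENSG00000133703\tMismatch repair cancer syndrome\tSeen in cohort 123\n"
)
print(find_suspect_rows(example_tsv))
```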

Annotate with Illumina Connected Annotations

Let's use a VCF file that contains variants in TP53 and KRAS:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
12 25227255 . A T . . .
17 7675074 . C A . . .

Here's the full VCF file.

Investigate the Results

  "genes": [
{
"name": "KRAS",
"clingenGeneValidity": [
{
"diseaseId": "MONDO_0009026",
"disease": "Costello syndrome",
"classification": "disputed",
"classificationDate": "2018-07-24"
}
],
"clingenDosageSensitivityMap": {
"haploinsufficiency": "no evidence to suggest that dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "no evidence to suggest that dosage sensitivity is associated with clinical phenotype"
},
"gnomAD": {
"pLi": 0.000788,
"pRec": 0.789,
"pNull": 0.21,
"synZ": 0.336,
"misZ": 2.32,
"loeuf": 1.24
},
"MyDataSource": {
"phenotype": "Mismatch repair cancer syndrome",
"notes": "Seen in cohort 123"
}
},

This is the abbreviated output for KRAS. Here's the full JSON file if you want to see the complete KRAS entry.
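To pull the custom gene annotation back out of the output, a minimal sketch with Python's standard `json` module could look like this. The inline document is a trimmed version of the KRAS entry above, the second gene is a hypothetical placeholder without custom data, and the function name `custom_gene_annotations` is an assumption for illustration.

```python
import json

# Trimmed copy of the KRAS entry; GENE_WITHOUT_CUSTOM_DATA is a hypothetical placeholder.
annotated = json.loads("""
{
  "genes": [
    {
      "name": "KRAS",
      "MyDataSource": {
        "phenotype": "Mismatch repair cancer syndrome",
        "notes": "Seen in cohort 123"
      }
    },
    {"name": "GENE_WITHOUT_CUSTOM_DATA"}
  ]
}
""")


def custom_gene_annotations(document, title):
    """Map gene name -> custom annotation object for genes that carry the given title."""
    return {
        gene["name"]: gene[title]
        for gene in document.get("genes", [])
        if title in gene
    }


result = custom_gene_annotations(annotated, "MyDataSource")
for name, annotation in result.items():
    print(name, annotation["phenotype"], annotation.get("notes"))
```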

Customizing the Header

Title

For the title, you can provide any string that hasn't already been used. The title should be unique.

caution

Make sure that the title does not conflict with other keys in the JSON file.

For small variants, you can't provide a title that conflicts with other keys in the variant object. Some examples of this would be vid, chromosome, transcripts, etc. The title should also not conflict with other data source keys like clinvar or gnomad.

For structural variants, you can't provide a title that conflicts with other keys in the position object. Some examples of this would be chromosome, svLength, cytogeneticBand, etc. The title should also not conflict with other data source keys like clingen or dgv.

caution

Care should be taken not to annotate using multiple custom annotations that all use the same title.
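The title rules above can be checked before building a custom annotation. The sketch below uses only the keys named in this section, so the sets are deliberately incomplete; a real check would need the full list of keys for the annotator version in use. The function name `title_problems` and the exact-match (case-sensitive) comparison are assumptions.

```python
# Only the example keys mentioned in the text; these sets are NOT exhaustive.
SMALL_VARIANT_KEYS = {"vid", "chromosome", "transcripts"}
SV_POSITION_KEYS = {"chromosome", "svLength", "cytogeneticBand"}
DATA_SOURCE_KEYS = {"clinvar", "gnomad", "clingen", "dgv"}


def title_problems(title, titles_in_use=()):
    """Return a list of reasons why a custom annotation title should be avoided."""
    problems = []
    if title in SMALL_VARIANT_KEYS:
        problems.append("conflicts with a variant object key")
    if title in SV_POSITION_KEYS:
        problems.append("conflicts with a position object key")
    if title in DATA_SOURCE_KEYS:
        problems.append("conflicts with a data source key")
    if title in titles_in_use:
        problems.append("already used by another custom annotation")
    return problems


print(title_problems("MyDataSource"))
print(title_problems("chromosome"))
```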

Genome Assemblies

The following genome assemblies can be specified:

  • GRCh37
  • GRCh38

Matching Criteria

The matching criteria tell Illumina Connected Annotations how to match a VCF variant to the custom annotation.

The following matching criteria can be specified:

  • allele - use this when you only want allele-specific matches. This is commonly the case when using allele frequency data sources like gnomAD.
  • position - use this when you want positional matches. This is commonly used with disease phenotype data sources like ClinVar.
  • sv - use this when you want to match to all other overlapping SVs. This use case arose when we were adding custom annotations for baseline copy number intervals along the genome.

Categories

Categories are not used by Illumina Connected Annotations, but are often used by downstream tools. Categories provide hints for how those tools should filter or display the annotation data.

When a category is specified, Illumina Connected Annotations will provide additional validation for those fields. The following table describes each category:

Category | Description | Validation
AlleleCount | allele counts for a specific population | See the supported populations below
AlleleNumber | allele numbers for a specific population | See the supported populations below
AlleleFrequency | allele frequencies for a specific population | See the supported populations below
Prediction | ACMG-style pathogenicity classifications | benign (B), likely benign (LB), VUS, likely pathogenic (LP), pathogenic (P)
Filter | free text that signals downstream tools to add the column to the filter | Max 20 characters
Description | free-text description | Max 100 characters
Identifier | any ID | Max 50 characters
HomozygousCount | count of homozygous individuals for a specific population | See the supported populations below
Score | any score value | Any double-precision floating point number
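A few of the validation rules in the category table are simple enough to pre-check in a script. This is a sketch of such a check, not the annotator's own validation logic: the function name `passes_category_check` is hypothetical, and treating predictions case-insensitively and accepting both long and abbreviated forms are assumptions. Population-based categories are not checked here.

```python
MAX_LENGTHS = {"Filter": 20, "Description": 100, "Identifier": 50}

# Assumption: both the long form and the abbreviation are acceptable values.
PREDICTIONS = {
    "benign", "b",
    "likely benign", "lb",
    "vus",
    "likely pathogenic", "lp",
    "pathogenic", "p",
}


def passes_category_check(category, value):
    """Return True if a string value satisfies the rule listed for its category."""
    if category in MAX_LENGTHS:
        return len(value) <= MAX_LENGTHS[category]
    if category == "Prediction":
        return value.strip().lower() in PREDICTIONS
    if category == "Score":
        try:
            float(value)  # any double-precision floating point number
        except ValueError:
            return False
        return True
    return True  # categories without a rule in this sketch are not checked


print(passes_category_check("Prediction", "LP"))
print(passes_category_check("Filter", "x" * 21))
```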

Descriptions

Descriptions are used to add more context to the categories. For now, descriptions are mainly used to associate allele counts, numbers, and frequencies with their respective populations.

Populations

The following populations were specified in the HapMap project, 1000 Genomes Project, ExAC, and gnomAD.

Population Code | Super-population Code | Description
ACB | AFR | African Caribbeans in Barbados
AFR | AFR | African
ALL | ALL | All populations
AMR | AMR | Ad Mixed American
ASJ | | Ashkenazi Jewish
ASW | AFR | Americans of African Ancestry in SW USA
BEB | SAS | Bengali from Bangladesh
CDX | EAS | Chinese Dai in Xishuangbanna, China
CEU | EUR | Utah Residents (CEPH) with Northern and Western European Ancestry
CHB | EAS | Han Chinese in Beijing, China
CHS | EAS | Southern Han Chinese
CLM | AMR | Colombians from Medellin, Colombia
EAS | EAS | East Asian
ESN | AFR | Esan in Nigeria
EUR | EUR | European
FIN | EUR | Finnish in Finland
GBR | EUR | British in England and Scotland
GIH | SAS | Gujarati Indian from Houston, Texas
GWD | AFR | Gambian in Western Divisions in the Gambia
IBS | EUR | Iberian population in Spain
ITU | SAS | Indian Telugu from the UK
JPT | EAS | Japanese in Tokyo, Japan
KHV | EAS | Kinh in Ho Chi Minh City, Vietnam
LWK | AFR | Luhya in Webuye, Kenya
MAG | AFR | Mandinka in the Gambia
MKK | AFR | Maasai in Kinyawa, Kenya
MSL | AFR | Mende in Sierra Leone
MXL | AMR | Mexican Ancestry from Los Angeles, USA
NFE | EUR | European (Non-Finnish)
OTH | OTH | Other
PEL | AMR | Peruvians from Lima, Peru
PJL | SAS | Punjabi from Lahore, Pakistan
PUR | AMR | Puerto Ricans from Puerto Rico
SAS | SAS | South Asian
STU | SAS | Sri Lankan Tamil from the UK
TSI | EUR | Toscani in Italia
YRI | AFR | Yoruba in Ibadan, Nigeria

Data Types

Each custom annotation can be one of the following data types:

  • bool - true or false
  • number - any integer or floating-point number
  • string - text
tip

For boolean variables, only keys with a true value will be output to the JSON object.

Using SAUtils

Illumina Connected Annotations includes a tool called SAUtils that converts various data sources into Illumina Connected Annotations's native binary format. The sub-commands customvar and customgene are used to specify a variant file or a gene file respectively.

Convert Variant File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory

Convert Gene File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customgene \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-c Data/Cache \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -c argument specifies the Illumina Connected Annotations cache path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory
+ + + + \ No newline at end of file diff --git a/3.24/file-formats/illumina-annotator-json-file-format/index.html b/3.24/file-formats/illumina-annotator-json-file-format/index.html new file mode 100644 index 00000000..bc9b2a07 --- /dev/null +++ b/3.24/file-formats/illumina-annotator-json-file-format/index.html @@ -0,0 +1,18 @@ + + + + + + + +Illumina Connected Annotations JSON File Format | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
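Because boolean keys only appear when they are true, code that reads the JSON should treat a missing key as false rather than as an error. A minimal sketch, with a hypothetical helper named `flag`:

```python
import json

# Two trimmed variant objects: the first has no boolean keys at all.
small_variant = json.loads('{"vid": "2-48010488-G-A", "variantType": "SNV"}')
structural_variant = json.loads(
    '{"vid": "22-12370388-T-T[chr22:12370729[", "isStructuralVariant": true}'
)


def flag(obj, key):
    """Read a boolean key, treating an absent key as False."""
    return obj.get(key, False) is True


print(flag(small_variant, "isStructuralVariant"))
print(flag(structural_variant, "isStructuralVariant"))
```

The same pattern applies to other omitted values: optional fields are best read with `dict.get` so that a missing entry does not raise a `KeyError`.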

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.
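The relationship between the two sections can be sketched as a lookup from transcript gene symbols to gene-level entries. The nesting used here (positions containing variants, variants containing transcripts) and the use of the transcript `hgnc` field to match the gene `name` field are assumptions based on the examples in this document; the inline JSON is a trimmed, illustrative document rather than real output.

```python
import json

# Trimmed illustrative document; real output carries many more fields.
document = json.loads("""
{
  "positions": [
    {
      "chromosome": "chr2",
      "position": 48010488,
      "variants": [
        {
          "vid": "2-48010488-G-A",
          "transcripts": [
            {"transcript": "ENST00000445503.1", "hgnc": "MSH6"}
          ]
        }
      ]
    }
  ],
  "genes": [
    {"name": "MSH6"}
  ]
}
""")


def gene_symbols_in_positions(doc):
    """Collect every gene symbol referenced by a transcript in the positions section."""
    symbols = set()
    for position in doc.get("positions", []):
        for variant in position.get("variants", []):
            for transcript in variant.get("transcripts", []):
                if "hgnc" in transcript:
                    symbols.add(transcript["hgnc"])
    return symbols


def genes_by_name(doc):
    """Index the gene-level annotations by gene name."""
    return {gene["name"]: gene for gene in doc.get("genes", [])}


referenced = gene_symbols_in_positions(document)
gene_lookup = genes_by_name(document)
missing = sorted(referenced - set(gene_lookup))
print(referenced, missing)
```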

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily, using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
FieldTypeNotes
annotatorstringthe name of the annotator and the current version
creationTimestringyyyy-MM-dd hh:mm:ss
genomeAssemblystringsee possible values below
schemaVersionintegerincremented whenever the core structure of the JSON file introduces breaking changes
dataVersionstring
dataSourcesobject arraysee Data Source entry below
samplesstring arraythe order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

FieldTypeNotes
namestring
versionstring
descriptionstringoptional description of the data source
releaseDatestringyyyy-MM-dd

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"id": "4",
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes
chromosome | string | all | exactly as displayed in the vcf
position | integer | all | exactly as displayed in the vcf (1-based notation). Range: 1 - 250 million
id | string | all | provided from the ID column in the VCF file; this field is omitted if it is empty or has a "." value
repeatUnit | string | STR | provided by ExpansionHunter
refRepeatCount | integer | STR | provided by ExpansionHunter
svEnd | integer | SV |
refAllele | string | all | exactly as displayed in the vcf
altAlleles | string array | all | exactly as displayed in the vcf
quality | float | all | exactly as displayed in the vcf (normally an integer, but some variant callers use floating point; has been observed as high as 500k)
filters | string array | all | exactly as displayed in the vcf
ciPos | integer array | SV |
ciEnd | integer array | SV |
svLength | integer | SV |
strandBias | float | small variant | provided by GATK (from SB)
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand | string | all | e.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
FieldTypeNotes
chromosomestring
begininteger
endinteger
variantTypestring
idstring
allAnintegerallele number for all populations. Non-zero integer.
allAcintegerallele count for all populations. Integer.
allAffloating pointallele frequency for all populations. Range: 0 - 1.0
afrAffloating pointallele frequency for the African super population. Range: 0 - 1.0
amrAffloating pointallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAffloating pointallele frequency for the European super population. Range: 0 - 1.0
easAffloating pointallele frequency for the East Asian super population. Range: 0 - 1.0
sasAffloating pointallele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlapfloating pointrange: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

FieldTypeNotes
chromosomestringchromosome number
beginintegerposition interval start
endintegerposition interval end
variantTypestringstructural variant type
variantIdstringgnomAD ID
allAffloating pointallele frequency for all populations. Range: 0 - 1.0
afrAffloating pointallele frequency for the African super population. Range: 0 - 1.0
amrAffloating pointallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAffloating pointallele frequency for the East Asian super population. Range: 0 - 1.0
eurAffloating pointallele frequency for the European super population. Range: 0 - 1.0
othAffloating pointallele frequency for all other populations. Range: 0 - 1.0
femaleAffloating pointallele frequency for female population. Range: 0 - 1.0
maleAffloating pointallele frequency for male population. Range: 0 - 1.0
allAcintegerallele count for all populations.
afrAcintegerallele count for the African super population.
amrAcintegerallele count for the Ad Mixed American super population.
easAcintegerallele count for the East Asian super population.
eurAcintegerallele count for the European super population.
othAcintegerallele count for all other populations.
maleAcintegerallele count for male population.
femaleAcintegerallele count for female population.
allAnintegerallele number for all populations.
afrAnintegerallele number for the African super population.
amrAnintegerallele number for the Ad Mixed American super population.
easAnintegerallele number for the East Asian super population.
eurAnintegerallele number for the European super population.
othAnintegerallele number for all other populations.
femaleAnintegerallele number for female population.
maleAnintegerallele number for male population.
allHcintegercount of homozygous individuals for all populations.
afrHcintegercount of homozygous individuals for the African / African American population.
amrHcintegercount of homozygous individuals for the Latino population.
easHcintegercount of homozygous individuals for the East Asian population.
eurHcintegercount of homozygous individuals for the European super population.
othHcintegercount of homozygous individuals for all other populations.
maleHcintegercount of homozygous individuals for male population.
femaleHcintegercount of homozygous individuals for female population.
failedFilterbooleanTrue if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlapfloating pointReciprocal overlap. Range: 0 - 1.0
annotationOverlapfloating pointAnnotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
FieldTypeVCFNotes
genotypestringGT
variantFrequenciesfloat arrayVF, ADrange: 0 - 1.0. One value per alternate allele
totalDepthintegerDPnon-negative integer values
genotypeQualityintegerGQnon-negative integer values. Typically maxes out at 99
copyNumberintegerCNnon-negative integer values
minorHaplotypeCopyNumberintegerMCNnon-negative integer values
repeatUnitCountsinteger arrayREPCNExpansionHunter-specific
alleleDepthsinteger arrayADnon-negative integer values
failedFilterboolFT
splitReadCountsinteger arraySRManta-specific
pairedEndReadCountsinteger arrayPRManta-specific
isDeNovoboolDN
deNovoQualityfloatDQ
diseaseAffectedStatusesstring arrayDSTExpansionHunter-specific
artifactAdjustedQualityScorefloatAQPEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScorefloatLQPEPE-specific. Range: 0 - 100.0
lossOfHeterozygosityboolCN, MCN
somaticQualityfloatSQ
heteroplasmyPercentilefloat arrayVFrange: 0 - 100. 2 decimal places. One value per alternate allele
binCountintegerBCnon-negative integer values
Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
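Since sample order follows the sample names in the header, a position's samples can be paired back to their names by index, skipping any that are flagged as empty. The sketch below uses the three sample names from the header example; the per-sample values are illustrative, and the function name `samples_by_name` is hypothetical.

```python
header = {"samples": ["NA12878", "NA12891", "NA12892"]}

# Illustrative position: the second sample is intentionally empty.
position = {
    "samples": [
        {"genotype": "0/1", "totalDepth": 57},
        {"isEmpty": True},
        {"genotype": "1/1", "totalDepth": 40},
    ]
}


def samples_by_name(header, position):
    """Pair each non-empty sample object with its name from the header."""
    named = {}
    for name, sample in zip(header["samples"], position.get("samples", [])):
        if sample.get("isEmpty", False):
            continue  # placeholder that only preserves ordering
        named[name] = sample
    return named


named_samples = samples_by_name(header, position)
print(named_samples)
```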

Variants

"variants":[
{
"vid":"2-48010488-G-A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
FieldTypeNotes
vidstringsee Variant Identifiers
chromosomestring
beginint1-based non-negative integer values. Range: 1 - 250 million
endint1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllelebooltrue when this is a reference minor allele
isStructuralVariantbooltrue when the variant is a structural variant
inLowComplexityRegionbooltrue when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllelestringparsimonious representation of the reference allele
altAllelestringparsimonious representation of the alternate allele.
variantTypestringuses Sequence Ontology sequence alterations
hgvsgstringHGVS g. notation
phylopScorefloatphyloP conservation score. Range: -14.08 to 6.424
Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"NMD_transcript_variant",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268/4158",
"cdsPos":"116/483",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39/160",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"impact": "moderate",
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"proteinId":"ENSP00000405294.1",
"completeOverlap":true
}
]
FieldTypeNotes
transcriptstringtranscript ID. e.g. ENST00000445503.1
sourcestringRefSeq / Ensembl
bioTypestringdescriptions of the biotypes from Ensembl
codonsstring
aminoAcidsstring
cdnaPosstringFormat: start-end/Length
cdsPosstringFormat: start-end/Length
exonsstringexons affected by the variant
intronsstringintrons affected by the variant
proteinPosstringFormat: start-end/Length
geneIdstringgene ID. e.g. ENSG00000116062
hgncstringgene symbol. e.g. MSH6
consequencestring arraySequence Ontology Consequences
impactstringSee Consequence Impact for details
hgvscstringHGVS coding nomenclature
hgvspstringHGVS protein nomenclature
geneFusionobjectsee Gene Fusions entry below
isCanonicalbooltrue when this is a canonical transcript
isManeSelectbooltrue when this is a MANE select transcript
proteinIdstringprotein ID. E.g. ENSP00000405294.1
completeOverlapbooltrue when this transcript is completely overlapped by the variant
cancerHotspotsstring arraysee Cancer Hotspots entry below
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
FieldTypeNotes
aminoAcidConservationobject
scoresobject array of doublespercent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

FieldTypeNotes
exonintactual exon where the breakpoint was located
intronintactual intron where the breakpoint was located
fusionsobject arraysee Fusion entry below

Fusion

FieldTypeNotes
exonintactual exon where the other breakpoint was located
intronintactual intron where the other breakpoint was located
hgvscstringHGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

FieldTypeNotes
residuestring
numSamplesinthow many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamplesinthow many samples are associated with a variant with the same position and alternate amino acid position
qValuedouble

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
FieldTypeNotes
idstring
typestringsee possible values below
consequencestring arraysee possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
FieldTypeNotes
idstringClinVar ID
variationIdstringClinVar VCV ID
variantTypestringvariant type
reviewStatusstringsee possible values below
alleleOriginsstring arraysee possible values below
refAllelestring
altAllelestring
phenotypesstring array
medGenIdsstring arrayMedGen IDs
omimIdsstring arrayOMIM IDs
orphanetIdsstring arrayOrphanet IDs
significancestring arraysee possible values below
lastUpdatedDatestringyyyy-MM-dd
pubMedIdsstring arrayPubMed IDs
isAlleleSpecificbooltrue when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
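A common use of these fields is to select ClinVar entries by significance, optionally restricted to allele-specific matches. The sketch below uses trimmed copies of two entries from the examples above; the function name `clinvar_ids_with_significance` and its parameters are hypothetical.

```python
# Trimmed copies of entries shown in the ClinVar examples.
clinvar_entries = [
    {
        "id": "RCV000030258.4",
        "significance": ["benign"],
        "isAlleleSpecific": True,
    },
    {
        "id": "RCV000051993.4",
        "significance": ["pathogenic"],
    },
]


def clinvar_ids_with_significance(entries, wanted, require_allele_specific=False):
    """Return IDs of entries whose significance list intersects the wanted set."""
    selected = []
    for entry in entries:
        # isAlleleSpecific is only present when true, so default to False.
        if require_allele_specific and not entry.get("isAlleleSpecific", False):
            continue
        if set(wanted).intersection(entry.get("significance", [])):
            selected.append(entry["id"])
    return selected


pathogenic = {"pathogenic", "likely pathogenic"}
print(clinvar_ids_with_significance(clinvar_entries, pathogenic))
```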

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
FieldTypeNotes
allAffloatallele frequency for all populations. Range: 0 - 1.0
allAcintallele count for all populations. Integer.
allAnintallele number for all populations. Non-zero integer.
afrAffloatallele frequency for the African super population. Range: 0 - 1.0
afrAcintallele count for the African super population. Integer.
afrAnintallele number for the African super population. Non-zero integer.
amrAffloatallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAcintallele count for the Ad Mixed American super population. Integer.
amrAnintallele number for the Ad Mixed American super population. Non-zero integer.
easAffloatallele frequency for the East Asian super population. Range: 0 - 1.0
easAcintallele count for the East Asian super population. Integer.
easAnintallele number for the East Asian super population. Non-zero integer.
eurAffloatallele frequency for the European super population. Range: 0 - 1.0
eurAcintallele count for the European super population. Integer.
eurAnintallele number for the European super population. Non-zero integer.
sasAffloatallele frequency for the South Asian super population. Range: 0 - 1.0
sasAcintallele count for the South Asian super population. Integer.
sasAnintallele number for the South Asian super population. Non-zero integer.
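The three field families are related: an allele frequency is the allele count divided by the allele number. A short sketch that checks this for values taken from the example above follows. Rounding to six decimal places is an assumption inferred from the example values, and the helper names are hypothetical.

```python
# Subset of the 1000 Genomes example values shown above.
one_kg = {
    "allAf": 0.200879, "allAn": 5008, "allAc": 1006,
    "afrAf": 0.210287, "afrAn": 1322, "afrAc": 278,
}


def allele_frequency(allele_count, allele_number):
    """Return count / number, or None when the allele number is zero."""
    if allele_number == 0:
        return None
    return allele_count / allele_number


def frequency_is_consistent(entry, population, places=6):
    """Check that <pop>Af matches <pop>Ac / <pop>An after rounding."""
    expected = allele_frequency(entry[f"{population}Ac"], entry[f"{population}An"])
    if expected is None:
        return False
    return round(expected, places) == round(entry[f"{population}Af"], places)


for population in ("all", "afr"):
    print(population, frequency_is_consistent(one_kg, population))
```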

DANN

"dannScore": 0.27
FieldTypeNotes
dannScorefloatRange: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
FieldTypeNotes
dbsnpstring arraydbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27
FieldTypeNotes
gerpScorefloatRange: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
FieldTypeNotes
allAcintGME allele count
allAnintGME allele number
allAffloatGME allele frequency
failedFilterboolTrue if this variant failed any filters

gnomAD

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.
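
An allele frequency is, by definition, the allele count divided by the allele number, so the Af, Ac and An fields of a population can be cross-checked against each other. The sketch below does this for the values in the example above. The helper name `frequency_matches` and the tolerance of 1e-6 (chosen because the example frequencies are shown to six decimal places) are assumptions, not part of the annotator.

```python
import json

# Subset of the gnomad example object shown above.
record = json.loads('{"allAf": 0.190317, "allAn": 30796, "allAc": 5861}')


def frequency_matches(entry, prefix, tolerance=1e-6):
    """Check that <prefix>Af agrees with <prefix>Ac / <prefix>An.

    Returns None when any of the three fields is missing, since populations
    may be absent from a record.
    """
    ac = entry.get(prefix + "Ac")
    an = entry.get(prefix + "An")
    af = entry.get(prefix + "Af")
    if ac is None or an is None or af is None or an == 0:
        return None
    return abs(ac / an - af) <= tolerance


print(frequency_matches(record, "all"))  # all three fields present
print(frequency_matches(record, "sas"))  # South Asian fields absent in this record
```

Returning None rather than False for absent populations keeps "cannot be checked" distinct from "inconsistent".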

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification

REVEL

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
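
In the example above, each spliceAI entry carries only some of the four score/distance pairs, so a consumer should not assume every field is present. The following sketch picks the highest score in each entry while tolerating missing fields. The helper name `strongest_effect` is hypothetical, and treating the highest score as the most relevant one is an assumption made for illustration.

```python
SCORE_FIELDS = (
    "acceptorGainScore",
    "acceptorLossScore",
    "donorGainScore",
    "donorLossScore",
)


def strongest_effect(entry):
    """Return (field_name, score) for the highest score present in one entry, or None."""
    present = [(name, entry[name]) for name in SCORE_FIELDS if name in entry]
    if not present:
        return None
    return max(present, key=lambda pair: pair[1])


# The two entries from the example above.
splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]

for entry in splice_ai:
    print(entry["hgnc"], strongest_effect(entry))
```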

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes
name | string | HGNC gene symbol
hgncId | int | HGNC ID
summary | string | short description of the gene from OMIM

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
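
As the OMIM example above shows, phenotype entries may omit optional fields such as description, inheritances and comments. The sketch below groups phenotype names by inheritance mode while tolerating those omissions. The function name `phenotypes_by_inheritance` and the "unspecified" label are hypothetical choices for illustration, not values produced by the annotator.

```python
def phenotypes_by_inheritance(omim_entries):
    """Map each inheritance label to the phenotype names that list it."""
    grouped = {}
    for gene in omim_entries:
        for phenotype in gene.get("phenotypes", []):
            # "inheritances" is optional; fall back to a placeholder label.
            modes = phenotype.get("inheritances") or ["unspecified"]
            for mode in modes:
                grouped.setdefault(mode, []).append(phenotype["phenotype"])
    return grouped


# Trimmed version of the omim example above.
omim = [
    {
        "mimNumber": 600678,
        "phenotypes": [
            {"mimNumber": 614350,
             "phenotype": "Colorectal cancer, hereditary nonpolyposis, type 5",
             "inheritances": ["Autosomal dominant"]},
            {"mimNumber": 608089,
             "phenotype": "Endometrial cancer, familial"},
            {"mimNumber": 276300,
             "phenotype": "Mismatch repair cancer syndrome",
             "inheritances": ["Autosomal recessive"]},
        ],
    }
]

for mode, names in phenotypes_by_inheritance(omim).items():
    print(mode, names)
```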

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field | Type | Notes
roleInCancer | string array | Possible roles in cancer
tier | number | COSMIC tiers [1, 2]
+ + + + \ No newline at end of file diff --git a/3.24/file-formats/illumina-annotator-vcf-file-format/index.html b/3.24/file-formats/illumina-annotator-vcf-file-format/index.html new file mode 100644 index 00000000..7440b04e --- /dev/null +++ b/3.24/file-formats/illumina-annotator-vcf-file-format/index.html @@ -0,0 +1,18 @@ + + + + + + + +Illumina Connected Annotations VCF File Format | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Illumina Connected Annotations VCF File Format

Overview

JSON is the default output format, but VCF output is also supported. VCF output can be enabled on the command line as shown below:

dotnet Annotator.dll \
-c Data/Cache \
--output-format vcf \
-r Data/References/Homo_sapiens.GRCh38.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000.out
# HiSeq.10000.out.vcf.gz file should be produced after processing.

VCF Output Format

The output VCF file should have headers similar to those below, which indicate the Illumina Connected Annotations version, file creation time, assembly, and data sources used to produce the output:

##fileformat=VCFv4.2
##IlluminaConnectedAnnotations="3.24.0" time="2024-03-22 07:02:13" assembly="GRCh38" Ensembl="110" RefSeq="GCF_000001405.40-RS_2023_03"
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230110
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
...
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Illumina Connected Annotations. Format: Allele|Consequence|SYMBOL|Feature_type|Feature|CANONICAL|HGVSg|HGVSc|HGVSp|vid">
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Novaseq_TSPF450-NA12878-1-HFHWJDMXX_S1_L001 Novaseq_TSPF450-NA12891-1-HFHWJDMXX_S3_L001

VCF Lines

In VCF mode, core annotation for overlapping transcripts is enabled and no supplementary annotation is added. A CSQ field is added to the INFO column with the following format:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Illumina Connected Annotations. Format: Allele|Consequence|SYMBOL|Feature_type|Feature|CANONICAL|HGVSg|HGVSc|HGVSp|vid">

Multiple transcripts are separated with ,. Example VCF lines produced in this mode are shown below:

chr21 5316038  MantaDEL:1:11095:74644:0:4:0  G  <DEL> 999   MaxDepth END=7246574;SVTYPE=DEL;SVLEN=-1930536;SVINSLEN=4;SVINSSEQ=TTCT;CSQ=<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624261.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624859.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000623227.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000619252.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623449.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623436.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624627.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624368.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623914.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624516.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624412.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000622939.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623050.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624444.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623887.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|
transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000611026.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000610788.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279784|Transcript|ENST00000623587.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279064|Transcript|ENST00000623723.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000288187|Transcript|ENST00000671789.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000616522.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000621924.4|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000619488.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000617746.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000624446.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623405.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623575.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623506.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280019|Transcript|ENST00000624484.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279709|Transcript|ENST00000623377.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688828.2|
False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688458.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692898.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689306.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692318.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624576.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623738.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701070.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623989.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701260.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692046.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692237.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689354.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624165.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624847.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000615262.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablat
ion&transcript_variant|ENSG00000280145|Transcript|ENST00000623047.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623950.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000623225.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623324.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000624181.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279788|Transcript|ENST00000624266.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279728|Transcript|ENST00000623809.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280164|Transcript|ENST00000623892.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279998|Transcript|ENST00000623678.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|hsa-mir-8069-1|Transcript|ENST00000616627.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279751|Transcript|ENST0000
0623720.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623165.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624519.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623347.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624728.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279477|Transcript|ENST00000623518.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278884|Transcript|ENST00000625184.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623095.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000622911.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000621909.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623394.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000624310.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000615804.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000617336.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|CTBP2P10|Transcript|ENST00000624153.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|NR_170984.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_
variant|LOC102724354|Transcript|NR_136540.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|CH507-42P11.6|Transcript|NR_171776.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724428|Transcript|NM_001320643.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354012.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354009.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NR_148682.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354010.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354015.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354014.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001321073.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354008.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354007.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354006.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320646.2|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320650.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320648.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcri
pt|NM_001320651.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001314050.5|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC106780825|Transcript|NR_133678.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001320719.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146656.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146655.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146657.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|MIR8069-1|Transcript|NR_107036.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724843|Transcript|NR_170986.1|True||||21-5316038-7246574-G-<DEL>-DEL   GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:999:999,0,999:58,5:69,63:.:.  0/1:PASS:999:999,0,999:59,7:67,71:.:.  0/1:PASS:999:999,0,999:118,4:140,79:.:.
chr21 6639699 MantaDEL:514264:0:0:0:7:0 AGAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG AAA 537 MaxMQ0Frac END=6639804;SVTYPE=DEL;SVLEN=-105;CIGAR=1M2I105D;CSQ=AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623047.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623106.3:n.223-5036_223-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True|NC_000021.9:g.6639700_6639804delinsAA|ENST00000625185.3:n.232-5036_232-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624846.3:n.130-5036_130-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623313.1:n.312-7367_312-7263delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623950.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|F
alse|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624965.1:n.151-5036_151-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:8:205,0,4:1,0:11,5:.:. 0/1:PASS:86:431,0,83:0,0:16,13:.:. 0/0:HomRef:61:0,11,66:2,0:7,0:.:.
chr21 8811598 MantaBND:514412:0:1:0:0:0:0 G G[chr21:8854301[ 999 NoPairSupport SVTYPE=BND;MATEID=MantaBND:514412:0:1:0:0:0:1;CIPOS=0,4;HOMLEN=4;HOMSEQ=TGCA;BND_DEPTH=300;MATE_BND_DEPTH=213;CSQ=G[chr21:8854301[|transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True||||21-8811598-G-G[chr21:8854301[ GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:253:303,0,999:9,0:89,12:.:. 0/1:PASS:999:999,0,999:0,0:99,39:.:. 0/0:HomRef:410:0,360,999:17,0:141,0:.:.
chr21 8813774 MantaINS:514450:0:0:0:1:0 T TATATATACATATATATATATACATATATATATATGTATATATATATATATAC 487 MaxMQ0Frac END=8813774;SVTYPE=INS;SVLEN=52;CIGAR=1M52I;CIPOS=0,7;HOMLEN=7;HOMSEQ=ATATATA;CSQ=ATATATACATATATATATATACATATATATATATGTATATATATATATATAC|intron_variant&non_coding_transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True|NC_000021.9:g.8813781_8813782insCATATATATATATACATATATATATATGTATATATATATATATACATATATA|ENST00000651312.1:n.40-6603_40-6602insGTATATATATATATATACATATATATATATGTATATATATATATGTATATAT||21-8813774-T-TATATATACATATATATATATACATATATATATATGTATATATATATATATAC GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:29:128,0,26:0,0:8,4:.:. 1/1:PASS:6:335,8,0:0,0:6,8:.:. 0/1:PASS:21:176,0,18:0,0:3,6:.:.
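
The CSQ layout described above (ten |-separated fields per transcript, transcripts separated by commas) can be unpacked with a few lines of code. The sketch below is an illustration rather than part of the annotator: the function name `parse_csq` is hypothetical, and splitting the Consequence field on & is an assumption inferred from the example lines.

```python
CSQ_FIELDS = [
    "Allele", "Consequence", "SYMBOL", "Feature_type", "Feature",
    "CANONICAL", "HGVSg", "HGVSc", "HGVSp", "vid",
]


def parse_csq(info):
    """Extract CSQ entries from a VCF INFO string as a list of dicts."""
    value = None
    for item in info.split(";"):
        if item.startswith("CSQ="):
            value = item[len("CSQ="):]
            break
    if value is None:
        return []

    entries = []
    for transcript in value.split(","):
        entry = dict(zip(CSQ_FIELDS, transcript.split("|")))
        # Assumption: multiple consequences are joined with "&", as in the examples.
        entry["Consequence"] = entry.get("Consequence", "").split("&")
        entries.append(entry)
    return entries


# INFO column of the breakend (BND) example line above.
info = (
    "SVTYPE=BND;MATEID=MantaBND:514412:0:1:0:0:0:1;CIPOS=0,4;HOMLEN=4;"
    "HOMSEQ=TGCA;BND_DEPTH=300;MATE_BND_DEPTH=213;"
    "CSQ=G[chr21:8854301[|transcript_variant|ENSG00000286033|Transcript|"
    "ENST00000651312.1|True||||21-8811598-G-G[chr21:8854301["
)

for entry in parse_csq(info):
    print(entry["Feature"], entry["CANONICAL"], entry["Consequence"])
```

Splitting the INFO column on ; first keeps comma-containing keys such as CIPOS=0,4 out of the transcript list.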
+ + + + \ No newline at end of file diff --git a/3.24/frequently-asked-questions/Annotator-vs-data-update/index.html b/3.24/frequently-asked-questions/Annotator-vs-data-update/index.html new file mode 100644 index 00000000..6d65d342 --- /dev/null +++ b/3.24/frequently-asked-questions/Annotator-vs-data-update/index.html @@ -0,0 +1,18 @@ + + + + + + + +Annotation Engine vs Data update | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Annotation Engine vs Data update

Background

Updates to annotations fall into two broad categories:

  • Annotation engine (Annotator) update.
  • Annotation data update.

Understanding the nature of these two types of updates is key when it comes to updating annotations.

Annotator update

The annotator is the engine that contains the logic for core annotations, such as computing variant consequences, HGVS notations, and mapped positions (e.g. cDNA, CDS, and protein positions) and detecting gene fusions. It also performs annotation lookups from external data sources such as dbSNP, gnomAD, ClinVar, and OMIM, also known as supplementary annotations (SA). An update to the annotator entails new features or bug fixes to the compute or lookup mechanism. This is completely independent of a data update, such as updating dbSNP from v154 to v155. In other words, the same annotator can annotate with either dbSNP v154 or dbSNP v155 when provided with the appropriate data files.

Data update

The annotator uses data from various sources (listed in Introduction). For example, gene models used for core annotations are obtained from RefSeq and Ensembl. Supplementary annotations come from various sources such as dbSNP, gnomAD, ClinVar, OMIM, etc. Any of these data can be updated without updating the annotator as long as the file formats are compatible.

Update scenarios

Let us look at a few update scenarios.

| Requirement | What needs to be updated/added | Suggested action |
| --- | --- | --- |
| New transcripts and gene symbols | Cache files from RefSeq and Ensembl | Run Downloader |
| Update ClinVar | ClinVar SA files | Run Downloader |
| New external annotation | New SA files required | Submit feature request |
| New annotation feature | Annotator | Submit feature request |
+ + + + \ No newline at end of file diff --git a/3.24/index.html b/3.24/index.html new file mode 100644 index 00000000..b89e1ac2 --- /dev/null +++ b/3.24/index.html @@ -0,0 +1,21 @@ + + + + + + + +Introduction | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package, or integrated into larger software tools that require variant annotation.

The input to Illumina Connected Annotations is a VCF file and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease.

The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript:

The transcript and gene models are obtained from RefSeq and Ensembl. The current officially supported versions are:

| Data Source | Version | Release Date |
| --- | --- | --- |
| RefSeq | GCF_000001405.40-RS_2023_03 | 2023-03-21 |
| Ensembl | 110 | 2023-04-27 |

In addition, it uses external data sources to provide additional context for each variant. Illumina Connected Annotations provides annotations from the following sources, divided into two tiers: Professional and Basic. The Basic tier can be accessed free of charge. The Professional tier requires a license. Please see Licensed Content for details. For access, please contact annotation_support@illumina.com.

| Data Source | Availability | Latest Supported Version |
| --- | --- | --- |
| COSMIC | Professional | 99 |
| OMIM | Professional | 20240110 |
| Primate AI-3D | Professional | 1.0 |
| Splice AI | Professional | 1.3 |
| 1000 Genomes Project | Basic | Phase 3 v3plus |
| Cancer Hotspots | Basic | 2017 |
| ClinGen | Basic | 20240110 |
| ClinVar | Basic | 20231230 |
| DANN | Basic | 20200205 |
| dbSNP | Basic | 156 |
| DECIPHER | Basic | 201509 |
| FusionCatcher | Basic | 1.33 |
| GERP | Basic | 20110522 |
| GME Variome | Basic | 20160618 |
| gnomAD | Basic | 3.1.2 |
| MITOMAP | Basic | 20200819 |
| MultiZ 100 way | Basic | 20171006 |
| REVEL | Basic | 20200205 |
| TOPMed | Basic | freeze 5 |

Download

Please visit Illumina Connected Annotations.

+ + + + \ No newline at end of file diff --git a/3.24/introduction/dependencies/index.html b/3.24/introduction/dependencies/index.html new file mode 100644 index 00000000..5842a910 --- /dev/null +++ b/3.24/introduction/dependencies/index.html @@ -0,0 +1,18 @@ + + + + + + + +Dependencies | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Dependencies

All of the following dependencies have been included in this repository.

| Name | License | Usage |
| --- | --- | --- |
| Amazon.Lambda | Apache | AWS extensions for .NET CLI |
| AWSSDK | Apache | AWS Lambda, S3, SNS support |
| Json.NET | MIT | JASIX utility |
| libdeflate | MIT | BlockCompression library |
| Moq | BSD | Mocking framework for unit tests |
| NDesk.Options | MIT/X11 | CommandLine library |
| xUnit | Apache | Unit testing framework |
| zlib-ng | zlib | BlockCompression library |
| zstd | BSD | BlockCompression library |
+ + + + \ No newline at end of file diff --git a/3.24/introduction/getting-started/index.html b/3.24/introduction/getting-started/index.html new file mode 100644 index 00000000..698e6dc6 --- /dev/null +++ b/3.24/introduction/getting-started/index.html @@ -0,0 +1,18 @@ + + + + + + + +Getting Started | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET 6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz), and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

If you want to build your own Docker image, it is really easy to do. You just need the Illumina Connected Annotations zip file, the Dockerfile, and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, run the commands below inside the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine under the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Annotator -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the downloader again.
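The size check described above can be illustrated with a small Python sketch. This is not the Downloader's implementation: the `expected_sizes` mapping (relative file path to expected byte count) is a hypothetical stand-in for whatever size information the Downloader uses internally, and the function name is made up for this example.

```python
import os


def find_truncated(expected_sizes, directory):
    """Return (relative_path, actual_size, expected_size) for every mismatch.

    expected_sizes is a hypothetical {relative_path: size_in_bytes} mapping.
    A missing file is reported with an actual size of -1.
    """
    problems = []
    for rel_path, expected in expected_sizes.items():
        full_path = os.path.join(directory, rel_path)
        # Missing files count as truncated, flagged with -1.
        actual = os.path.getsize(full_path) if os.path.isfile(full_path) else -1
        if actual != expected:
            problems.append((rel_path, actual, expected))
    return problems
```

An empty result means every listed file has the expected size. Any entry in the result points at a file worth re-downloading once the root cause (connectivity, disk space) has been fixed.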

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Preserving old data files

By default, while rerunning, the Downloader will replace old files with the latest versions. For example, if at some point, your SupplementaryAnnotation folder contained ClinVar_20231101.nsa and the latest available version is ClinVar_20231203.nsa, next time the Downloader is run, ClinVar_20231101.nsa will be replaced with ClinVar_20231203.nsa.

Currently, there is no way to override this behavior. If you do not want to replace or update a particular file, we recommend saving that file to a different location, rerunning the Downloader to update the other data files, and then manually restoring the file you did not want updated. Please make sure to remove the newly downloaded version of that file first. Note that the Annotator will throw an error if multiple versions of the same data source are present in the SupplementaryAnnotation folder. In other words, the SupplementaryAnnotation folder cannot contain both ClinVar_20231101.nsa and ClinVar_20231203.nsa.

Here is an example of how to proceed if a user doesn't want the latest version of ClinVar.

ls Data/SupplementaryAnnotation/GRCh38
...
ClinGen_disease_validity_curations_20231011.nga
ClinVar_20230930.nsa
ClinVar_20230930.nsa.idx
...
mv Data/SupplementaryAnnotation/GRCh38/ClinVar* <tmp_dir>/GRCh38/

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh38 \
-o Data

rm Data/SupplementaryAnnotation/GRCh38/ClinVar*
mv <tmp_dir>/GRCh38/ClinVar* Data/SupplementaryAnnotation/GRCh38/
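Because the Annotator rejects a folder that holds two versions of the same data source, it can be useful to check for duplicates after moving files around. The following Python sketch assumes that file names follow the `<source>_<version>.<extension>` pattern seen in the listing above (for example `ClinVar_20230930.nsa`). That pattern is inferred from the examples, not a documented guarantee, and the function name is hypothetical.

```python
from collections import defaultdict


def find_duplicate_sources(filenames):
    """Group data files by (source, extension) and report multiple versions.

    Assumes names of the form <source>_<version>.<ext>; .idx files are skipped
    because they accompany a data file rather than being a source themselves.
    """
    versions = defaultdict(set)
    for name in filenames:
        if name.endswith(".idx"):
            continue
        stem, dot, ext = name.rpartition(".")
        source, sep, version = stem.rpartition("_")
        if not dot or not sep:
            continue  # name does not match the assumed pattern
        versions[(source, ext)].add(version)
    return {key: sorted(found) for key, found in versions.items() if len(found) > 1}
```

Passing the output of `os.listdir("Data/SupplementaryAnnotation/GRCh38")` to this function returns an empty dictionary when each source is present in a single version.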

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix
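When the Annotator is launched from a pipeline, the same argument list can be assembled programmatically. The sketch below only builds the command shown above from its parts. The directory layout (`Cache`, `SupplementaryAnnotation/<assembly>`, `References/Homo_sapiens.<assembly>.Nirvana.dat`) is taken from the example, while the helper name and its parameters are hypothetical.

```python
import shlex


def build_annotator_command(data_dir, assembly, vcf_path, output_prefix,
                            annotator_dll="Annotator.dll"):
    """Return the argument list for one annotation run (nothing is executed)."""
    return [
        "dotnet", annotator_dll,
        "-c", f"{data_dir}/Cache",
        "--sd", f"{data_dir}/SupplementaryAnnotation/{assembly}",
        "-r", f"{data_dir}/References/Homo_sapiens.{assembly}.Nirvana.dat",
        "-i", vcf_path,
        "-o", output_prefix,
    ]


command = build_annotator_command("Data", "GRCh37", "HiSeq.10000.vcf.gz", "HiSeq.10000")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.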

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization Time Positions/s
---------------------------------------------------------------------------
Cache 00:00:00.0
SA Position Scan 00:00:00.0 153,634

Reference Preload Annotation Variants/s
---------------------------------------------------------------------------
chr1 00:00:00.2 00:00:00.8 11,873

Summary Time Percent
---------------------------------------------------------------------------
Initialization 00:00:00.0 1.5 %
Preload 00:00:00.2 4.9 %
Annotation 00:00:00.8 18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full list of command line options can be viewed by using the -h option or by running the Annotator with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

USAGE: dotnet Nirvana.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
--cache, -c <directory>
input cache directory
--in, -i <path> input VCF path
--tsv <path> input VCF path
--out, -o <file path> output file path
--ref, -r <path> input compressed reference sequence path
--sd <directory> input supplementary annotation directory
--sources, -s <VALUE> annotation data sources to be used (comma
separated list of supported tags)
--credentialsFile <VALUE>
File path to user credentials, default is set to ~
/.ilmnAnnotations/credentials.json
--ignoreLicenseError ignore error due to invalid license and skip
related data sources
--force-mt forces to annotate mitochondrial variants
--legacy-vids enables support for legacy VIDs
--enable-dq report DQ from VCF samples field
--enable-bidirectional-fusions
enables support for bidirectional gene fusions
--disable-junction-preservation
disable junction preserving functional annotation
--str <VALUE> user provided STR annotation TSV file
--vcf-info <VALUE> additional vcf info field keys (comma separated)
desired in the output
--vcf-sample-info <VALUE>
additional vcf format field keys (comma separated)
desired in the output
--sa-cutoff <VALUE> Any SV larger than or equal to this value will
not have any supplementary annotations
--output-format <VALUE>
output file format, available options: json, vcf.
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).
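To catch a mistyped tag before starting a long run, the requested tags can be compared against the available values. The Python sketch below copies the tag list from the warning message above and uses exact, case-sensitive matching. That matching rule is an assumption made for this example, and the Annotator itself may be more lenient. The real list also depends on the files present in your cache and supplementary annotation folders.

```python
# Tags copied from the warning message shown above; your list may differ.
AVAILABLE_TAGS = {
    "aminoAcidConservation", "primateAI", "dbsnp", "spliceAI", "revel", "cosmic",
    "clinvar", "gnomad", "mitomap", "oneKg", "gmeVariome", "topmed", "clingen",
    "decipher", "gnomAD-preview", "clingenDosageSensitivityMap", "gerpScore",
    "dannScore", "omim", "clingenGeneValidity", "phylopScore",
    "lowComplexityRegion", "refMinor", "heteroplasmy", "Ensembl", "RefSeq",
}


def split_sources(requested, available=AVAILABLE_TAGS):
    """Split a comma-separated -s value into (known, unknown) tag lists.

    Matching is exact and case-sensitive, which is an assumption of this sketch.
    """
    tags = [tag.strip() for tag in requested.split(",") if tag.strip()]
    known = [tag for tag in tags if tag in available]
    unknown = [tag for tag in tags if tag not in available]
    return known, unknown
```

For the command above, `split_sources("omim,gnomad,ense")` separates `ense` from the two recognized tags, mirroring the warning printed by the Annotator.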

+ + + + \ No newline at end of file diff --git a/3.24/introduction/licensedContent/index.html b/3.24/introduction/licensedContent/index.html new file mode 100644 index 00000000..16836066 --- /dev/null +++ b/3.24/introduction/licensedContent/index.html @@ -0,0 +1,25 @@ + + + + + + + +Licensed Content | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Licensed Content

Illumina Connected Annotations supports the following content, which is available through a license from Illumina. The license file will allow users to download and annotate with these data sources.

  • COSMIC
  • OMIM
  • Primate AI-3D
  • Splice AI
tip

The license may be customized at the time of license creation to allow access to one or more of the above.

note

The Annotator packaged with DRAGEN comes with a license for all premium content. That is, if the Annotator is run from within DRAGEN, all premium content will be available. However, this doesn't automatically grant a license to use premium content while running the Annotator outside of DRAGEN. Please contact annotation_support@illumina.com for stand-alone licenses.

How to obtain the license?

Please contact annotation_support@illumina.com to obtain a special credentials file for the data sources of interest.

Visit Illumina Connected Annotations for more details.

How to use the credentials file?

After obtaining the credentials file, it may be used in two ways:

  1. Home folder
  2. Commandline argument

The default location of the license file is ~/.ilmnAnnotations/credentials.json. An example credentials file is shown below:

{
"ApiKey":"myApiKey",
"ApiSecret": "abcdefghikjlmnopqrstuvwxyz-secretKey"
}

However, this can be overridden by the command line argument while downloading/annotating.
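Before passing a credentials file on the command line, it can be worth confirming that it is well-formed. The Python sketch below checks only what the example above shows: a JSON object with non-empty `ApiKey` and `ApiSecret` entries. The function name is hypothetical, and the check says nothing about whether the license itself is valid.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("ApiKey", "ApiSecret")


def load_credentials(path="~/.ilmnAnnotations/credentials.json"):
    """Read a credentials file and verify that both expected keys are present."""
    data = json.loads(Path(path).expanduser().read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"credentials file is missing: {', '.join(missing)}")
    return data
```

Called without arguments, the function reads the default location. Pass the same path that you give to --credentialsFile to check a file stored elsewhere.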

Download licensed content

dotnet Downloader.dll \
-o ~/data \
--ga GRCh38 \
--credentialsFile ~/credentials.json

Annotate with licensed content

dotnet Annotator.dll \
--ref ~/data/References/7/Homo_sapiens.GRCh38.Nirvana.dat \
--sd ~/data/SupplementaryDatabase \
-c ~/data/Cache/32 \
-i ~/input_vcf-hg38.vcf.gz \
-o ~/output \
--credentialsFile ~/credentials.json

Licensing Errors

If the license has expired, Illumina Connected Annotations will stop annotating and exit with an error code. These errors may be skipped by using the --ignoreLicenseError command line argument. After doing this, only basic data sources will be used for annotations. This can also be achieved by deleting the credentials file from the home folder.

+ + + + \ No newline at end of file diff --git a/3.24/introduction/parsing-json/index.html b/3.24/introduction/parsing-json/index.html new file mode 100644 index 00000000..aa4d92ce --- /dev/null +++ b/3.24/introduction/parsing-json/index.html @@ -0,0 +1,18 @@ + + + + + + + +Parsing Illumina Connected Annotations JSON | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized similarly to the original VCF variants.

Illumina Connected Annotations JSON files can get very large and sometimes we receive feedback that a bioinformatician tried to read the JSON file into Python or R resulting in a program that ran out of available RAM. This happens because those parsers try to load everything into memory all at once.

To get around those issues, we play some clever tricks with newlines that enable our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.
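The line-oriented layout described above can be consumed with a few lines of Python. This is a minimal sketch, not the official notebook: it assumes that each position line is a complete JSON object followed by an optional trailing comma, and that the positions section ends at the `],"genes":[` line (or at `]}` when the file has no genes section). The function names are hypothetical.

```python
import gzip
import json

GENES_MARKER = '],"genes":['
END_MARKER = ']}'


def iter_positions(lines):
    """Yield one parsed position object per line, holding only one in memory."""
    iterator = iter(lines)
    next(iterator, None)  # the first line is the header section; skip it
    for raw in iterator:
        line = raw.strip()
        if line in (GENES_MARKER, END_MARKER):
            break  # end of the positions section
        if line:
            # Each position line carries a trailing comma except the last one.
            yield json.loads(line.rstrip(","))


def iter_positions_from_file(path):
    """Stream positions from a gzip-compressed JSON output file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_positions(handle)
```

A loop such as `for position in iter_positions_from_file("HiSeq.10000.json.gz"): ...` then processes the file one position at a time, so memory use stays roughly constant regardless of file size.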

Jupyter Notebook

We have put together a Jupyter notebook that demonstrates how to do this in Python, and an R version as well.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
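Once a position has been loaded as a Python dictionary, the fields shown above can be navigated directly. As an illustration, the sketch below collects the consequences reported for canonical transcripts. It relies only on keys that appear in the example output (`variants`, `vid`, `transcripts`, `transcript`, `hgnc`, `consequence`, `isCanonical`), and the function name is hypothetical.

```python
def canonical_consequences(position):
    """Return (vid, transcript, gene symbol, consequences) for canonical transcripts."""
    rows = []
    for variant in position.get("variants", []):
        for transcript in variant.get("transcripts", []):
            # isCanonical is only present (and true) on canonical transcripts.
            if transcript.get("isCanonical"):
                rows.append((
                    variant["vid"],
                    transcript["transcript"],
                    transcript.get("hgnc"),
                    transcript.get("consequence", []),
                ))
    return rows
```

Fields are read with `.get` wherever the example shows them to be optional, since not every transcript or variant carries every annotation.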
+ + + + \ No newline at end of file diff --git a/3.24/utilities/jasix/index.html b/3.24/utilities/jasix/index.html new file mode 100644 index 00000000..b68b4832 --- /dev/null +++ b/3.24/utilities/jasix/index.html @@ -0,0 +1,18 @@ + + + + + + + +Jasix | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

Jasix

Overview

The Jasix index provides tabix-like indexing capabilities for the Illumina Connected Annotations JSON output.

Creating the Jasix index

The Jasix index (which comes in a .jsi file) is generated on the fly alongside the Illumina Connected Annotations output. It can also be generated independently by running the Jasix command line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and the index-generating functionality of Jasix.

Example

dotnet Jasix.dll -h
USAGE: dotnet Jasix.dll -i in.json.gz [options]
Indexes a Illumina Connected Annotations annotated JSON file

OPTIONS:
--header, -t print also the header lines
--only-header, -H print only the header lines
--chromosomes, -l list chromosome names
--index, -c create index
--in, -i <VALUE> input
--out, -o <VALUE> compressed output file name (default:console)
--query, -q <VALUE> query range
--section, -s <VALUE> complete section (positions or genes) to output
--help, -h displays the help menu
--version, -v displays the version
dotnet Jasix.dll --index -i input.json.gz
---------------------------------------------------------------------------
Jasix (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Ref Sequence chrM indexed in 00:00:00.2
Ref Sequence chr1 indexed in 00:00:05.8
Ref Sequence chr2 indexed in 00:00:06.0
.
.
.
Peak memory usage: 28.5 MB
Time: 00:01:14.8

Querying the index

The Jasix query format is chr:start-end. If end is not provided, Jasix assumes end=start. If only chr is provided, all entries for that chromosome are returned.

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000
{
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
}
]
}
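
The query semantics described above (end defaults to start; a bare chromosome name selects the whole chromosome) can be illustrated with a small parser. This is only a sketch of the documented rules, not Jasix's actual implementation; the names JasixQuery and parse_query are hypothetical, and contig names that themselves contain a colon are not handled.

```python
import re
from typing import NamedTuple, Optional


class JasixQuery(NamedTuple):
    """Hypothetical container for a parsed chr:start-end query."""
    chromosome: str
    start: Optional[int]  # None means "whole chromosome"
    end: Optional[int]


# chr, optionally followed by :start, optionally followed by -end
_QUERY_RE = re.compile(r"^(?P<chrom>[^:]+)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_query(text: str) -> JasixQuery:
    """Parse a query string following the rules stated in the text above."""
    match = _QUERY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chr:start-end query: {text!r}")
    chrom = match.group("chrom")
    if match.group("start") is None:
        # Only the chromosome was given: select every entry on it.
        return JasixQuery(chrom, None, None)
    start = int(match.group("start"))
    # A missing end means end=start (a single position).
    end = int(match.group("end")) if match.group("end") else start
    if end < start:
        raise ValueError(f"end precedes start in query: {text!r}")
    return JasixQuery(chrom, start, end)
```

For example, parse_query("chrM:5581") yields a query whose start and end are both 5581.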

The default output stream is the console. However, if an output filename is provided, Jasix writes the results to that file in bgzip-compressed format. The output is always a valid JSON entry. If requested (via the -t option), the header of the indexed file is included. Multiple queries can be submitted in the same command, and the output will contain them within the same "positions" block in the order of the submitted queries (Warning: if the queries are out of order or overlapping, the output will be out of order and intersecting).

dotnet Jasix.dll -i input.json.gz  -q chrM:5000-7000 -q chrM:8500-9500 -t
{
"header":{
"annotator":"Illumina Annotation Engine 1.6.2.0",
"creationTime":"2017-08-30 11:42:57",
"genomeAssembly":"GRCh37",
"schemaVersion":6,
"dataVersion":"84.24.39",
"dataSources":[
{
"name":"VEP",
"version":"84",
"description":"Ensembl",
"releaseDate":"2017-01-16"
}
],
"samples":[
"Mother"
]
},
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":8702,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":0.9987,
"totalDepth":1534,
"genotypeQuality":1,
"alleleDepths":[
2,
1532
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":8702,
"chromosome":"chrM",
"end":8702,
"variantType":"SNV",
"vid":"MT:8702:A"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":9378,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1018,
"genotypeQuality":1,
"alleleDepths":[
0,
1018
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":9378,
"chromosome":"chrM",
"end":9378,
"variantType":"SNV",
"vid":"MT:9378:A"
}
]
}
]
}
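
Because out-of-order or overlapping queries lead to out-of-order or intersecting output, it can help to sort and merge the ranges before passing them to Jasix. The following is a minimal sketch under stated assumptions: coordinates are inclusive, chromosomes are kept in the order they first appear in the input (which may not match the order in the indexed file), and the helper names normalize_queries and to_query_args are hypothetical.

```python
def normalize_queries(queries):
    """Sort and merge overlapping (chrom, start, end) ranges per chromosome."""
    by_chrom = {}
    for chrom, start, end in queries:
        by_chrom.setdefault(chrom, []).append((start, end))

    merged = []
    for chrom, ranges in by_chrom.items():
        ranges.sort()
        cur_start, cur_end = ranges[0]
        for start, end in ranges[1:]:
            if start <= cur_end:
                # Overlaps the current range: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def to_query_args(queries):
    """Turn ranges into repeated -q arguments, as in the command above."""
    args = []
    for chrom, start, end in queries:
        args += ["-q", f"{chrom}:{start}-{end}"]
    return args
```

The resulting argument list can be appended to a dotnet Jasix.dll -i input.json.gz invocation.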

Extracting a section

The Illumina Connected Annotations JSON file has three sections: header, positions and genes. Header can be printed using the -H option. If you are interested in only the positions or genes section, you can use the -s or --section option.

dotnet Jasix.dll -i input.json.gz  -s genes
[
{
"name": "ABCB10",
"omim": [
{
"mimNumber": 605454,
"geneName": "ATP-binding cassette, subfamily B, member 10"
}
]
},
{
"name": "ABCD3",
"omim": [
{
"mimNumber": 170995,
"geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",
"description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",
"phenotypes": [
{
"mimNumber": 616278,
"phenotype": "?Bile acid synthesis defect, congenital, 5",
"mapping": "molecular basis of the disorder is known",
"inheritances": [
"Autosomal recessive"
],
"comments": [
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
}
]
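
The genes section printed above is a plain JSON array, so it can be post-processed with any JSON library. As a minimal sketch, the snippet below collects the OMIM numbers per gene; the function name omim_numbers_by_gene is hypothetical and the sample is a shortened copy of the output shown above.

```python
import json

# Shortened copy of the `-s genes` output shown above.
SAMPLE_GENES = """
[
  {"name": "ABCB10",
   "omim": [{"mimNumber": 605454,
             "geneName": "ATP-binding cassette, subfamily B, member 10"}]},
  {"name": "ABCD3",
   "omim": [{"mimNumber": 170995,
             "phenotypes": [{"mimNumber": 616278,
                             "inheritances": ["Autosomal recessive"]}]}]}
]
"""


def omim_numbers_by_gene(genes_json: str) -> dict:
    """Map each gene name to the list of its gene-level OMIM numbers."""
    result = {}
    for gene in json.loads(genes_json):
        # Genes without OMIM annotation simply get an empty list.
        result[gene["name"]] = [entry["mimNumber"] for entry in gene.get("omim", [])]
    return result
```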
+ + + + \ No newline at end of file diff --git a/3.24/utilities/sautils/index.html b/3.24/utilities/sautils/index.html new file mode 100644 index 00000000..6742512e --- /dev/null +++ b/3.24/utilities/sautils/index.html @@ -0,0 +1,22 @@ + + + + + + + +SAUtils | IlluminaConnectedAnnotations + + + + +
+
Version: 3.24

SAUtils

Overview

SAUtils is a utility tool that creates binary supplementary annotation files (.nsa, .gsa, .npd, .nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations Annotation engine to provide supplementary annotations in the output.

The SAUtils Menu

SAUtils supports building binary files for many data sources. The help menu lists them out in the form of sub-commands.

dotnet SAUtils.dll
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Utilities focused on supplementary annotation

USAGE: dotnet SAUtils.dll <command> [options]

COMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen
AaCon create AA conservation database
ancestralAllele create Ancestral allele database from 1000Genomes data
ClinGen create ClinGen database
Downloader download ClinGen database
clinvar create ClinVar database
concat merge multiple NSA files for the same data source having non-overlapping regions
Cosmic create COSMIC database
CosmicSv create COSMIC SV database
CosmicFusion create COSMIC gene fusion database
CosmicCGC create COSMIC cancer gene census database
CustomGene create custom gene annotation database
CustomVar create custom variant annotation database
Dann create DANN database
Dbsnp create dbSNP database
Dgv create DGV database
DiseaseValidity create disease validity database
DosageMapRegions create dosage map regions
DosageSensitivity create dosage sensitivity database
DownloadOmim download OMIM database
ExtractMiniSA extracts mini SA
ExtractMiniXml extracts mini XML (ClinVar)
FilterSpliceNetTsv filter SpliceNet predictions
FusionCatcher create FusionCatcher database
Gerp create GERP conservation database
GlobalMinor create global minor allele database
Gnomad create gnomAD database
Gnomad-lcr create gnomAD low complexity region database
GnomadGeneScores create gnomAD gene scores database
GnomadSV create gnomAD structural variant database
Index edit an index file
MitoHet create mitochondrial Heteroplasmy database
MitomapSvDb create MITOMAP structural variants database
MitomapVarDb create MITOMAP small variants database
Omim create OMIM database
OneKGen create 1000 Genome small variants database
OneKGenSv create 1000 Genomes structural variants database
OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file
PhyloP create PhyloP database
PrimateAi create PrimateAI database
RefMinor create Reference Minor database from 1000 Genome
RemapWithDbsnp remap a VCF file given source and destination rsID mappings
Revel create REVEL database
SpliceAi create SpliceAI database
TopMed create TOPMed database
Gme create GME Variome database
Decipher create Decipher database

You can get more detailed help for each sub-command by running the sub-command without any options. For example:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

More detailed instructions for each sub-command can be found in the documentation of the respective data sources.

Output File Formats

The format of the binary file SAUtils produces depends on the type of annotation data represented in that file (e.g. small variants vs. structural variants vs. genes).

File Extension    Description
.nsa              Small variant annotations (e.g. SNV, insertions, deletions, etc.)
.gsa              Compact variant annotations (e.g. SNV, insertions, deletions, etc.)
.idx              Index file
.nsi              Interval annotations (e.g. SV, CNVs, intervals)
.nga              Gene annotations
.npd              Conservation scores
.rma              Reference Minor allele
.gfs              Gene fusions source
.gfj              Gene fusions JSON
.schema           JSON schema

SAUtils AutoDownloadGenerate

To make generating supplementary annotation files easier, we provide the AutoDownloadGenerate subcommand, which can be used instead of the more granular subcommands. This subcommand combines the download and generate steps. Currently, it supports the following data sources:

  • ClinVar
  • ClinGen
  • dbSNP
  • OMIM
  • COSMIC
dotnet SAUtils.dll AutoDownloadGenerate
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll autodownloadgenerate [options]
Downloads and generates the Supplementary Database for Omim, ClinGen, ClinVar, dbSNP, and COSMIC

OPTIONS:
--sources, -s <VALUE> comma separated list of external data sources
--inputJson, -j <path> input JSON path
--downloadBaseFolder, -b <directory>
base directory path external datasources
downloaded to
--downloadDate, -d <directory>
date directory name that external datasources
downloaded to. Default is today's date in yyyy-
MM-dd format (e.g. 2023-01-30).
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output SA directory
--actions, -a <VALUE> comma separated list of action(s) to perform.
action options: download, generate.
--help, -h displays the help menu
--version, -v displays the version

You can download only, generate only, or both download and generate supplementary files. To use this subcommand, you have to prepare a JSON file that describes the data sources. The tutorials below show how to use this subcommand for each data source.

AutoDownloadGenerate ClinVar

Below is the command to use AutoDownloadGenerate for ClinVar to download and generate supplementary files.

dotnet SAUtils.dll AutoDownloadGenerate -s ClinVar -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]
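
When the command is launched from a script, it can be convenient to assemble the argument list programmatically. The sketch below only uses the options listed in the help menu above; the function name build_auto_download_command and all paths passed to it are hypothetical placeholders.

```python
ALLOWED_ACTIONS = {"download", "generate"}


def build_auto_download_command(source, json_path, cache_dir, reference,
                                download_dir, out_dir,
                                actions=("download", "generate")):
    """Assemble the AutoDownloadGenerate argument list (nothing is executed)."""
    unknown = [action for action in actions if action not in ALLOWED_ACTIONS]
    if unknown:
        raise ValueError(f"unsupported action(s): {unknown}")
    return [
        "dotnet", "SAUtils.dll", "AutoDownloadGenerate",
        "-s", source,
        "-a", ",".join(actions),
        "-j", json_path,
        "-c", cache_dir,
        "-r", reference,
        "-b", download_dir,
        "-o", out_dir,
    ]
```

The returned list can be handed to subprocess.run(command, check=True) once the placeholder paths point at real locations.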

The json file for ClinVar should be formatted like below:

{
"clinvar": {
"baseDirectory": "ClinVar",
"sourceFiles": [
{
"name": "ClinVar",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"files": [
{
"localFileName": "ClinVarFullRelease_00-latest.xml.gz",
"downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz",
"md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz.md5"
},
{
"localFileName": "ClinVarVariationRelease_00-latest.xml.gz",
"downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz",
"md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz.md5"
}
]
}
]
}
}

The JSON entry for ClinVar needs no modification and can be used as is.
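
Each ClinVar file entry above carries an md5Url next to its downloadUrl. How SAUtils uses that checksum internally is not described here, but if you download the files yourself, a check along the following lines can confirm that a download is intact. This is a sketch under assumptions: the first whitespace-separated token of the .md5 file is taken to be the hexadecimal digest, and the function names are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_text(path, md5_text):
    """Compare a file against the contents of its downloaded .md5 file.

    Assumes the digest is the first whitespace-separated token in md5_text.
    """
    expected = md5_text.split()[0].lower()
    return file_md5(path) == expected
```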

AutoDownloadGenerate ClinGen

dotnet SAUtils.dll AutoDownloadGenerate -s ClinGen -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for ClinGen should be formatted like below:

{
"clingen": {
"baseDirectory": "ClinGen",
"sourceFiles": [
{
"name": "ClinGen Dosage Sensitivity Map",
"subDirectory": "DosageSensitivity",
"description": "Dosage sensitivity map from ClinGen (dbVar)",
"files": [
{
"localFileName": "ClinGen_gene_curation_list_GRCh37.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh37.tsv"
},
{
"localFileName": "ClinGen_gene_curation_list_GRCh38.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh38.tsv"
},
{
"localFileName": "ClinGen_region_curation_list_GRCh37.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh37.tsv"
},
{
"localFileName": "ClinGen_region_curation_list_GRCh38.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh38.tsv"
}
]
},
{
"name": "ClinGen disease validity curations",
"subDirectory": "GeneDiseaseValidity",
"description": "Disease validity curations from ClinGen (dbVar)",
"files": [
{
"localFileName": "Clingen-Gene-Disease-Summary.csv",
"downloadUrl": "https://search.clinicalgenome.org/kb/gene-validity/download"
}
]
}
]
}
}

The JSON entry for ClinGen needs no modification and can be used as is.

AutoDownloadGenerate dbSNP

dotnet SAUtils.dll AutoDownloadGenerate -s dbSNP -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for dbSNP should be formatted like below:

{
"dbsnp": {
"baseDirectory": "dbSNP",
"sourceFiles": [
{
"name": "dbSNP",
"description": "Identifiers for observed variants",
"version": "156",
"subDirectory": "GRCh37",
"files": [
{
"localFileName": "GCF_000001405.25.gz.tbi",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi.md5"
},
{
"localFileName": "GCF_000001405.25.gz",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.md5"
}
]
},
{
"name": "dbSNP",
"description": "Identifiers for observed variants",
"version": "156",
"subDirectory": "GRCh38",
"files": [
{
"localFileName": "GCF_000001405.40.gz.tbi",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi.md5"
},
{
"localFileName": "GCF_000001405.40.gz",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.md5"
}
]
}
]
}
}

The JSON above is an example for dbSNP version 156. If you want to use a different version, adjust the version number and update all entries in files to point to the actual URLs. If you only want to generate GRCh38, just remove the GRCh37 entry from sourceFiles.
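
Restricting the configuration to a single assembly can also be done programmatically instead of editing the file by hand. The following sketch filters sourceFiles by their subDirectory value, mirroring the structure of the dbSNP JSON above; the function name keep_assembly is hypothetical.

```python
import copy


def keep_assembly(config, source_key, assembly):
    """Return a copy of the config whose sourceFiles match one assembly.

    `source_key` is the top-level key (e.g. "dbsnp") and `assembly` is
    compared against each entry's "subDirectory" (e.g. "GRCh38").
    """
    trimmed = copy.deepcopy(config)  # leave the caller's config untouched
    entry = trimmed[source_key]
    entry["sourceFiles"] = [
        source for source in entry["sourceFiles"]
        if source.get("subDirectory") == assembly
    ]
    return trimmed
```

The trimmed dictionary can then be written out with json.dump and passed to AutoDownloadGenerate through the -j option.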

AutoDownloadGenerate OMIM

dotnet SAUtils.dll AutoDownloadGenerate -s OMIM -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for OMIM should be formatted like below:

{
"omim": {
"baseDirectory": "omim",
"sourceFiles": [
{
"name": "OMIM",
"description": "An Online Catalog of Human Genes and Genetic Disorders"
}
]
}
}

The JSON entry for OMIM needs no modification and can be used as is. You have to export your OMIM API key as an environment variable named OmimApiKey.

AutoDownloadGenerate COSMIC

dotnet SAUtils.dll AutoDownloadGenerate -s COSMIC -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for COSMIC should be formatted like below:

{
"Cosmic": {
"baseDirectory": "COSMIC",
"sourceFiles": [
{
"name": "COSMIC",
"version": "99",
"description": "the Catalogue Of Somatic Mutations In Cancer"
}
]
}
}

You have to adjust the version entry to the COSMIC version you want. You also need COSMIC credentials, exported as environment variables named Cosmic_Username and Cosmic_Password.
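
Since OMIM and COSMIC rely on environment variables, a wrapper script may want to confirm they are set before launching SAUtils. The sketch below uses only the variable names given in this document; the mapping REQUIRED_ENV and the function missing_credentials are hypothetical helpers, not part of SAUtils.

```python
import os

# Environment variables named in this document, per data source.
REQUIRED_ENV = {
    "OMIM": ["OmimApiKey"],
    "COSMIC": ["Cosmic_Username", "Cosmic_Password"],
}


def missing_credentials(source, environ=None):
    """List the required variables that are unset or empty for a source."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV.get(source, []) if not env.get(name)]
```

An empty list means every variable required for that source is present; sources without credentials, such as ClinVar, always return an empty list.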

+ + + + \ No newline at end of file diff --git a/404.html b/404.html index f7539a89..f40b341e 100644 --- a/404.html +++ b/404.html @@ -6,13 +6,13 @@ Page Not Found | IlluminaConnectedAnnotations - - + +
-

Page Not Found

We could not find what you were looking for.

Please contact the owner of the site that linked you to the original URL and let them know their link is broken.

- - +

Page Not Found

We could not find what you were looking for.

Please contact the owner of the site that linked you to the original URL and let them know their link is broken.

+ + \ No newline at end of file diff --git a/assets/js/017faa10.8264ca5b.js b/assets/js/017faa10.8264ca5b.js new file mode 100644 index 00000000..a5ee027f --- /dev/null +++ b/assets/js/017faa10.8264ca5b.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6558],{3905:(t,e,n)=>{n.d(e,{Zo:()=>s,kt:()=>u});var r=n(7294);function a(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function i(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,r)}return n}function o(t){for(var e=1;e=0||(a[n]=t[n]);return a}(t,e);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(t);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(a[n]=t[n])}return a}var c=r.createContext({}),p=function(t){var e=r.useContext(c),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},s=function(t){var e=p(t.components);return r.createElement(c.Provider,{value:e},t.children)},m="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return r.createElement(r.Fragment,{},e)}},f=r.forwardRef((function(t,e){var n=t.components,a=t.mdxType,i=t.originalType,c=t.parentName,s=l(t,["components","mdxType","originalType","parentName"]),m=p(n),f=a,u=m["".concat(c,".").concat(f)]||m[f]||d[f]||i;return n?r.createElement(u,o(o({ref:e},s),{},{components:n})):r.createElement(u,o({ref:e},s))}));function u(t,e){var n=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var i=n.length,o=new Array(i);o[0]=f;var l={};for(var c in e)hasOwnProperty.call(e,c)&&(l[c]=e[c]);l.originalType=t,l[m]="string"==typeof t?t:a,o[1]=l;for(var p=2;p{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>m,frontMatter:()=>i,metadata:()=>l,toc:()=>c});var r=n(7462),a=(n(7294),n(3905));const i={},o=void 
0,l={unversionedId:"data-sources/primate-ai-json",id:"version-3.24/data-sources/primate-ai-json",title:"primate-ai-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/primate-ai-json.md",sourceDirName:"data-sources",slug:"/data-sources/primate-ai-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/primate-ai-json.md",tags:[],version:"3.24",frontMatter:{}},c=[],p={toc:c},s="wrapper";function m(t){let{components:e,...n}=t;return(0,a.kt)(s,(0,r.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"primateAI-3D": [\n {\n "aminoAcidPosition": 2,\n "refAminoAcid": "V",\n "altAminoAcid": "M",\n "score": 0.616944,\n "scorePercentile": 0.52,\n "classification": "pathogenic", \n "ensemblTranscriptId": "ENST00000335137.4",\n "refSeqTranscriptId": "NM_001005484.1"\n }\n]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"aminoAcidPosition"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Amino Acid Position (1-based)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"refAminoAcid"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Reference Amino 
Acid")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"altAminoAcid"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Alternate Amino Acid")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"ensemblTranscriptId"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Transcript ID (Ensembl)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"refSeqTranscriptId"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Transcript ID (RefSeq)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"scorePercentile"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"score"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"classification"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"pathogenic or benign classification")))))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/06f315d1.0371a111.js b/assets/js/06f315d1.0371a111.js new file mode 100644 index 00000000..7d9b7858 --- /dev/null +++ b/assets/js/06f315d1.0371a111.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6599],{3905:(_,E,t)=>{t.d(E,{Zo:()=>R,kt:()=>O});var A=t(7294);function e(_,E,t){return E in _?Object.defineProperty(_,E,{value:t,enumerable:!0,configurable:!0,writable:!0}):_[E]=t,_}function n(_,E){var t=Object.keys(_);if(Object.getOwnPropertySymbols){var 
A=Object.getOwnPropertySymbols(_);E&&(A=A.filter((function(E){return Object.getOwnPropertyDescriptor(_,E).enumerable}))),t.push.apply(t,A)}return t}function a(_){for(var E=1;E=0||(e[t]=_[t]);return e}(_,E);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(_);for(A=0;A=0||Object.prototype.propertyIsEnumerable.call(_,t)&&(e[t]=_[t])}return e}var M=A.createContext({}),N=function(_){var E=A.useContext(M),t=E;return _&&(t="function"==typeof _?_(E):a(a({},E),_)),t},R=function(_){var E=N(_.components);return A.createElement(M.Provider,{value:E},_.children)},F="mdxType",L={inlineCode:"code",wrapper:function(_){var E=_.children;return A.createElement(A.Fragment,{},E)}},l=A.forwardRef((function(_,E){var t=_.components,e=_.mdxType,n=_.originalType,M=_.parentName,R=r(_,["components","mdxType","originalType","parentName"]),F=N(t),l=e,O=F["".concat(M,".").concat(l)]||F[l]||L[l]||n;return t?A.createElement(O,a(a({ref:E},R),{},{components:t})):A.createElement(O,a({ref:E},R))}));function O(_,E){var t=arguments,e=E&&E.mdxType;if("string"==typeof _||e){var n=t.length,a=new Array(n);a[0]=l;var r={};for(var M in E)hasOwnProperty.call(E,M)&&(r[M]=E[M]);r.originalType=_,r[F]="string"==typeof _?_:e,a[1]=r;for(var N=2;N{t.r(E),t.d(E,{contentTitle:()=>a,default:()=>F,frontMatter:()=>n,metadata:()=>r,toc:()=>M});var A=t(7462),e=(t(7294),t(3905));const n={},a=void 0,r={unversionedId:"data-sources/gnomad-structural-variants-data_description",id:"version-3.24/data-sources/gnomad-structural-variants-data_description",title:"gnomad-structural-variants-data_description",description:"Bed 
Example",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-data_description.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-structural-variants-data_description",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-data_description",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-data_description.md",tags:[],version:"3.24",frontMatter:{}},M=[{value:"Bed Example",id:"bed-example",children:[],level:4},{value:"Structural Variant Type Mapping",id:"structural-variant-type-mapping",children:[],level:4}],N={toc:M},R="wrapper";function F(_){let{components:E,...t}=_;return(0,e.kt)(R,(0,A.Z)({},N,t,{components:E,mdxType:"MDXLayout"}),(0,e.kt)("h4",{id:"bed-example"},"Bed Example"),(0,e.kt)("p",null,"The bed file was obtained from original source for GRCh37"),(0,e.kt)("pre",null,(0,e.kt)("code",{parentName:"pre",className:"language-scss"},"#chrom start end name svtype ALGORITHMS BOTHSIDES_SUPPORT CHR2 CPX_INTERVALS CPX_TYPE END2 ENDEVIDENCE HIGH_SR_BACKGROUND PCRPLUS_DEPLETED PESR_GT_OVERDISPERSION POS2 PROTEIN_CODING__COPY_GAIN PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC PROTEIN_CODING__INTRONIC PROTEIN_CODING__INV_SPAN PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER PROTEIN_CODING__UTR SOURCE STRANDS SVLEN SVTYPE UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN AC AF N_BI_GENOS N_HOMREF N_HET N_HOMALT FREQ_HOMREF FREQ_HET FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF MALE_N_HET MALE_N_HOMALT MALE_FREQ_HOMREF MALE_FREQ_HET MALE_FREQ_HOMALT MALE_N_HEMIREF MALE_N_HEMIALT MALE_FREQ_HEMIREF MALE_FREQ_HEMIALT PAR FEMALE_AN FEMALE_AC FEMALE_AF FEMALE_N_BI_GENOS FEMALE_N_HOMREF FEMALE_N_HET FEMALE_N_HOMALT FEMALE_FREQ_HOMREF FEMALE_FREQ_HET FEMALE_FREQ_HOMALT 
POPMAX_AF AFR_AN AFR_AC AFR_AF AFR_N_BI_GENOS AFR_N_HOMREF AFR_N_HET AFR_N_HOMALT AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF AFR_MALE_N_HET AFR_MALE_N_HOMALT AFR_MALE_FREQ_HOMREF AFR_MALE_FREQ_HET AFR_MALE_FREQ_HOMALT AFR_MALE_N_HEMIREF AFR_MALE_N_HEMIALT AFR_MALE_FREQ_HEMIREF AFR_MALE_FREQ_HEMIALT AFR_FEMALE_AN AFR_FEMALE_AC AFR_FEMALE_AF AFR_FEMALE_N_BI_GENOS AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT AMR_AN AMR_AC AMR_AF AMR_N_BI_GENOS AMR_N_HOMREF AMR_N_HET AMR_N_HOMALT AMR_FREQ_HOMREF AMR_FREQ_HET AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF AMR_MALE_N_HET AMR_MALE_N_HOMALT AMR_MALE_FREQ_HOMREF AMR_MALE_FREQ_HET AMR_MALE_FREQ_HOMALT AMR_MALE_N_HEMIREF AMR_MALE_N_HEMIALT AMR_MALE_FREQ_HEMIREF AMR_MALE_FREQ_HEMIALT AMR_FEMALE_AN AMR_FEMALE_AC AMR_FEMALE_AF AMR_FEMALE_N_BI_GENOS AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT EAS_AN EAS_AC EAS_AF EAS_N_BI_GENOS EAS_N_HOMREF EAS_N_HET EAS_N_HOMALT EAS_FREQ_HOMREF EAS_FREQ_HET EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF EAS_MALE_N_HET EAS_MALE_N_HOMALT EAS_MALE_FREQ_HOMREF EAS_MALE_FREQ_HET EAS_MALE_FREQ_HOMALT EAS_MALE_N_HEMIREF EAS_MALE_N_HEMIALT EAS_MALE_FREQ_HEMIREF EAS_MALE_FREQ_HEMIALT EAS_FEMALE_AN EAS_FEMALE_AC EAS_FEMALE_AF EAS_FEMALE_N_BI_GENOS EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT EUR_AN EUR_AC EUR_AF EUR_N_BI_GENOS EUR_N_HOMREF EUR_N_HET EUR_N_HOMALT EUR_FREQ_HOMREF EUR_FREQ_HET EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF EUR_MALE_N_HET EUR_MALE_N_HOMALT EUR_MALE_FREQ_HOMREF EUR_MALE_FREQ_HET EUR_MALE_FREQ_HOMALT EUR_MALE_N_HEMIREF EUR_MALE_N_HEMIALT 
EUR_MALE_FREQ_HEMIREF EUR_MALE_FREQ_HEMIALT EUR_FEMALE_AN EUR_FEMALE_AC EUR_FEMALE_AF EUR_FEMALE_N_BI_GENOS EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT OTH_AN OTH_AC OTH_AF OTH_N_BI_GENOS OTH_N_HOMREF OTH_N_HET OTH_N_HOMALT OTH_FREQ_HOMREF OTH_FREQ_HET OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF OTH_MALE_N_HET OTH_MALE_N_HOMALT OTH_MALE_FREQ_HOMREF OTH_MALE_FREQ_HET OTH_MALE_FREQ_HOMALT OTH_MALE_N_HEMIREF OTH_MALE_N_HEMIALT OTH_MALE_FREQ_HEMIREF OTH_MALE_FREQ_HEMIALT OTH_FEMALE_AN OTH_FEMALE_AC OTH_FEMALE_AF OTH_FEMALE_N_BI_GENOS OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET OTH_FEMALE_N_HOMALT OTH_FEMALE_FREQ_HOMREF OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT FILTER\n1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 
0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED \n")),(0,e.kt)("h4",{id:"structural-variant-type-mapping"},"Structural Variant Type Mapping"),(0,e.kt)("p",null,"The source files represented the structural variants with keys using various naming conventions.\nIn the Illumina Connected Annotations JSON output, these keys will be mapped according to the following. 
"),(0,e.kt)("table",null,(0,e.kt)("thead",{parentName:"table"},(0,e.kt)("tr",{parentName:"thead"},(0,e.kt)("th",{parentName:"tr",align:null},"Illumina Connected Annotations JSON SV Type Key"),(0,e.kt)("th",{parentName:"tr",align:null},"GRCh37 Source SV Type Key"))),(0,e.kt)("tbody",{parentName:"table"},(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"copy_number_variation"),(0,e.kt)("td",{parentName:"tr",align:null})),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"deletion"),(0,e.kt)("td",{parentName:"tr",align:null},"DEL, CN=0")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"duplication"),(0,e.kt)("td",{parentName:"tr",align:null},"DUP")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"insertion"),(0,e.kt)("td",{parentName:"tr",align:null},"INS")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"inversion"),(0,e.kt)("td",{parentName:"tr",align:null},"INV")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,e.kt)("td",{parentName:"tr",align:null},"INS:ME")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,e.kt)("td",{parentName:"tr",align:null},"INS:ME:ALU")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,e.kt)("td",{parentName:"tr",align:null},"INS:ME:LINE1")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,e.kt)("td",{parentName:"tr",align:null},"INS:ME:SVA")),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"structural 
alteration"),(0,e.kt)("td",{parentName:"tr",align:null})),(0,e.kt)("tr",{parentName:"tbody"},(0,e.kt)("td",{parentName:"tr",align:null},"complex_structural_alteration"),(0,e.kt)("td",{parentName:"tr",align:null},"CPX")))))}F.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/0fef68c1.a05cae8e.js b/assets/js/0fef68c1.a05cae8e.js new file mode 100644 index 00000000..7b5eeab6 --- /dev/null +++ b/assets/js/0fef68c1.a05cae8e.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7565,9486],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>h});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function r(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),m=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):r(r({},t),e)),n},p=function(e){var t=m(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",c={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,o=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),d=m(n),u=i,h=d["".concat(s,".").concat(u)]||d[u]||c[u]||o;return n?a.createElement(h,r(r({ref:t},p),{},{components:n})):a.createElement(h,r({ref:t},p))}));function h(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var o=n.length,r=new Array(o);r[0]=u;var l={};for(var s in 
t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:i,r[1]=l;for(var m=2;m{n.r(t),n.d(t,{contentTitle:()=>r,default:()=>d,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const o={},r=void 0,l={unversionedId:"data-sources/omim-json",id:"version-3.24/data-sources/omim-json",title:"omim-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/omim-json.md",sourceDirName:"data-sources",slug:"/data-sources/omim-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/omim-json.md",tags:[],version:"3.24",frontMatter:{}},s=[{value:"Phenotype",id:"phenotype",children:[],level:4},{value:"Mapping",id:"mapping",children:[],level:4},{value:"Inheritance",id:"inheritance",children:[],level:4},{value:"Comments",id:"comments",children:[],level:4}],m={toc:s},p="wrapper";function d(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},m,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"omim":[ \n { \n "mimNumber":600678,\n "geneName":"MutS, E. coli, homolog of, 6",\n "description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. 
Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",\n "phenotypes":[ \n { \n "mimNumber":614350,\n "phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",\n "description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",\n "mapping":"molecular basis of the disorder is known",\n "inheritances":[ \n "Autosomal dominant"\n ]\n },\n { \n "mimNumber":608089,\n "phenotype":"Endometrial cancer, familial",\n "mapping":"molecular basis of the disorder is known"\n },\n { \n "mimNumber":276300,\n "phenotype":"Mismatch repair cancer syndrome",\n "description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",\n "mapping":"molecular basis of the disorder is known",\n "inheritances":[ \n "Autosomal recessive"\n ],\n "comments" : [\n "contribute to susceptibility to multifactorial disorders or to susceptibility to infection",\n "unconfirmed or possibly spurious mapping"\n ]\n }\n ]\n }\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"mimNumber"),(0,i.kt)("td",{parentName:"tr",align:"left"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"OMIM ID for gene")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"geneName"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene 
name")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"description"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"phenotypes"),(0,i.kt)("td",{parentName:"tr",align:"left"},"object array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see ",(0,i.kt)("a",{parentName:"td",href:"#phenotype"},"Phenotype entry below"))))),(0,i.kt)("h4",{id:"phenotype"},"Phenotype"),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"mimNumber"),(0,i.kt)("td",{parentName:"tr",align:"left"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"phenotype"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"description"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"mapping"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see ",(0,i.kt)("a",{parentName:"td",href:"#mapping"},"possible values below"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"inheritances"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see ",(0,i.kt)("a",{parentName:"td",href:"#inheritance"},"possible values
below"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"comments"),(0,i.kt)("td",{parentName:"tr",align:"left"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see ",(0,i.kt)("a",{parentName:"td",href:"#comments"},"possible values below"))))),(0,i.kt)("h4",{id:"mapping"},"Mapping"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},"disorder was positioned by mapping of the wild type gene"),(0,i.kt)("li",{parentName:"ol"},"disease phenotype itself was mapped"),(0,i.kt)("li",{parentName:"ol"},"molecular basis of the disorder is known"),(0,i.kt)("li",{parentName:"ol"},"disorder is a chromosome deletion or duplication syndrome")),(0,i.kt)("h4",{id:"inheritance"},"Inheritance"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"autosomal recessive"),(0,i.kt)("li",{parentName:"ul"},"autosomal dominant")),(0,i.kt)("h4",{id:"comments"},"Comments"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"contributes to the susceptibility to multifactorial disorders"),(0,i.kt)("li",{parentName:"ul"},"variations that lead to apparently abnormal laboratory test values"),(0,i.kt)("li",{parentName:"ul"},"unconfirmed mapping")))}d.isMDXComponent=!0},2517:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>c,frontMatter:()=>r,metadata:()=>s,toc:()=>m});var a=n(7462),i=(n(7294),n(3905)),o=n(2898);const r={title:"OMIM"},l=void 
0,s={unversionedId:"data-sources/omim",id:"version-3.24/data-sources/omim",title:"OMIM",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/omim.mdx",sourceDirName:"data-sources",slug:"/data-sources/omim",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/omim.mdx",tags:[],version:"3.24",frontMatter:{title:"OMIM"},sidebar:"docs",previous:{title:"MITOMAP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap"},next:{title:"PhyloP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop"}},m=[{value:"Overview",id:"overview",children:[],level:2},{value:"Parse OMIM data",id:"parse-omim-data",children:[{value:"mim2gene.txt",id:"mim2genetxt",children:[],level:3},{value:"OMIM API",id:"omim-api",children:[{value:"Mapping key to content",id:"mapping-key-to-content",children:[],level:4},{value:"Phenotype character to comment",id:"phenotype-character-to-comment",children:[],level:4}],level:3},{value:"Remove links in OMIM descriptions",id:"remove-links-in-omim-descriptions",children:[],level:3}],level:2},{value:"JSON output",id:"json-output",children:[],level:2},{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[{value:"Using subcommands downloadOMIM and omim",id:"using-subcommands-downloadomim-and-omim",children:[],level:3}],level:2}],p={toc:m},d="wrapper";function c(e){let{components:t,...n}=e;return(0,i.kt)(d,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily."),(0,i.kt)("div",{className:"admonition admonition-info alert 
alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publications")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: ",(0,i.kt)("a",{parentName:"p",href:"https://pubmed.ncbi.nlm.nih.gov/30445645/"},"30445645"),"."),(0,i.kt)("p",{parentName:"div"},"Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM\xae), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. 
PMID: ",(0,i.kt)("a",{parentName:"p",href:"https://pubmed.ncbi.nlm.nih.gov/25428349/"},"25428349"),"."))),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Professional data source")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"This is a Professional data source and is not available freely. Please contact ",(0,i.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com")," if you would like to obtain it."))),(0,i.kt)("h2",{id:"parse-omim-data"},"Parse OMIM data"),(0,i.kt)("p",null,"Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. 
The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols."),(0,i.kt)("h3",{id:"mim2genetxt"},"mim2gene.txt"),(0,i.kt)("p",null,"The mim2gene.txt (",(0,i.kt)("a",{parentName:"p",href:"http://omim.org/static/omim/data/mim2gene.txt"},"http://omim.org/static/omim/data/mim2gene.txt"),") file provides the mapping between MIM numbers and gene symbols. An example of this file is given below:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre"},"# MIM Number MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq) Entrez Gene ID (NCBI) Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)\n100050 predominantly phenotypes\n100070 phenotype 100329167\n100100 phenotype\n100200 predominantly phenotypes\n100300 phenotype\n100500 moved/removed\n100600 phenotype\n100640 gene 216 ALDH1A1 ENSG00000165092\n100650 gene/phenotype 217 ALDH2 ENSG00000111275\n100660 gene 218 ALDH3A1 ENSG00000108602\n100670 gene 219 ALDH1B1 ENSG00000137124\n100675 predominantly phenotypes\n100678 gene 39 ACAT2 ENSG00000120437\n")),(0,i.kt)("p",null,'The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.'),(0,i.kt)("h3",{id:"omim-api"},"OMIM API"),(0,i.kt)("p",null,"Illumina Connected Annotations retrieves the OMIM annotations from the ",(0,i.kt)("a",{parentName:"p",href:"https://www.omim.org/api"},"OMIM API"),' JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene.
A sample JSON response from the API is provided below.'),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "omim": {\n "version": "1.0",\n "entryList": [\n {\n "entry": {\n "prefix": "*",\n "mimNumber": 100640,\n "status": "live",\n "titles": {\n "preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",\n "alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\\nACETALDEHYDE DEHYDROGENASE 1;;\\nALDH, LIVER CYTOSOLIC;;\\nRETINAL DEHYDROGENASE 1; RALDH1"\n },\n "textSectionList": [\n {\n "textSection": {\n "textSectionName": "description",\n "textSectionTitle": "Description",\n "textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\\n\\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram.
ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."\n }\n }\n ],\n "geneMap": {\n "sequenceID": 7709,\n "chromosome": 9,\n "chromosomeSymbol": "9",\n "chromosomeSort": 225,\n "chromosomeLocationStart": 72900670,\n "chromosomeLocationEnd": 72953052,\n "transcript": "ENST00000297785.7",\n "cytoLocation": "9q21",\n "computedCytoLocation": "9q21.13",\n "mimNumber": 100640,\n "geneSymbols": "ALDH1A1",\n "geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",\n "mappingMethod": "REa, A",\n "confidence": "P",\n "mouseGeneSymbol": "Aldh1a1",\n "mouseMgiID": "MGI:1353450",\n "geneInheritance": null\n },\n "externalLinks": {\n "geneIDs": "216",\n "hgncID": "402",\n "ensemblIDs": "ENSG00000165092,ENST00000297785.8",\n "approvedGeneSymbols": "ALDH1A1",\n "ncbiReferenceSequences": "1519246465",\n "proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",\n "uniGenes": "Hs.76392",\n "swissProtIDs": "P00352",\n "decipherGene": false,\n "umlsIDs": "C1412333",\n "gtr": true,\n "cmgGene": false,\n "keggPathways": true,\n "gwasCatalog": false,\n\n }\n }\n },\n {\n "entry": {\n "prefix": "*",\n "mimNumber": 102560,\n "status": "live",\n "titles": {\n "preferredTitle": "ACTIN, GAMMA-1; ACTG1",\n "alternativeTitles": "ACTIN, GAMMA; ACTG;;\\nCYTOSKELETAL GAMMA-ACTIN;;\\nACTIN, CYTOPLASMIC, 2"\n },\n "textSectionList": [\n {\n "textSection": {\n "textSectionName": "description",\n "textSectionTitle": "Description",\n "textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. 
Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."\n }\n }\n ],\n "geneMap": {\n "sequenceID": 13666,\n "chromosome": 17,\n "chromosomeSymbol": "17",\n "chromosomeSort": 947,\n "chromosomeLocationStart": 81509970,\n "chromosomeLocationEnd": 81512798,\n "transcript": "ENST00000331925.7",\n "cytoLocation": "17q25.3",\n "computedCytoLocation": "17q25.3",\n "mimNumber": 102560,\n "geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",\n "geneName": "Actin, gamma-1",\n "mappingMethod": "REa, A, Fd",\n "confidence": "C",\n "mouseGeneSymbol": "Actg1",\n "mouseMgiID": "MGI:87906",\n "geneInheritance": null,\n "phenotypeMapList": [\n {\n "phenotypeMap": {\n "mimNumber": 102560,\n "phenotype": "Baraitser-Winter syndrome 2",\n "phenotypeMimNumber": 614583,\n "phenotypicSeriesNumber": "PS243310",\n "phenotypeMappingKey": 3,\n "phenotypeInheritance": "Autosomal dominant"\n }\n },\n {\n "phenotypeMap": {\n "mimNumber": 102560,\n "phenotype": "Deafness, autosomal dominant 20/26",\n "phenotypeMimNumber": 604717,\n "phenotypicSeriesNumber": "PS124900",\n "phenotypeMappingKey": 3,\n "phenotypeInheritance": "Autosomal dominant"\n }\n }\n ]\n }\n }\n }\n ]\n }\n}\n')),(0,i.kt)("p",null,"Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations ",(0,i.kt)("a",{parentName:"p",href:"#json-output"},"JSON Output")),(0,i.kt)("p",null,"Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:"),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Illumina Connected Annotations JSON key chain"),(0,i.kt)("th",{parentName:"tr",align:"left"},"OMIM 
API JSON key chain"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:mimNumber"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:mimNumber")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:geneName"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:geneMap:geneName")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:description"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:textSectionList:textSection:textSectionContent")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:phenotypes:mimNumber"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:phenotypes:phenotype"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:phenotypes:description"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:textSectionList:textSection:textSectionContent")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:phenotypes:mapping"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (",(0,i.kt)("a",{parentName:"td",href:"#mapping-key-to-content"},"see mapping 
below"),")")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:phenotypes:inheritances"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:phenotypes:comments"),(0,i.kt)("td",{parentName:"tr",align:"left"},"omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (",(0,i.kt)("a",{parentName:"td",href:"#phenotype-character-to-comment"},"see mapping below"),")")))),(0,i.kt)("h4",{id:"mapping-key-to-content"},"Mapping key to content"),(0,i.kt)("p",null,(0,i.kt)("inlineCode",{parentName:"p"},"1")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"disorder was positioned by mapping of the wild type gene"),(0,i.kt)("br",null),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"2")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"disease phenotype itself was mapped"),(0,i.kt)("br",null),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"3")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"molecular basis of the disorder is known"),(0,i.kt)("br",null),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"4")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"disorder is a chromosome deletion or duplication syndrome"),(0,i.kt)("br",null)),(0,i.kt)("h4",{id:"phenotype-character-to-comment"},"Phenotype character to comment"),(0,i.kt)("p",null,(0,i.kt)("inlineCode",{parentName:"p"},"?")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"unconfirmed or possibly spurious mapping"),(0,i.kt)("br",null),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"["),"/",(0,i.kt)("inlineCode",{parentName:"p"},"]")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"nondiseases"),(0,i.kt)("br",null),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"{"),"/",(0,i.kt)("inlineCode",{parentName:"p"},"}")," to ",(0,i.kt)("inlineCode",{parentName:"p"},"contribute to susceptibility to multifactorial disorders or to susceptibility to 
infection"),(0,i.kt)("br",null)),(0,i.kt)("h3",{id:"remove-links-in-omim-descriptions"},"Remove links in OMIM descriptions"),(0,i.kt)("p",null,"There are different types of links in the OMIM description section. For example, in the JSON response above, we have the description of MIM entry 100640:"),(0,i.kt)("p",null,(0,i.kt)("inlineCode",{parentName:"p"},"The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\\n\\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985}).")),(0,i.kt)("p",null,"As the descriptions will be shown as plain text, we remove the curly brackets surrounding links and try to keep the text readable with minimal modifications. Briefly:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},'Links referring to another MIM entry (e.g. {100650}) will be removed. Any word(s) specifically associated with the removed link will also be removed. For example, "(ADH, see {103700})" will become "(ADH)" after the process.'),(0,i.kt)("li",{parentName:"ul"},'Links referring to a literature reference will be processed to remove the internal index and curly brackets. For example, "{4:Hsu et al., 1985}" becomes "Hsu et al., 1985".'),(0,i.kt)("li",{parentName:"ul"},'All the other links will simply have their curly brackets removed.
For example, "{EC 1.2.1.3}" becomes "EC 1.2.1.3".'),(0,i.kt)("li",{parentName:"ul"},'If the content within a pair of parentheses becomes empty after being processed, the parentheses are removed as well and the surrounding whitespace is adjusted accordingly. For example, "ALDH2 ({100650})," will become "ALDH2,".')),(0,i.kt)("p",null,"Here is a list of examples of how the description section is supposed to be processed:"),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Original text"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Processed text"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"({516030}, {516040}, and {516050})"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(e.g., D1, {168461}; D2, {123833}; D3, {123834})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(e.g., D1; D2; D3)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(desmocollins; see DSC2, {125645})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(desmocollins; see DSC2)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(e.g., see {102700}, {300755})"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(ADH).
See also liver mitochondrial ALDH2")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(see, e.g., CACNA1A; {601011})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(see, e.g., CACNA1A)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(e.g., GSTA1; {138359}), mu (e.g., {138350})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(e.g., GSTA1), mu")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(NFKB; see {164011})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(NFKB)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(see ISGF3G, {147574})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(see ISGF3G)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"(DCK; {EC 2.7.1.74}; {125450})"),(0,i.kt)("td",{parentName:"tr",align:"left"},"(DCK; EC 2.7.1.74)")))),(0,i.kt)("h2",{id:"json-output"},"JSON output"),(0,i.kt)(o.default,{mdxType:"JSON"}),(0,i.kt)("h2",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,i.kt)("p",null,"There are two ways to build your own OMIM supplementary files using ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils"),"."),(0,i.kt)("p",null,"The first way is to use ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommands ",(0,i.kt)("inlineCode",{parentName:"p"},"downloadOMIM")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"omim"),"."),(0,i.kt)("p",null,"The second way is to use ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommand ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),".
To use ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),", read more in the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," section."),(0,i.kt)("h3",{id:"using-subcommands-downloadomim-and-omim"},"Using subcommands ",(0,i.kt)("inlineCode",{parentName:"h3"},"downloadOMIM")," and ",(0,i.kt)("inlineCode",{parentName:"h3"},"omim")),(0,i.kt)("p",null,"The first step in building the OMIM ",(0,i.kt)("inlineCode",{parentName:"p"},".nga")," files is to use the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommand ",(0,i.kt)("inlineCode",{parentName:"p"},"downloadOMIM")," to download the necessary data. To download the data, the user must possess an API key obtained from OMIM. This key has to be set as the environment variable ",(0,i.kt)("em",{parentName:"p"},"OmimApiKey"),"."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"export OmimApiKey=\nSAUtils.dll downloadOMIM\n---------------------------------------------------------------------------\nSAUtils (c) 2024 Illumina, Inc.\n 3.23.0\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll downloadomim [options]\nDownload the OMIM gene annotation data\n\nOPTIONS:\n --cache, -c \n input cache directory\n --ref, -r input reference filename\n --in, -i input configuration JSON path (optional)\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n\n")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},'dotnet SAUtils.dll downloadOMIM --ref References/7/Homo_sapiens.GRCh38.Nirvana.dat --uga Cache/ --out ExternalDataSources/OMIM/2021-06-14\n\n---------------------------------------------------------------------------\nSAUtils (c) 2024 Illumina, Inc.\n 3.23.0\n---------------------------------------------------------------------------\n\nGene Symbol Update Statistics\n============================================\n{\n
"NumGeneSymbolsUpToDate": 16978,\n "NumGeneSymbolsUpdated": 60,\n "NumGenesWhereBothIdsAreNull": 0,\n "NumGeneSymbolsNotInCache": 105,\n "NumUnresolvedGeneSymbolConflicts": 0\n}\n')),(0,i.kt)("p",null,"Once the download has succeeded, the ",(0,i.kt)("inlineCode",{parentName:"p"},"nga")," files can be produced using the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommand ",(0,i.kt)("inlineCode",{parentName:"p"},"omim"),"."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll omim\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll omim [options]\nCreates a gene annotation database from OMIM data\n\nOPTIONS:\n --m2g, -m MimToGeneSymbol tsv file\n --json, -j OMIM entry json file\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n\n\ndotnet SAUtils.dll omim --m2g ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv --json ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz --out SupplementaryDatabase/63/\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\n\nTime: 00:00:04.5\n")))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/114fee77.7efecee6.js b/assets/js/114fee77.7efecee6.js new file mode 100644 index 00000000..288dd5bd --- /dev/null +++ b/assets/js/114fee77.7efecee6.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[8764],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>m});var a=n(7294);function r(e,t,n){return t in 
e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",u={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},h=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,i=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),d=c(n),h=r,m=d["".concat(s,".").concat(h)]||d[h]||u[h]||i;return n?a.createElement(m,o(o({ref:t},p),{},{components:n})):a.createElement(m,o({ref:t},p))}));function m(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,o=new Array(i);o[0]=h;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:r,o[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>d,frontMatter:()=>i,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const i={title:"Canonical Transcripts"},o=void 0,l={unversionedId:"core-functionality/canonical-transcripts",id:"version-3.24/core-functionality/canonical-transcripts",title:"Canonical 
Transcripts",description:"Overview",source:"@site/versioned_docs/version-3.24/core-functionality/canonical-transcripts.md",sourceDirName:"core-functionality",slug:"/core-functionality/canonical-transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/canonical-transcripts",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/core-functionality/canonical-transcripts.md",tags:[],version:"3.24",frontMatter:{title:"Canonical Transcripts"},sidebar:"docs",previous:{title:"Custom Annotations",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/custom-annotations"},next:{title:"Gene Fusion Detection",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/gene-fusions"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Known Algorithms",id:"known-algorithms",children:[{value:"UCSC",id:"ucsc",children:[],level:3},{value:"Ensembl",id:"ensembl",children:[],level:3},{value:"ACMG",id:"acmg",children:[],level:3},{value:"ClinVar",id:"clinvar",children:[],level:3}],level:2},{value:"Unified Approach",id:"unified-approach",children:[],level:2}],c={toc:s},p="wrapper";function d(e){let{components:t,...i}=e;return(0,r.kt)(p,(0,a.Z)({},c,i,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). 
As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation."),(0,r.kt)("p",null,(0,r.kt)("img",{src:n(5735).Z})),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Golden Helix Blog")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: ",(0,r.kt)("a",{parentName:"p",href:"https://blog.goldenhelix.com/whats-in-a-name-the-intricacies-of-identifying-variants/"},"What\u2019s in a Name: The Intricacies of Identifying Variants"),"."))),(0,r.kt)("p",null,"In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources."),(0,r.kt)("h2",{id:"known-algorithms"},"Known Algorithms"),(0,r.kt)("h3",{id:"ucsc"},"UCSC"),(0,r.kt)("p",null,"UCSC publishes a list of canonical transcripts in its ",(0,r.kt)("inlineCode",{parentName:"p"},"knownCanonical")," table which is available via the ",(0,r.kt)("a",{parentName:"p",href:"https://genome.ucsc.edu/cgi-bin/hgTables"},"TableBrowser"),". 
Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.")),(0,r.kt)("p",null,"If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule."),(0,r.kt)("h3",{id:"ensembl"},"Ensembl"),(0,r.kt)("p",null,"The ",(0,r.kt)("a",{parentName:"p",href:"http://uswest.ensembl.org/Help/Glossary"},"Ensembl glossary")," states:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:"),(0,r.kt)("ol",{parentName:"blockquote"},(0,r.kt)("li",{parentName:"ol"},"Longest CCDS translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no (1), choose the longest Ensembl/Havana merged translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no (2), choose the longest translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no translation, choose the longest non-protein-coding transcript."))),(0,r.kt)("h3",{id:"acmg"},"ACMG"),(0,r.kt)("p",null,"From the ACMG Guidelines for the Interpretation of Sequence Variants:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"A reference transcript for each gene should be used and provided in the report when describing coding variants. 
The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.")),(0,r.kt)("h3",{id:"clinvar"},"ClinVar"),(0,r.kt)("p",null,"From the ClinVar paper:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.")),(0,r.kt)("h2",{id:"unified-approach"},"Unified Approach"),(0,r.kt)("p",null,"Our approach is almost identical to the one Golden Helix discussed in their article:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts."),(0,r.kt)("li",{parentName:"ol"},"Sort the transcripts in the following order:",(0,r.kt)("ol",{parentName:"li"},(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("a",{parentName:"li",href:"https://www.lrg-sequence.org/"},"Locus Reference Genomic (LRG)")," entries occur before non-LRG entries"),(0,r.kt)("li",{parentName:"ol"},"Descending CDS length"),(0,r.kt)("li",{parentName:"ol"},"Descending transcript length"),(0,r.kt)("li",{parentName:"ol"},"Ascending accession number"))),(0,r.kt)("li",{parentName:"ol"},"Grab the first entry")))}d.isMDXComponent=!0},5735:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/hk1-transcripts-a5b85474d3b002553687715dbd004907.png"}}]); \ No newline at end of file diff --git a/assets/js/21c89dc6.34478727.js b/assets/js/21c89dc6.34478727.js new file mode 100644 index 00000000..bd4821b1 --- /dev/null +++ b/assets/js/21c89dc6.34478727.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[4388],{3905:(t,n,e)=>{e.d(n,{Zo:()=>m,kt:()=>k});var a=e(7294);function l(t,n,e){return n in 
t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function r(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function o(t){for(var n=1;n=0||(l[e]=t[e]);return l}(t,n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(l[e]=t[e])}return l}var p=a.createContext({}),u=function(t){var n=a.useContext(p),e=n;return t&&(e="function"==typeof t?t(n):o(o({},n),t)),e},m=function(t){var n=u(t.components);return a.createElement(p.Provider,{value:n},t.children)},d="mdxType",g={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},N=a.forwardRef((function(t,n){var e=t.components,l=t.mdxType,r=t.originalType,p=t.parentName,m=i(t,["components","mdxType","originalType","parentName"]),d=u(e),N=l,k=d["".concat(p,".").concat(N)]||d[N]||g[N]||r;return e?a.createElement(k,o(o({ref:n},m),{},{components:e})):a.createElement(k,o({ref:n},m))}));function k(t,n){var e=arguments,l=n&&n.mdxType;if("string"==typeof t||l){var r=e.length,o=new Array(r);o[0]=N;var i={};for(var p in n)hasOwnProperty.call(n,p)&&(i[p]=n[p]);i.originalType=t,i[d]="string"==typeof t?t:l,o[1]=i;for(var u=2;u{e.r(n),e.d(n,{contentTitle:()=>o,default:()=>d,frontMatter:()=>r,metadata:()=>i,toc:()=>p});var a=e(7462),l=(e(7294),e(3905));const r={},o=void 0,i={unversionedId:"data-sources/gnomad-small-variants-json",id:"version-3.24/data-sources/gnomad-small-variants-json",title:"gnomad-small-variants-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],u={toc:p},m="wrapper";function d(t){let{components:n,...e}=t;return(0,l.kt)(m,(0,a.Z)({},u,e,{components:n,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad":{ \n "coverage":20,\n "allAf":0.190317,\n "maleAf":0.193,\n "femaleAf": 0.1935,\n "afrAf":0.222876,\n "amrAf":0.121394,\n "easAf":0.239802,\n "finAf":0.136833,\n "nfeAf":0.181282,\n "asjAf":0.258278,\n "othAf":0.186094,\n "allAn":30796,\n "maleAn":15096,\n "femaleAn":15700,\n "afrAn":8664,\n "amrAn":832,\n "easAn":1618,\n "finAn":3486,\n "nfeAn":14916,\n "asjAn":302,\n "othAn":978,\n "allAc":5861,\n "maleAc":2930,\n "femaleAc": 2931,\n "afrAc":1931,\n "amrAc":101,\n "easAc":388,\n "finAc":477,\n "nfeAc":2704,\n "asjAc":78,\n "othAc":182,\n "allHc":561,\n "afrHc":208,\n "amrHc":6,\n "easHc":42,\n "finHc":31,\n "nfeHc":242,\n "asjHc":13,\n "othHc":19,\n "maleHc":280,\n "femaleHc":281,\n "controlsAllAf":0.190317,\n "controlsAllAn":30796,\n "controlsAllAc":5861,\n "lowComplexityRegion":true,\n 
"failedFilter":true\n}\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"),(0,l.kt)("th",{parentName:"tr",align:null},"Type"),(0,l.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"coverage"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"average coverage (non-negative integer values)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"controlsAllAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the controls subset. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for male population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for female population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"controlsAllAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the controls subset. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for all populations. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for male population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for female population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"controlsAllAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the controls subset. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the African / African American population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the African / African American population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the African / African American population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for African / African American population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Latino population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Latino population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Latino population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Latino population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for East Asian population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Finnish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Finnish population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Finnish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Finnish population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Non-Finnish European population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Non-Finnish European population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Non-Finnish European population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Non-Finnish European population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Other population. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Other population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Other population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Other population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Ashkenazi Jewish population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Ashkenazi Jewish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Ashkenazi Jewish population. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the South Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the South Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the South Asian population. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,l.kt)("td",{parentName:"tr",align:null},"bool"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"lowComplexityRegion"),(0,l.kt)("td",{parentName:"tr",align:null},"bool"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant is located in a low complexity region.")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/23eb1a83.5f3133f6.js b/assets/js/23eb1a83.5f3133f6.js new file mode 100644 index 00000000..6a498016 --- /dev/null +++ b/assets/js/23eb1a83.5f3133f6.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1232,1840],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>D});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),p=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},c=function(e){var t=p(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var 
n=e.components,r=e.mdxType,i=e.originalType,s=e.parentName,c=l(e,["components","mdxType","originalType","parentName"]),d=p(n),u=r,D=d["".concat(s,".").concat(u)]||d[u]||m[u]||i;return n?a.createElement(D,o(o({ref:t},c),{},{components:n})):a.createElement(D,o({ref:t},c))}));function D(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,o=new Array(i);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:r,o[1]=l;for(var p=2;p{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>d,frontMatter:()=>i,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const i={},o=void 0,l={unversionedId:"data-sources/splice-ai-json",id:"version-3.24/data-sources/splice-ai-json",title:"splice-ai-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/splice-ai-json.md",sourceDirName:"data-sources",slug:"/data-sources/splice-ai-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/splice-ai-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],p={toc:s},c="wrapper";function d(e){let{components:t,...n}=e;return(0,r.kt)(c,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"spliceAI":[ \n {\n "hgnc":"BLCAP",\n "acceptorGainDistance":-3,\n "acceptorGainScore":0.3,\n "donorLossDistance":7,\n "donorLossScore":0.9\n },\n { \n "hgnc":"NNAT",\n "acceptorGainDistance":-1,\n "acceptorGainScore":0.2,\n "donorGainDistance":-2,\n "donorGainScore":0.3\n 
}\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGNC gene symbol")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorGainDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorGainScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorLossDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorLossScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorGainDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorGainScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 
1 decimal place")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorLossDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorLossScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")))))}d.isMDXComponent=!0},7584:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>m,frontMatter:()=>o,metadata:()=>s,toc:()=>p});var a=n(7462),r=(n(7294),n(3905)),i=n(3677);const o={title:"Splice AI"},l=void 0,s={unversionedId:"data-sources/splice-ai",id:"version-3.24/data-sources/splice-ai",title:"Splice AI",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/splice-ai.mdx",sourceDirName:"data-sources",slug:"/data-sources/splice-ai",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/splice-ai.mdx",tags:[],version:"3.24",frontMatter:{title:"Splice AI"},sidebar:"docs",previous:{title:"REVEL",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel"},next:{title:"TOPMed",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed"}},p=[{value:"Overview",id:"overview",children:[],level:2},{value:"VCF File",id:"vcf-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"Pre-processing",id:"pre-processing",children:[{value:"Filtering",id:"filtering",children:[],level:3}],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],c={toc:p},d="wrapper";function 
m(e){let{components:t,...n}=e;return(0,r.kt)(d,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. ",(0,r.kt)("em",{parentName:"p"},"Cell"),", ",(0,r.kt)("strong",{parentName:"p"},"176")," (3) (2019), pp. 
535-548 e24"))),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Professional data source")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"This is a Professional data source and is not available freely. Please contact ",(0,r.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com")," if you would like to obtain it."))),(0,r.kt)("h2",{id:"vcf-file"},"VCF File"),(0,r.kt)("h3",{id:"example"},"Example"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},'##fileformat=VCFv4.0\n##assembly=GRCh37/hg19\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n#CHROM POS ID REF ALT QUAL FILTER INFO\n10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35\n10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1\n10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21\n10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34\n10 92947 . A T . . 
SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34\n10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32\n')),(0,r.kt)("h3",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"From the VCF file, we're mainly interested in the following columns:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DS_AG")," - \u0394 score (acceptor gain)"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DS_AL")," - \u0394 score (acceptor loss)"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DS_DG")," - \u0394 score (donor gain)"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DS_DL")," - \u0394 score (donor loss)"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DP_AG")," - \u0394 position (acceptor gain) relative to the variant position"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DP_AL")," - \u0394 position (acceptor loss) relative to the variant position"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DP_DG")," - \u0394 position (donor gain) relative to the variant position"),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DP_DL")," - \u0394 position (donor loss) relative to the variant position")),(0,r.kt)("p",null,"The Splice AI team suggests the following interpretation for the scores:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"center"},"Range"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Confidence"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Pathogenicity"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"center"},"0 \u2264 x < 
0.1"),(0,r.kt)("td",{parentName:"tr",align:"left"},"low"),(0,r.kt)("td",{parentName:"tr",align:"left"},"likely benign")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"center"},"0.1 \u2264 x \u2264 0.5"),(0,r.kt)("td",{parentName:"tr",align:"left"},"medium"),(0,r.kt)("td",{parentName:"tr",align:"left"},"likely pathogenic")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"center"},"x > 0.5"),(0,r.kt)("td",{parentName:"tr",align:"left"},"high"),(0,r.kt)("td",{parentName:"tr",align:"left"},"pathogenic")))),(0,r.kt)("h2",{id:"pre-processing"},"Pre-processing"),(0,r.kt)("h3",{id:"filtering"},"Filtering"),(0,r.kt)("p",null,"Splice AI provides a comprehensive list of entries throughout the genome. However, many of the entries have little value, e.g. low splice scores observed in intergenic regions. Not only do these extra entries require more storage, but the unused content also has a negative impact on annotation speed."),(0,r.kt)("p",null,"As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. 
For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism."),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://basespace.illumina.com/s/5u6ThOblecrh"},"https://basespace.illumina.com/s/5u6ThOblecrh")),(0,r.kt)("h2",{id:"json-output"},"JSON Output"),(0,r.kt)(i.default,{mdxType:"JSON"}))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/25bf377f.1ae12a20.js b/assets/js/25bf377f.1ae12a20.js new file mode 100644 index 00000000..3fda0afc --- /dev/null +++ b/assets/js/25bf377f.1ae12a20.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9967,131],{3905:(e,n,t)=>{t.d(n,{Zo:()=>d,kt:()=>h});var a=t(7294);function o(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function r(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,a)}return t}function i(e){for(var n=1;n=0||(o[t]=e[t]);return o}(e,n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(o[t]=e[t])}return o}var l=a.createContext({}),c=function(e){var n=a.useContext(l),t=n;return e&&(t="function"==typeof e?e(n):i(i({},n),e)),t},d=function(e){var n=c(e.components);return a.createElement(l.Provider,{value:n},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var n=e.children;return a.createElement(a.Fragment,{},n)}},p=a.forwardRef((function(e,n){var t=e.components,o=e.mdxType,r=e.originalType,l=e.parentName,d=s(e,["components","mdxType","originalType","parentName"]),u=c(t),p=o,h=u["".concat(l,".").concat(p)]||u[p]||m[p]||r;return 
t?a.createElement(h,i(i({ref:n},d),{},{components:t})):a.createElement(h,i({ref:n},d))}));function h(e,n){var t=arguments,o=n&&n.mdxType;if("string"==typeof e||o){var r=t.length,i=new Array(r);i[0]=p;var s={};for(var l in n)hasOwnProperty.call(n,l)&&(s[l]=n[l]);s.originalType=e,s[u]="string"==typeof e?e:o,i[1]=s;for(var c=2;c{t.r(n),t.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>s,toc:()=>l});var a=t(7462),o=(t(7294),t(3905));const r={},i=void 0,s={unversionedId:"data-sources/amino-acid-conservation-json",id:"version-3.24/data-sources/amino-acid-conservation-json",title:"amino-acid-conservation-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/amino-acid-conservation-json.md",sourceDirName:"data-sources",slug:"/data-sources/amino-acid-conservation-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/amino-acid-conservation-json.md",tags:[],version:"3.24",frontMatter:{}},l=[],c={toc:l},d="wrapper";function u(e){let{components:n,...t}=e;return(0,o.kt)(d,(0,a.Z)({},c,t,{components:n,mdxType:"MDXLayout"}),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-json"},'"aminoAcidConservation": {\n "scores": [0.34]\n} 
\n')),(0,o.kt)("table",null,(0,o.kt)("thead",{parentName:"table"},(0,o.kt)("tr",{parentName:"thead"},(0,o.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,o.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,o.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,o.kt)("tbody",{parentName:"table"},(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:"left"},"aminoAcidConservation"),(0,o.kt)("td",{parentName:"tr",align:"center"},"object"),(0,o.kt)("td",{parentName:"tr",align:"left"})),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:"left"},"scores"),(0,o.kt)("td",{parentName:"tr",align:"center"},"object array of doubles"),(0,o.kt)("td",{parentName:"tr",align:"left"},"percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00")))))}u.isMDXComponent=!0},1758:(e,n,t)=>{t.r(n),t.d(n,{contentTitle:()=>s,default:()=>m,frontMatter:()=>i,metadata:()=>l,toc:()=>c});var a=t(7462),o=(t(7294),t(3905)),r=t(4567);const i={title:"Amino Acid Conservation"},s=void 0,l={unversionedId:"data-sources/amino-acid-conservation",id:"version-3.24/data-sources/amino-acid-conservation",title:"Amino Acid Conservation",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/amino-acid-conservation.mdx",sourceDirName:"data-sources",slug:"/data-sources/amino-acid-conservation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/amino-acid-conservation.mdx",tags:[],version:"3.24",frontMatter:{title:"Amino Acid Conservation"},sidebar:"docs",previous:{title:"1000 Genomes",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes"},next:{title:"Cancer 
Hotspots",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cancer-hotspots"}},c=[{value:"Overview",id:"overview",children:[],level:2},{value:"FASTA File",id:"fasta-file",children:[],level:2},{value:"Parsing FASTA",id:"parsing-fasta",children:[],level:2},{value:"Assigning scores to Illumina Connected Annotations transcripts",id:"assigning-scores-to-illumina-connected-annotations-transcripts",children:[{value:"GRCh37",id:"grch37",children:[],level:3},{value:"GRCh38",id:"grch38",children:[],level:3}],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],d={toc:c},u="wrapper";function m(e){let{components:n,...t}=e;return(0,o.kt)(u,(0,a.Z)({},d,t,{components:n,mdxType:"MDXLayout"}),(0,o.kt)("h2",{id:"overview"},"Overview"),(0,o.kt)("p",null,"Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human ones. Each score indicates how frequently the human amino acid (AA) at a given position is observed across the aligned species."),(0,o.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,o.kt)("div",{parentName:"div",className:"admonition-heading"},(0,o.kt)("h5",{parentName:"div"},(0,o.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,o.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,o.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,o.kt)("div",{parentName:"div",className:"admonition-content"},(0,o.kt)("p",{parentName:"div"},"Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. ",(0,o.kt)("strong",{parentName:"p"},"Genome Res. 
2005")," Aug;15(8):1034-50. (",(0,o.kt)("a",{parentName:"p",href:"http://www.genome.org/cgi/doi/10.1101/gr.3715005"},"http://www.genome.org/cgi/doi/10.1101/gr.3715005"),")"))),(0,o.kt)("h2",{id:"fasta-file"},"FASTA File"),(0,o.kt)("p",null,"The exon alignments are provided in FASTA files as follows:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-scss"},">ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+\nMKK\n>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+\nMKK\n>ENST00000641515.2_gorGor3_1_2 3 0 0\n---\n>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-\nMKK\n>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+\nVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ\n>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+\n")),(0,o.kt)("h2",{id:"parsing-fasta"},"Parsing FASTA"),(0,o.kt)("p",null,"For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. 
For example, for ENST00000641515.2 we have:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-scss"},"Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL\nChimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL\nGorilla ----------------------------------------------------------------------------------------------------------------------\nOrangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL\nGibbon ----------------------------------------------------------------------------------------------------------------------\nRhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL\nMacaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL\n")),(0,o.kt)("p",null,"If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript.\nFor position 6, we would say that we have 43% conservation (3/7) since three organisms share the same residue as humans."),(0,o.kt)("h2",{id:"assigning-scores-to-illumina-connected-annotations-transcripts"},"Assigning scores to Illumina Connected Annotations transcripts"),(0,o.kt)("p",null,"The source FASTA file comes with Ensembl/UCSC transcript ids of the transcripts used for alignments. The Illumina Connected Annotations cache has RefSeq and Ensembl transcripts and our first attempt was to map the given Ensembl/UCSC ids to their equivalent RefSeq/Ensembl ids. This attempt was unsuccessful since UCSC Table Browser provided mapping without version numbers. 
So we proceeded as follows:"),(0,o.kt)("ul",null,(0,o.kt)("li",{parentName:"ul"},"Take proteins that have a unique mapping (and hence one set of conservation scores). For proteins that mapped to both ChrX and ChrY, we accepted the one from ChrX."),(0,o.kt)("li",{parentName:"ul"},"An Illumina Connected Annotations transcript having an exact peptide sequence match with a uniquely aligned protein is assigned the corresponding conservation scores.")),(0,o.kt)("p",null,"Unfortunately, this left us with a very small number of transcripts having conservation scores."),(0,o.kt)("h3",{id:"grch37"},"GRCh37"),(0,o.kt)("ul",null,(0,o.kt)("li",{parentName:"ul"},"Source FASTA contained 41957 protein alignments."),(0,o.kt)("li",{parentName:"ul"},"38165 proteins had unique scores."),(0,o.kt)("li",{parentName:"ul"},"88 aligned proteins existed in the Illumina Connected Annotations cache."),(0,o.kt)("li",{parentName:"ul"},"118 transcripts had conservation scores.")),(0,o.kt)("h3",{id:"grch38"},"GRCh38"),(0,o.kt)("ul",null,(0,o.kt)("li",{parentName:"ul"},"Source FASTA contained 110024 protein alignments."),(0,o.kt)("li",{parentName:"ul"},"88961 proteins had unique scores."),(0,o.kt)("li",{parentName:"ul"},"11688 aligned proteins existed in the Illumina Connected Annotations cache."),(0,o.kt)("li",{parentName:"ul"},"12098 transcripts had conservation scores.")),(0,o.kt)("h2",{id:"download-url"},"Download URL"),(0,o.kt)("p",null,"GRCh37: ",(0,o.kt)("a",{parentName:"p",href:"http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownGene.exonAA.fa.gz"},"http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownGene.exonAA.fa.gz")),(0,o.kt)("p",null,"GRCh38: ",(0,o.kt)("a",{parentName:"p",href:"http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/alignments/knownGene.exonAA.fa.gz"},"http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/alignments/knownGene.exonAA.fa.gz")),(0,o.kt)("h2",{id:"json-output"},"JSON Output"),(0,o.kt)("p",null,"Conservation 
scores are reported in the transcript section. One score is reported for each alt allele"),(0,o.kt)(r.default,{mdxType:"JSON"}))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/2b7f32d3.c2773f1f.js b/assets/js/2b7f32d3.c2773f1f.js new file mode 100644 index 00000000..d5fa1638 --- /dev/null +++ b/assets/js/2b7f32d3.c2773f1f.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9486],{3905:(e,t,n)=>{n.d(t,{Zo:()=>m,kt:()=>f});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),p=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},m=function(e){var t=p(e.components);return a.createElement(s.Provider,{value:t},e.children)},c="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,i=e.originalType,s=e.parentName,m=l(e,["components","mdxType","originalType","parentName"]),c=p(n),u=r,f=c["".concat(s,".").concat(u)]||c[u]||d[u]||i;return n?a.createElement(f,o(o({ref:t},m),{},{components:n})):a.createElement(f,o({ref:t},m))}));function f(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,o=new Array(i);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[c]="string"==typeof e?e:r,o[1]=l;for(var 
p=2;p{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>c,frontMatter:()=>i,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const i={},o=void 0,l={unversionedId:"data-sources/omim-json",id:"version-3.24/data-sources/omim-json",title:"omim-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/omim-json.md",sourceDirName:"data-sources",slug:"/data-sources/omim-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/omim-json.md",tags:[],version:"3.24",frontMatter:{}},s=[{value:"Phenotype",id:"phenotype",children:[],level:4},{value:"Mapping",id:"mapping",children:[],level:4},{value:"Inheritance",id:"inheritance",children:[],level:4},{value:"Comments",id:"comments",children:[],level:4}],p={toc:s},m="wrapper";function c(e){let{components:t,...n}=e;return(0,r.kt)(m,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"omim":[ \n { \n "mimNumber":600678,\n "geneName":"MutS, E. coli, homolog of, 6",\n "description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. 
Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",\n "phenotypes":[ \n { \n "mimNumber":614350,\n "phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",\n "description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",\n "mapping":"molecular basis of the disorder is known",\n "inheritances":[ \n "Autosomal dominant"\n ]\n },\n { \n "mimNumber":608089,\n "phenotype":"Endometrial cancer, familial",\n "mapping":"molecular basis of the disorder is known"\n },\n { \n "mimNumber":276300,\n "phenotype":"Mismatch repair cancer syndrome",\n "description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",\n "mapping":"molecular basis of the disorder is known",\n "inheritances":[ \n "Autosomal recessive"\n ],\n "comments" : [\n "contribute to susceptibility to multifactorial disorders or to susceptibility to infection",\n "unconfirmed or possibly spurious mapping"\n ]\n }\n ]\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"mimNumber"),(0,r.kt)("td",{parentName:"tr",align:"left"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"OMIM ID for gene")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"geneName"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"gene 
name")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"description"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotypes"),(0,r.kt)("td",{parentName:"tr",align:"left"},"object array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#phenotype"},"Phenotype entry below"))))),(0,r.kt)("h4",{id:"phenotype"},"Phenotype"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"mimNumber"),(0,r.kt)("td",{parentName:"tr",align:"left"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotype"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"description"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"mapping"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#mapping"},"possible values below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"inheritance"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#inheritance"},"possible values 
below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"comments"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#comments"},"possible values below"))))),(0,r.kt)("h4",{id:"mapping"},"Mapping"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"disorder was positioned by mapping of the wild type gene"),(0,r.kt)("li",{parentName:"ol"},"disease phenotype itself was mapped"),(0,r.kt)("li",{parentName:"ol"},"molecular basis of the disorder is known"),(0,r.kt)("li",{parentName:"ol"},"disorder is a chromosome deletion or duplication syndrome")),(0,r.kt)("h4",{id:"inheritance"},"Inheritance"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"autosomal recessive"),(0,r.kt)("li",{parentName:"ul"},"autosomal dominant")),(0,r.kt)("h4",{id:"comments"},"Comments"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"contributes to the susceptibility to multifactorial disorders"),(0,r.kt)("li",{parentName:"ul"},"variations that lead to apparently abnormal laboratory test values"),(0,r.kt)("li",{parentName:"ul"},"unconfirmed mapping")))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/31f960e2.fcd268df.js b/assets/js/31f960e2.fcd268df.js new file mode 100644 index 00000000..91934ec2 --- /dev/null +++ b/assets/js/31f960e2.fcd268df.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9726],{3905:(e,n,t)=>{t.d(n,{Zo:()=>d,kt:()=>h});var o=t(7294);function a(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function i(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);n&&(o=o.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,o)}return t}function l(e){for(var n=1;n=0||(a[t]=e[t]);return 
a}(e,n);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(o=0;o=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(a[t]=e[t])}return a}var c=o.createContext({}),s=function(e){var n=o.useContext(c),t=n;return e&&(t="function"==typeof e?e(n):l(l({},n),e)),t},d=function(e){var n=s(e.components);return o.createElement(c.Provider,{value:n},e.children)},m="mdxType",p={inlineCode:"code",wrapper:function(e){var n=e.children;return o.createElement(o.Fragment,{},n)}},u=o.forwardRef((function(e,n){var t=e.components,a=e.mdxType,i=e.originalType,c=e.parentName,d=r(e,["components","mdxType","originalType","parentName"]),m=s(t),u=a,h=m["".concat(c,".").concat(u)]||m[u]||p[u]||i;return t?o.createElement(h,l(l({ref:n},d),{},{components:t})):o.createElement(h,l({ref:n},d))}));function h(e,n){var t=arguments,a=n&&n.mdxType;if("string"==typeof e||a){var i=t.length,l=new Array(i);l[0]=u;var r={};for(var c in n)hasOwnProperty.call(n,c)&&(r[c]=n[c]);r.originalType=e,r[m]="string"==typeof e?e:a,l[1]=r;for(var s=2;s{t.r(n),t.d(n,{contentTitle:()=>l,default:()=>m,frontMatter:()=>i,metadata:()=>r,toc:()=>c});var o=t(7462),a=(t(7294),t(3905));const i={title:"Licensed Content"},l=void 0,r={unversionedId:"introduction/licensedContent",id:"version-3.24/introduction/licensedContent",title:"Licensed Content",description:"Illumina Connected Annotations supports the following content, which is available through a license from Illumina.",source:"@site/versioned_docs/version-3.24/introduction/licensedContent.mdx",sourceDirName:"introduction",slug:"/introduction/licensedContent",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/licensedContent",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/introduction/licensedContent.mdx",tags:[],version:"3.24",frontMatter:{title:"Licensed 
Content"},sidebar:"docs",previous:{title:"Introduction",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/"},next:{title:"Dependencies",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/dependencies"}},c=[{value:"How to obtain the license?",id:"how-to-obtain-the-license",children:[],level:2},{value:"How to use the credentials file?",id:"how-to-use-the-credentials-file",children:[{value:"Download licensed content",id:"download-licensed-content",children:[],level:3},{value:"Annotate with licensed content",id:"annotate-with-licensed-content",children:[],level:3}],level:2},{value:"Licensing Errors",id:"licensing-errors",children:[],level:2}],s={toc:c},d="wrapper";function m(e){let{components:n,...t}=e;return(0,a.kt)(d,(0,o.Z)({},s,t,{components:n,mdxType:"MDXLayout"}),(0,a.kt)("p",null,"Illumina Connected Annotations supports the following content, which is available through a license from Illumina.\nThe license file will allow users to download and annotate with these data sources."),(0,a.kt)("ul",null,(0,a.kt)("li",{parentName:"ul"},"COSMIC"),(0,a.kt)("li",{parentName:"ul"},"OMIM"),(0,a.kt)("li",{parentName:"ul"},"Primate AI-3D"),(0,a.kt)("li",{parentName:"ul"},"Splice AI")),(0,a.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,a.kt)("div",{parentName:"div",className:"admonition-heading"},(0,a.kt)("h5",{parentName:"div"},(0,a.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,a.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,a.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 
5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"tip")),(0,a.kt)("div",{parentName:"div",className:"admonition-content"},(0,a.kt)("p",{parentName:"div"},"The license may be customized at creation time to allow access to one or more of the above."))),(0,a.kt)("div",{className:"admonition admonition-note alert alert--secondary"},(0,a.kt)("div",{parentName:"div",className:"admonition-heading"},(0,a.kt)("h5",{parentName:"div"},(0,a.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,a.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,a.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"}))),"note")),(0,a.kt)("div",{parentName:"div",className:"admonition-content"},(0,a.kt)("p",{parentName:"div"},"The Annotator packaged with DRAGEN comes with a license for all premium content.\nThat is, if the Annotator is run from within DRAGEN, all premium content will be available.\nHowever, this doesn't automatically grant a license for premium content when running the Annotator outside of DRAGEN.\nPlease contact ",(0,a.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com")," for stand-alone licenses."))),(0,a.kt)("h2",{id:"how-to-obtain-the-license"},"How to obtain the license?"),(0,a.kt)("p",null,"Please contact 
",(0,a.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com")," to obtain a special credentials file for the data sources of interest."),(0,a.kt)("p",null,"Visit ",(0,a.kt)("a",{parentName:"p",href:"https://developer.illumina.com/illumina-connected-annotations"},"Illumina Connected Annotations")," for more details."),(0,a.kt)("h2",{id:"how-to-use-the-credentials-file"},"How to use the credentials file?"),(0,a.kt)("p",null,"After obtaining the credentials file, it may be used in two ways:"),(0,a.kt)("ol",null,(0,a.kt)("li",{parentName:"ol"},"Home folder"),(0,a.kt)("li",{parentName:"ol"},"Commandline argument")),(0,a.kt)("p",null,"The default location of the license file is ",(0,a.kt)("inlineCode",{parentName:"p"},"~/.ilmnAnnotations/credentials.json"),". An example of credentials file as below:"),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'{\n "ApiKey":"myApiKey",\n "ApiSecret": "abcdefghikjlmnopqrstuvwxyz-secretKey"\n}\n')),(0,a.kt)("p",null,"However, this can be overridden by the command line argument while downloading/annotating."),(0,a.kt)("h3",{id:"download-licensed-content"},"Download licensed content"),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-shell"},"dotnet Downloader.dll \\\n-o ~/data \\\n-ga GRCh38 \\\n--credentialsFile ~/credentials.json\n")),(0,a.kt)("h3",{id:"annotate-with-licensed-content"},"Annotate with licensed content"),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-shell"},"dotnet Annotator.dll \\\n--ref ~/data/References/7/Homo_sapiens.GRCh38.Nirvana.dat \\\n--sd ~/data/SupplementaryDatabase \\\n-c ~/data/Cache/32 \\\n-i ~/input_vcf-hg38.vcf.gz \\\n-o ~/output \\\n--credentialsFile ~/credentials.json\n")),(0,a.kt)("h2",{id:"licensing-errors"},"Licensing Errors"),(0,a.kt)("p",null,"If the license has expired, Illumina Connected Annotations will stop annotating and exit with an error code.\nThese 
errors may be skipped by using the ",(0,a.kt)("inlineCode",{parentName:"p"},"--ignoreLicenseError")," command line argument.\nAfter doing this, only basic data sources will be used for annotations.\nThis can also be achieved by deleting the credentials file from the home folder."))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/363a8231.a2b5a59a.js b/assets/js/363a8231.a2b5a59a.js new file mode 100644 index 00000000..3918e9be --- /dev/null +++ b/assets/js/363a8231.a2b5a59a.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[958],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function c(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var s=r.createContext({}),l=function(e){var t=r.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):c(c({},t),e)),n},u=function(e){var t=l(e.components);return r.createElement(s.Provider,{value:t},e.children)},p="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},d=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,s=e.parentName,u=i(e,["components","mdxType","originalType","parentName"]),p=l(n),d=a,f=p["".concat(s,".").concat(d)]||p[d]||m[d]||o;return n?r.createElement(f,c(c({ref:t},u),{},{components:n})):r.createElement(f,c({ref:t},u))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,c=new Array(o);c[0]=d;var 
i={};for(var s in t)hasOwnProperty.call(t,s)&&(i[s]=t[s]);i.originalType=e,i[p]="string"==typeof e?e:a,c[1]=i;for(var l=2;l{n.r(t),n.d(t,{contentTitle:()=>c,default:()=>p,frontMatter:()=>o,metadata:()=>i,toc:()=>s});var r=n(7462),a=(n(7294),n(3905));const o={},c=void 0,i={unversionedId:"data-sources/cosmic-cancer-gene-census",id:"version-3.24/data-sources/cosmic-cancer-gene-census",title:"cosmic-cancer-gene-census",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-cancer-gene-census.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-cancer-gene-census",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-cancer-gene-census",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-cancer-gene-census.md",tags:[],version:"3.24",frontMatter:{}},s=[],l={toc:s},u="wrapper";function p(e){let{components:t,...n}=e;return(0,a.kt)(u,(0,r.Z)({},l,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},' {\n "name": "PRDM16",\n "ensemblGeneId": "ENSG00000142611",\n "ncbiGeneId": "63976",\n "hgncId": 14000,\n "cosmic": {\n "tier": 1,\n "roleInCancer": [\n "oncogene",\n "fusion"\n ]\n }\n}\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"roleInCancer"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Possible roles in 
cancer")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"tier"),(0,a.kt)("td",{parentName:"tr",align:"center"},"number"),(0,a.kt)("td",{parentName:"tr",align:"left"},"COSMIC tiers ","[1, 2]")))))}p.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/36d9d5eb.3e884217.js b/assets/js/36d9d5eb.3e884217.js new file mode 100644 index 00000000..23f60573 --- /dev/null +++ b/assets/js/36d9d5eb.3e884217.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[631,1831],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>v});var a=n(7294);function l(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(l[n]=e[n]);return l}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(l[n]=e[n])}return l}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},c=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},p="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,l=e.mdxType,r=e.originalType,s=e.parentName,c=o(e,["components","mdxType","originalType","parentName"]),p=d(n),u=l,v=p["".concat(s,".").concat(u)]||p[u]||m[u]||r;return n?a.createElement(v,i(i({ref:t},c),{},{components:n})):a.createElement(v,i({ref:t},c))}));function v(e,t){var n=arguments,l=t&&t.mdxType;if("string"==typeof e||l){var r=n.length,i=new Array(r);i[0]=u;var o={};for(var s in 
t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[p]="string"==typeof e?e:l,i[1]=o;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>p,frontMatter:()=>r,metadata:()=>o,toc:()=>s});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/dbsnp-json",id:"version-3.24/data-sources/dbsnp-json",title:"dbsnp-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/dbsnp-json.md",sourceDirName:"data-sources",slug:"/data-sources/dbsnp-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dbsnp-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},c="wrapper";function p(e){let{components:t,...n}=e;return(0,l.kt)(c,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"dbsnp":[\n "rs1042821"\n]\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,l.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"dbsnp"),(0,l.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,l.kt)("td",{parentName:"tr",align:"left"},"dbSNP rsIDs")))))}p.isMDXComponent=!0},2180:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>m,frontMatter:()=>i,metadata:()=>s,toc:()=>d});var a=n(7462),l=(n(7294),n(3905)),r=n(2379);const i={title:"dbSNP"},o=void 
0,s={unversionedId:"data-sources/dbsnp",id:"version-3.24/data-sources/dbsnp",title:"dbSNP",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/dbsnp.mdx",sourceDirName:"data-sources",slug:"/data-sources/dbsnp",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dbsnp.mdx",tags:[],version:"3.24",frontMatter:{title:"dbSNP"},sidebar:"docs",previous:{title:"DANN",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann"},next:{title:"DECIPHER",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"VCF File",id:"vcf-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[{value:"Global allele extraction",id:"global-allele-extraction",children:[],level:4},{value:"Equal Allele Frequency Example (2 alleles)",id:"equal-allele-frequency-example-2-alleles",children:[],level:4},{value:"Equal Allele Frequency Example (3 alleles)",id:"equal-allele-frequency-example-3-alleles",children:[],level:4},{value:"Equal Allele Frequency in Alternate Alleles",id:"equal-allele-frequency-in-alternate-alleles",children:[],level:4},{value:"Equal Allele Frequency Between Reference & Alternate Allele",id:"equal-allele-frequency-between-reference--alternate-allele",children:[],level:4}],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2},{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[],level:2}],c={toc:d},p="wrapper";function 
m(e){let{components:t,...n}=e;return(0,l.kt)(p,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,l.kt)("h2",{id:"overview"},"Overview"),(0,l.kt)("p",null,"dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations."),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP\u2014Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. ",(0,l.kt)("em",{parentName:"p"},"Genome Res."),", ",(0,l.kt)("strong",{parentName:"p"},"9"),", 677\u2013679."))),(0,l.kt)("h2",{id:"vcf-file"},"VCF File"),(0,l.kt)("h3",{id:"example"},"Example"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO\n1 10177 rs367896724 A AC . . 
RS=367896724;RSPOS=10177;dbSNPBuildID=138; \\ \n SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \\\n VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \\\n TOPMED=0.76728147298674821,0.23271852701325178\n")),(0,l.kt)("h3",{id:"parsing"},"Parsing"),(0,l.kt)("p",null,"From the VCF file, we're mainly interested in the following:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"rsID")," from the ",(0,l.kt)("inlineCode",{parentName:"li"},"ID")," field"),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"CAF")," from the ",(0,l.kt)("inlineCode",{parentName:"li"},"INFO")," field")),(0,l.kt)("h4",{id:"global-allele-extraction"},"Global allele extraction"),(0,l.kt)("p",null,"The global major and minor alleles are extracted based on the frequency of the alleles provided in the CAF field. The global minor allele frequency is the second highest value of the CAF comma delimited field (ignoring '.' values). 
"),(0,l.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"Tie Breaking: Global Major Allele")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele."))),(0,l.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"Tie Breaking: 
Global Minor Allele")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, the global minor allele is chosen arbitrarily."))),(0,l.kt)("h4",{id:"equal-allele-frequency-example-2-alleles"},"Equal Allele Frequency Example (2 alleles)"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 100 A C CAF=0.5,0.5\n")),(0,l.kt)("p",null,"We will select A to be the global major allele and C to be the global minor allele."),(0,l.kt)("h4",{id:"equal-allele-frequency-example-3-alleles"},"Equal Allele Frequency Example (3 alleles)"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 100 A C,T CAF=0.33,0.33,0.33\n")),(0,l.kt)("p",null,"We will select A to be the global major allele and arbitrarily choose either C or T to be the global minor allele."),(0,l.kt)("h4",{id:"equal-allele-frequency-in-alternate-alleles"},"Equal Allele Frequency in Alternate Alleles"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 100 A C,T CAF=0.2,0.4,0.4\n")),(0,l.kt)("p",null,"C and T are arbitrarily assigned to be the global major and global minor alleles."),(0,l.kt)("h4",{id:"equal-allele-frequency-between-reference--alternate-allele"},"Equal Allele Frequency Between Reference & Alternate Allele"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 100 A C,T CAF=0.2,0.2,0.6\n")),(0,l.kt)("p",null,"We will select T to be the global major allele and C to be the global minor allele."),(0,l.kt)("h2",{id:"known-issues"},"Known Issues"),(0,l.kt)("div",{className:"admonition admonition-caution alert 
alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known Issues")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"If there are multiple entries with different CAF values for the same allele, we use the first CAF value."))),(0,l.kt)("h2",{id:"download-url"},"Download URL"),(0,l.kt)("p",null,(0,l.kt)("a",{parentName:"p",href:"https://ftp.ncbi.nih.gov/snp/organisms/"},"https://ftp.ncbi.nih.gov/snp/organisms/")),(0,l.kt)("h2",{id:"json-output"},"JSON Output"),(0,l.kt)(r.default,{mdxType:"JSON"}),(0,l.kt)("h2",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,l.kt)("p",null,"You can generate dbSNP supplementary annotation files by yourself. 
Please refer to SAUtils section for more details."))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/37f5014d.e73f20ff.js b/assets/js/37f5014d.e73f20ff.js new file mode 100644 index 00000000..7cacb0e2 --- /dev/null +++ b/assets/js/37f5014d.e73f20ff.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6321,7165],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>u});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function r(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):r(r({},t),e)),n},p=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},m="mdxType",c={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},k=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,l=e.originalType,s=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),m=d(n),k=i,u=m["".concat(s,".").concat(k)]||m[k]||c[k]||l;return n?a.createElement(u,r(r({ref:t},p),{},{components:n})):a.createElement(u,r({ref:t},p))}));function u(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var l=n.length,r=new Array(l);r[0]=k;var o={};for(var s in t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[m]="string"==typeof e?e:i,r[1]=o;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>r,default:()=>m,frontMatter:()=>l,metadata:()=>o,toc:()=>s});var 
a=n(7462),i=(n(7294),n(3905));const l={},r=void 0,o={unversionedId:"data-sources/clinvar-preview-json",id:"version-3.24/data-sources/clinvar-preview-json",title:"clinvar-preview-json",description:"small variants:",source:"@site/versioned_docs/version-3.24/data-sources/clinvar-preview-json.md",sourceDirName:"data-sources",slug:"/data-sources/clinvar-preview-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar-preview-json.md",tags:[],version:"3.24",frontMatter:{}},s=[{value:"Variant Types",id:"variant-types",children:[],level:4},{value:"Review Statuses",id:"review-statuses",children:[],level:4},{value:"classification",id:"classification",children:[],level:4}],d={toc:s},p="wrapper";function m(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"small variants:")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "clinvar-preview": [\n {\n "altAllele": "A",\n "refAllele": "G",\n "variantType": "SNV",\n "accession": "VCV000437934",\n "version": "1",\n "recordType": "classified",\n "dateLastUpdated": "2023-08-06",\n "rcvs": [\n {\n "accession": "RCV000505090",\n "version": "1",\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "descriptions": [\n {\n "dateLastEvaluated": "2016-08-31",\n "classification": "Pathogenic"\n }\n ]\n }\n },\n "classifiedConditions": [\n {\n "condition": "Cleidocranial dysostosis",\n "db": "MedGen",\n "id": "C0008928"\n }\n ]\n }\n ],\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "classification": "Pathogenic",\n "dateLastEvaluated": "2016-08-31",\n "mostRecentSubmission": "2017-09-09",\n "conditions": [\n {\n 
"type": "Disease",\n "contributesToAggregateClassification": true,\n "traits": [\n {\n "id": "820",\n "name": {\n "xRefs": [\n {\n "db": "Genetic Alliance",\n "id": "Cleidocranial+Dysplasia/1683"\n },\n {\n "db": "SNOMED CT",\n "id": "65976001"\n }\n ],\n "value": "Cleidocranial dysostosis"\n }\n }\n ]\n }\n ]\n }\n },\n "clinicalAssertions": [\n {\n "accession": "SCV000598565"\n }\n ]\n }\n ]\n}\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"large variants:")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "clinvar-preview": [\n {\n "chromosome": "17",\n "begin": 150732,\n "end": 14764202,\n "variantType": "copy_number_gain",\n "accession": "VCV000154089",\n "version": "2",\n "recordType": "classified",\n "dateLastUpdated": "2023-10-15",\n "rcvs": [\n {\n "accession": "RCV000142236",\n "version": "6",\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "descriptions": [\n {\n "dateLastEvaluated": "2014-03-10",\n "classification": "Pathogenic"\n }\n ]\n }\n },\n "classifiedConditions": [\n {\n "condition": "See cases"\n }\n ]\n }\n ],\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "classification": "Pathogenic",\n "dateLastEvaluated": "2014-03-10",\n "mostRecentSubmission": "2015-07-13",\n "conditions": [\n {\n "type": "PhenotypeInstruction",\n "contributesToAggregateClassification": true,\n "traits": [\n {\n "id": "18728",\n "name": {\n "value": "See cases"\n }\n }\n ]\n }\n ]\n }\n },\n "clinicalAssertions": [\n {\n "accession": "SCV000183512"\n }\n ]\n }\n 
]\n}\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Chromosome")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"begin"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"},"start position of variant")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"end"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"},"end position of variant")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"accession"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"ClinVar ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"version"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"ClinVar version")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"variant 
type")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"recordType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"record type")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"dateLastUpdated"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"rcvs"),(0,i.kt)("td",{parentName:"tr",align:"center"},"array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"RCV objects associated to this VCV")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"classifications"),(0,i.kt)("td",{parentName:"tr",align:"center"},"array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"classifications for this VCV")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"clinicalAssertions"),(0,i.kt)("td",{parentName:"tr",align:"center"},"array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"SCV objects associated to this VCV")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,i.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,i.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the ClinVar alternate allele")))),(0,i.kt)("h4",{id:"variant-types"},"Variant Types"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"copy_number_gain"),(0,i.kt)("li",{parentName:"ul"},"copy_number_loss"),(0,i.kt)("li",{parentName:"ul"},"deletion"),(0,i.kt)("li",{parentName:"ul"},"delins"),(0,i.kt)("li",{parentName:"ul"},"duplication"),(0,i.kt)("li",{parentName:"ul"},"insertion"),(0,i.kt)("li",{parentName:"ul"},"inversion"),(0,i.kt)("li",{parentName:"ul"},"SNV"),(0,i.kt)("li",{parentName:"ul"},"tandem_duplication")),(0,i.kt)("h4",{id:"review-statuses"},"Review 
Statuses"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"criteria provided, conflicting classifications"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, multiple submitters, no conflicts"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, single submitter"),(0,i.kt)("li",{parentName:"ul"},"no assertion criteria provided"),(0,i.kt)("li",{parentName:"ul"},"no classification provided"),(0,i.kt)("li",{parentName:"ul"},"practice guideline"),(0,i.kt)("li",{parentName:"ul"},"reviewed by expert panel")),(0,i.kt)("h4",{id:"classification"},"classification"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Benign"),(0,i.kt)("li",{parentName:"ul"},"Likely benign"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign"),(0,i.kt)("li",{parentName:"ul"},"not provided"),(0,i.kt)("li",{parentName:"ul"},"conflicting data from submitters"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"association"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; risk factor"),(0,i.kt)("li",{parentName:"ul"},"risk factor"),(0,i.kt)("li",{parentName:"ul"},"other"),(0,i.kt)("li",{parentName:"ul"},"drug response"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; Pathogenic/Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; Affects"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"protective"),(0,i.kt)("li",{parentName:"ul"},"Affects"),(0,i.kt)("li",{parentName:"ul"},"Benign; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; 
association"),(0,i.kt)("li",{parentName:"ul"},"Uncertain risk allele"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; association"),(0,i.kt)("li",{parentName:"ul"},"Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic; other"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; other"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic/Pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance/Uncertain risk allele"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; association; protective"),(0,i.kt)("li",{parentName:"ul"},"protective; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; other; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; association"),(0,i.kt)("li",{parentName:"ul"},"Benign; association"),(0,i.kt)("li",{parentName:"ul"},"Affects; association; other"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; protective"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; drug response"),(0,i.kt)("li",{parentName:"ul"},"Benign; drug response"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; protective"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic; drug response"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; other"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic/Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Uncertain risk allele; protective"),(0,i.kt)("li",{parentName:"ul"},"association not 
found"),(0,i.kt)("li",{parentName:"ul"},"Affects; association"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; association"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; other"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; association; risk factor Pathogenic;"),(0,i.kt)("li",{parentName:"ul"},"association"),(0,i.kt)("li",{parentName:"ul"},"Benign; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; other; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely risk allele; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; drug response"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; risk factor"),(0,i.kt)("li",{parentName:"ul"},"other; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic/Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; drug response"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; Affects"),(0,i.kt)("li",{parentName:"ul"},"association; drug response; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; drug response"),(0,i.kt)("li",{parentName:"ul"},"Affects; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; protective"),(0,i.kt)("li",{parentName:"ul"},"confers sensitivity"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; association"),(0,i.kt)("li",{parentName:"ul"},"Benign; Affects"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; Affects"),(0,i.kt)("li",{parentName:"ul"},"Uncertain risk allele; risk factor"),(0,i.kt)("li",{parentName:"ul"},"drug response; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; 
drug response"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"drug response; other"),(0,i.kt)("li",{parentName:"ul"},"association; drug response"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; confers sensitivity"),(0,i.kt)("li",{parentName:"ul"},"association; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Pathogenic, low penetrance; other"),(0,i.kt)("li",{parentName:"ul"},"Benign; confers sensitivity"),(0,i.kt)("li",{parentName:"ul"},"confers sensitivity; other"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic/Pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; risk factor")))}m.isMDXComponent=!0},2566:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>c,frontMatter:()=>r,metadata:()=>s,toc:()=>d});var a=n(7462),i=(n(7294),n(3905)),l=n(2132);const r={title:"ClinVar Preview"},o=void 0,s={unversionedId:"data-sources/clinvar-preview",id:"version-3.24/data-sources/clinvar-preview",title:"ClinVar Preview",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/clinvar-preview.mdx",sourceDirName:"data-sources",slug:"/data-sources/clinvar-preview",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar-preview.mdx",tags:[],version:"3.24",frontMatter:{title:"ClinVar Preview"},sidebar:"docs",previous:{title:"ClinVar",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar"},next:{title:"COSMIC",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"Parsing",id:"parsing",children:[{value:"Overall XML to JSON mapping",id:"overall-xml-to-json-mapping",children:[],level:3},{value:"Variation fields",id:"variation-fields",children:[],level:3},{value:"Location 
fields",id:"location-fields",children:[],level:3},{value:"RCVs",id:"rcvs",children:[{value:"Classifications",id:"classifications",children:[{value:"Germline Classification",id:"germline-classification",children:[],level:5},{value:"Classified Conditions",id:"classified-conditions",children:[],level:5}],level:4}],level:3},{value:"Classifications",id:"classifications-1",children:[{value:"Germline Classification",id:"germline-classification-1",children:[{value:"Conditions",id:"conditions",children:[],level:6}],level:5}],level:3},{value:"Clinical Assertions",id:"clinical-assertions",children:[],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URLs",id:"download-urls",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2},{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[{value:"Using clinvar subcommands and source data files",id:"using-clinvar-subcommands-and-source-data-files",children:[],level:3}],level:2}],p={toc:d},m="wrapper";function c(e){let{components:t,...n}=e;return(0,i.kt)(m,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. 
ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, ",(0,i.kt)("em",{parentName:"p"},"Nucleic Acids Research"),", ",(0,i.kt)("strong",{parentName:"p"},"46"),", Issue D1, 4 January 2018, Pages D1062\u2013D1067, ",(0,i.kt)("a",{parentName:"p",href:"https://doi.org/10.1093/nar/gkx1153"},"https://doi.org/10.1093/nar/gkx1153")))),(0,i.kt)("p",null,"ClinVar Preview relates to the new ClinVar XML format introduced in 2024.\nFollowing sections describe the parsing and subsequent json format provided by Illumina Connected Annotations."),(0,i.kt)("h2",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"ClinVar ",(0,i.kt)("a",{parentName:"p",href:"https://github.com/ncbi/clinvar/blob/master/FTPSiteXmlChanges.md"},"recommends")," using the VCV XML file 
because it contains comprehensive information."),(0,i.kt)("p",null,"Parsing is simplified by using the XSD file generation.\nCommand for generating XSD file"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-shell"},"xsd ClinVar_VCV.xsd /n:VariationArchive /c\n")),(0,i.kt)("h3",{id:"overall-xml-to-json-mapping"},"Overall XML to JSON mapping"),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"variantType")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"sequence ontology"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.VariationType"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"accession")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"VCV Id from ClinVar"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.Accession"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"version")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"VCV Id 
version"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.Version"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"recordType")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"classified")),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.RecordType"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"dateLastUpdated")),(0,i.kt)("td",{parentName:"tr",align:null},"date time"),(0,i.kt)("td",{parentName:"tr",align:null},"date VCV was last updated"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.DateLastUpdated"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"chromosome")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"chromosome (large variants only)"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.Chr"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"begin")),(0,i.kt)("td",{parentName:"tr",align:null},"number"),(0,i.kt)("td",{parentName:"tr",align:null},"start position of the variant (large variants 
only)"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.positionVCF"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"end")),(0,i.kt)("td",{parentName:"tr",align:null},"number"),(0,i.kt)("td",{parentName:"tr",align:null},"end position of the variant (large variants only)"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.displayStop")," or calculated")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"refAllele")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"reference alleles (small variants only)"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.referenceAlleleVCF"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"altAllele")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"alternate alleles (small variants only)"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.alternateAlleleVCF"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"rcvs")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of RCV 
objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.RCVList"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"classifications")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of classification objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.Classifications"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"clinicalAssertions")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of clinicalAssertion objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"VariationArchive.ClassifiedRecord.ClinicalAssertionList"))))),(0,i.kt)("h3",{id:"variation-fields"},"Variation fields"),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{4-8}","{4-8}":!0},'\n...\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "variantType": "delins",\n "accession": "VCV001381081",\n "version": "3",\n "recordType": "classified",\n "dateLastUpdated": "2024-01-26",\n ...\n}\n')),(0,i.kt)("h3",{id:"location-fields"},"Location fields"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{9-15}","{9-15}":!0},'\n \n 1p36.13\n \n \n \n ...\n \n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON Small Variant")),(0,i.kt)("p",null,"note the alleles are trimmed"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "altAllele": "GGCAACCGGCGCCTCAAGGAGAG",\n "refAllele": "-",\n 
...\n}\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON Large Variant")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "chromosome": "17",\n "begin": 150732,\n "end": 14764202,\n ...\n}\n')),(0,i.kt)("h3",{id:"rcvs"},"RCVs"),(0,i.kt)("p",null,"RCV Object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.RCVList")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML sub-path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"accession")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"VCV Id from ClinVar"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"RCVList.RCVAccession.Accession"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"version")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"VCV Id version"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"RCVList.RCVAccession.Accession"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"classifications")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of classification 
objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"RCVList.RCVAccession.RCVClassifications"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"classifiedConditions")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of classified conditions"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"RCVList.RCVAccession.ClassifiedConditionList"))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n \n \n ...\n \n \n ...\n \n \n\n...\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "rcvs": [\n {\n "accession": "RCV001921860",\n "version": "3",\n "classifications": {\n ...\n },\n "classifiedConditions": [\n ...\n ]\n }\n ]\n}\n')),(0,i.kt)("h4",{id:"classifications"},"Classifications"),(0,i.kt)("p",null,"Classification object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.RCVList.RCVAccession.RCVClassifications"),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"classification")," can be of following types:"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"germlineClassification")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"somaticClinicalImpact")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"oncogenicityClassification"))),(0,i.kt)("h5",{id:"germline-classification"},"Germline Classification"),(0,i.kt)("p",null,"Classification object from XML path 
",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.RCVList.RCVAccession.RCVClassifications.GermlineClassification")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML sub-path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"reviewStatus")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"review status"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.ReviewStatus"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"descriptions")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of classification objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.Description"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"descriptions[].classification")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"classification"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.Description.Value"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"descriptions[].dateLastEvaluated")),(0,i.kt)("td",{parentName:"tr",align:null},"date"),(0,i.kt)("td",{parentName:"tr",align:null},"date last 
evaluated"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.Description.DateLastEvaluated"))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n \n criteria provided, single submitter\n Pathogenic\n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "criteria provided, single submitter",\n "descriptions": [\n {\n "dateLastEvaluated": "2021-08-04",\n "classification": "Pathogenic"\n }\n ]\n }\n }\n}\n')),(0,i.kt)("h5",{id:"classified-conditions"},"Classified Conditions"),(0,i.kt)("p",null,"Classified conditions object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.RCVList.RCVAccession.ClassifiedConditionList")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML sub-path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"condition")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"VCV Id from ClinVar"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ClassifiedConditionList.ClassifiedCondition.Value"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"db")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"list of classification 
objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ClassifiedConditionList.ClassifiedCondition.DB"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"id")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"classification"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ClassifiedConditionList.ClassifiedCondition.ID"))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n Gastrointestinal stromal tumor\n Paragangliomas 4\n Pheochromocytoma\n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "classifiedConditions": [\n {\n "condition": "Gastrointestinal stromal tumor",\n "db": "MedGen",\n "id": "C0238198"\n },\n {\n "condition": "Paragangliomas 4",\n "db": "MedGen",\n "id": "C1861848"\n },\n {\n "condition": "Pheochromocytoma",\n "db": "MedGen",\n "id": "C0031511"\n }\n ]\n}\n')),(0,i.kt)("h3",{id:"classifications-1"},"Classifications"),(0,i.kt)("p",null,"Classification object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.Classifications"),"\n",(0,i.kt)("inlineCode",{parentName:"p"},"classification")," can be of following types:"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"germlineClassification")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"somaticClinicalImpact")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"oncogenicityClassification"))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n \n ...\n 
\n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"classifications": {\n "germlineClassification": {...}\n}\n')),(0,i.kt)("h5",{id:"germline-classification-1"},"Germline Classification"),(0,i.kt)("p",null,"Classification object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.Classifications.GermlineClassification")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML sub-path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"classification")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"classification"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.Description"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"reviewStatus")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"review status"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.ReviewStatus"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"dateLastEvaluated")),(0,i.kt)("td",{parentName:"tr",align:null},"date"),(0,i.kt)("td",{parentName:"tr",align:null},"date last 
evaluated"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.DateLastEvaluated"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"mostRecentSubmission")),(0,i.kt)("td",{parentName:"tr",align:null},"date"),(0,i.kt)("td",{parentName:"tr",align:null},"date last evaluated"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.MostRecentSubmission"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"pubMedIds")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of PubMedIds"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.Citation.ID.Value"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"conditions")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of conditions"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"GermlineClassification.ConditionList"))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n \n criteria provided, single submitter\n Pathogenic\n \n 19454582\n \n \n 19802898\n \n \n ...\n \n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "classifications": {\n "germlineClassification": {\n "classification": "Pathogenic",\n "reviewStatus": "criteria provided, single submitter",\n "dateLastEvaluated": "2021-08-04",\n "mostRecentSubmission": "2023-02-07",\n "conditions": [...],\n "pubMedIds": [\n "19454582",\n "19802898"\n ]\n }\n 
}\n}\n')),(0,i.kt)("h6",{id:"conditions"},"Conditions"),(0,i.kt)("p",null,"Conditions object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.Classifications.GermlineClassification.ConditionList")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML sub-path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"type")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"classification"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Type"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"contributesToAggregateClassification")),(0,i.kt)("td",{parentName:"tr",align:null},"True or blank"),(0,i.kt)("td",{parentName:"tr",align:null},"contributes to aggregate classification"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.ContributesToAggregateClassification"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"trait objects"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits[].id")),(0,i.kt)("td",{parentName:"tr",align:null},"date"),(0,i.kt)("td",{parentName:"tr",align:null},"date last 
evaluated"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits[].name")),(0,i.kt)("td",{parentName:"tr",align:null},"object"),(0,i.kt)("td",{parentName:"tr",align:null},"trait name object"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits[].name.value")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"preferred trait name"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait.Name.ElementValue.Type"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits[].name.xRefs")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of cross references"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait.Name.XRef"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits[].name.xRefs[].db")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"preferred name cross reference database"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait.Name.XRef.DB"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"traits[].name.xRefs[].id")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"preferred name cross reference 
identifier"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ConditionList.TraitSet.Trait.Name.XRef.ID"))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n \n \n \n \n \n Pheochromocytoma\n \n \n \n \n \n Chromaffinoma\n \n ...\n \n \n \n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "classifications": {\n "germlineClassification": {\n "classification": "Pathogenic",\n "reviewStatus": "criteria provided, single submitter",\n "dateLastEvaluated": "2021-08-04",\n "mostRecentSubmission": "2023-02-07",\n "conditions": [\n {\n "type": "Disease",\n "contributesToAggregateClassification": true,\n "traits": [\n {\n "id": "3796",\n "name": {\n "xRefs": [\n {\n "db": "Genetic Alliance",\n "id": "Pheochromocytoma/5718"\n },\n {\n "db": "Human Phenotype Ontology",\n "id": "HP:0002666"\n },\n {\n "db": "MONDO",\n "id": "MONDO:0008233"\n }\n ],\n "value": "Pheochromocytoma"\n }\n }\n ]\n }\n ],\n "pubMedIds": [\n "19454582",\n "19802898"\n ]\n }\n }\n}\n')),(0,i.kt)("h3",{id:"clinical-assertions"},"Clinical Assertions"),(0,i.kt)("p",null,"Clinical assertions object from XML path ",(0,i.kt)("inlineCode",{parentName:"p"},"VariationArchive.ClassifiedRecord.ClinicalAssertionList")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"key"),(0,i.kt)("th",{parentName:"tr",align:null},"type"),(0,i.kt)("th",{parentName:"tr",align:null},"description"),(0,i.kt)("th",{parentName:"tr",align:null},"XML 
sub-path"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"accession")),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"SCV Id from ClinVar"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ClinicalAssertionList.ClinVarAccession.Accession"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"pubMedIds")),(0,i.kt)("td",{parentName:"tr",align:null},"list"),(0,i.kt)("td",{parentName:"tr",align:null},"list of PubMedIds"),(0,i.kt)("td",{parentName:"tr",align:null},(0,i.kt)("inlineCode",{parentName:"td"},"ClinicalAssertionList.ClinicalAssertion.AttributeSet.Citation.ID.Value"))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," XML ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n \n \n \n current\n \n ...\n \n variation to disease\n \n Invitae Variant Classification Sherloc (09022015)\n \n 28492532\n \n \n \n ...\n \n \n ...\n \n \n ...\n \n \n ...\n \n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"}," JSON ")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n"clinicalAssertions": [\n {\n "accession": "SCV002152762",\n "pubMedIds": [\n "28492532"\n ]\n }\n ]\n}\n')),(0,i.kt)("h2",{id:"known-issues"},"Known Issues"),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 
1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known Issues")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Entries with the following missing or incorrect information are skipped:"),(0,i.kt)("ol",{parentName:"div"},(0,i.kt)("li",{parentName:"ol"},"Invalid Ref Allele (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000437934"),")"),(0,i.kt)("li",{parentName:"ol"},"Invalid Alt Allele (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000006637"),")"),(0,i.kt)("li",{parentName:"ol"},"The following variant types are not supported:",(0,i.kt)("ol",{parentName:"li"},(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"Variation")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000001101"),")"),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"fusion")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000015269"),")"),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"unknown")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000017564"),")"),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"protein only")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000132152"),")"),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"Complex")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000221337"),")"),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"Translocation")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000267801"),")"),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("inlineCode",{parentName:"li"},"no_sequence_alteration")," (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000010504"),")"))),(0,i.kt)("li",{parentName:"ol"},"Only records of type ",(0,i.kt)("inlineCode",{parentName:"li"},"classified")," are included 
","[VCV with type ",(0,i.kt)("inlineCode",{parentName:"li"},"included")," is skipped (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000431749"),")]"),(0,i.kt)("li",{parentName:"ol"},"Records with missing genomic location are skipped (example ",(0,i.kt)("inlineCode",{parentName:"li"},"VCV000000254"),")")))),(0,i.kt)("h2",{id:"download-urls"},"Download URLs"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarVCVRelease_00-latest.xml.gz"},"https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarVCVRelease_00-latest.xml.gz")),(0,i.kt)("h2",{id:"json-output"},"JSON Output"),(0,i.kt)(l.default,{mdxType:"JSON"}),(0,i.kt)("h2",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,i.kt)("p",null,"There are two ways of building your own ClinVar supplementary files using ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils"),"."),(0,i.kt)("p",null,"The first way is to use the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,i.kt)("inlineCode",{parentName:"p"},"clinvar")," subcommand, which builds the ClinVar ",(0,i.kt)("inlineCode",{parentName:"p"},".nsa")," and ",(0,i.kt)("inlineCode",{parentName:"p"},".nsi")," files for Illumina Connected Annotations."),(0,i.kt)("p",null,"The second way is to use the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate")," subcommand. 
To use ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),", see the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," section."),(0,i.kt)("h3",{id:"using-clinvar-subcommands-and-source-data-files"},"Using ",(0,i.kt)("inlineCode",{parentName:"h3"},"clinvar")," subcommands and source data files"),(0,i.kt)("p",null,"An input ",(0,i.kt)("inlineCode",{parentName:"p"},".xml.gz")," file and a ",(0,i.kt)("inlineCode",{parentName:"p"},".version")," file are required to build the ",(0,i.kt)("inlineCode",{parentName:"p"},".nsa")," and ",(0,i.kt)("inlineCode",{parentName:"p"},".nsi")," files. You should have the following files:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"ClinVarVCVRelease_00-latest.xml.gz\nClinVarVCVRelease_00-latest.xml.gz.version\n")),(0,i.kt)("p",null,"The version file is a JSON file with the following format:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},'{\n "name": "ClinVar",\n "version": "20240501",\n "description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",\n "releaseDate": "2024-05-01"\n}\n')),(0,i.kt)("p",null,"Adjust the version and release date to match the actual ClinVar release."),(0,i.kt)("p",null,"Here is a sample execution:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-shell"},"dotnet SAUtils ClinVarPreview \\\n--r ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat \\\n--vcv ClinVarVCVRelease_00-latest.xml.gz \\\n--o output\n---------------------------------------------------------------------------\nSAUtils (c) 2024 Illumina, Inc.\n 3.24.0\n---------------------------------------------------------------------------\n\nParsing XML completed in 14.7 mins.\nSorting and adjusting completed in 4.7 mins.\nWriting 2351609 Small Varaints\nChromosome 1 completed in 00:00:57.1\nChromosome 2 completed in 
00:01:30.8\nChromosome 3 completed in 00:00:32.9\nChromosome 4 completed in 00:00:21.2\nChromosome 5 completed in 00:00:31.7\nChromosome 6 completed in 00:00:34.6\nChromosome 7 completed in 00:00:27.9\nChromosome 8 completed in 00:00:17.9\nChromosome 9 completed in 00:00:34.0\nChromosome 10 completed in 00:00:26.6\nChromosome 11 completed in 00:00:35.4\nChromosome 12 completed in 00:00:31.5\nChromosome 13 completed in 00:00:22.7\nChromosome 14 completed in 00:00:22.7\nChromosome 15 completed in 00:00:23.7\nChromosome 16 completed in 00:00:39.6\nChromosome 17 completed in 00:00:46.7\nChromosome 18 completed in 00:00:10.2\nChromosome 19 completed in 00:00:32.9\nChromosome 20 completed in 00:00:10.7\nChromosome 21 completed in 00:00:05.3\nChromosome 22 completed in 00:00:11.0\nChromosome X completed in 00:00:19.6\nChromosome Y completed in 00:00:00.1\nChromosome MT completed in 00:00:00.3\nMaximum bp shifted for any variant:1\nNSA writing completed in 11.5 mins.\nWriting 76122 Large Varaints\nWriting 76122 intervals to database...\nNSI writing completed in 1.1 mins.\n\nTime: 00:32:10.9\nProcess finished with exit code 0.\n\n\n")))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/3992ad3e.0618eec0.js b/assets/js/3992ad3e.0618eec0.js new file mode 100644 index 00000000..95fef27d --- /dev/null +++ b/assets/js/3992ad3e.0618eec0.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6274],{3905:(e,t,n)=>{n.d(t,{Zo:()=>s,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function l(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var 
o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var p=r.createContext({}),c=function(e){var t=r.useContext(p),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},s=function(e){var t=c(e.components);return r.createElement(p.Provider,{value:t},e.children)},u="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},m=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,p=e.parentName,s=i(e,["components","mdxType","originalType","parentName"]),u=c(n),m=a,f=u["".concat(p,".").concat(m)]||u[m]||d[m]||o;return n?r.createElement(f,l(l({ref:t},s),{},{components:n})):r.createElement(f,l({ref:t},s))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,l=new Array(o);l[0]=m;var i={};for(var p in t)hasOwnProperty.call(t,p)&&(i[p]=t[p]);i.originalType=e,i[u]="string"==typeof e?e:a,l[1]=i;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>u,frontMatter:()=>o,metadata:()=>i,toc:()=>p});var r=n(7462),a=(n(7294),n(3905));const o={},l=void 0,i={unversionedId:"data-sources/gnomad-lof-json",id:"version-3.24/data-sources/gnomad-lof-json",title:"gnomad-lof-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-lof-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-lof-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-lof-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-lof-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],c={toc:p},s="wrapper";function u(e){let{components:t,...n}=e;return(0,a.kt)(s,(0,r.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD":{ \n "pLi":1.00e0,\n "pNull":8.94e-40,\n "pRec":1.84e-16,\n 
"synZ":-8.44e-2,\n "misZ":5.96e-1,\n "loeuf":1.13e0\n}\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Field"),(0,a.kt)("th",{parentName:"tr",align:null},"Type"),(0,a.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"pLi"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"pNull"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"probability of being completely tolerant of loss of function variation (observed = expected)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"pRec"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"synZ"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"corrected synonymous Z score")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"misZ"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"corrected missense Z score")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"loeuf"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"loss of function observed/expected upper bound fraction (LOEUF)")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git 
a/assets/js/42af2c4d.5eabc270.js b/assets/js/42af2c4d.5eabc270.js new file mode 100644 index 00000000..f6b6a673 --- /dev/null +++ b/assets/js/42af2c4d.5eabc270.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7165],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>g});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function l(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},k="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),k=c(n),u=i,g=k["".concat(s,".").concat(u)]||k[u]||m[u]||r;return n?a.createElement(g,l(l({ref:t},p),{},{components:n})):a.createElement(g,l({ref:t},p))}));function g(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,l=new Array(r);l[0]=u;var o={};for(var s in t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[k]="string"==typeof e?e:i,l[1]=o;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>k,frontMatter:()=>r,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={},l=void 
0,o={unversionedId:"data-sources/clinvar-preview-json",id:"version-3.24/data-sources/clinvar-preview-json",title:"clinvar-preview-json",description:"small variants:",source:"@site/versioned_docs/version-3.24/data-sources/clinvar-preview-json.md",sourceDirName:"data-sources",slug:"/data-sources/clinvar-preview-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar-preview-json.md",tags:[],version:"3.24",frontMatter:{}},s=[{value:"Variant Types",id:"variant-types",children:[],level:4},{value:"Review Statuses",id:"review-statuses",children:[],level:4},{value:"classification",id:"classification",children:[],level:4}],c={toc:s},p="wrapper";function k(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"small variants:")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "clinvar-preview": [\n {\n "altAllele": "A",\n "refAllele": "G",\n "variantType": "SNV",\n "accession": "VCV000437934",\n "version": "1",\n "recordType": "classified",\n "dateLastUpdated": "2023-08-06",\n "rcvs": [\n {\n "accession": "RCV000505090",\n "version": "1",\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "descriptions": [\n {\n "dateLastEvaluated": "2016-08-31",\n "classification": "Pathogenic"\n }\n ]\n }\n },\n "classifiedConditions": [\n {\n "condition": "Cleidocranial dysostosis",\n "db": "MedGen",\n "id": "C0008928"\n }\n ]\n }\n ],\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "classification": "Pathogenic",\n "dateLastEvaluated": "2016-08-31",\n "mostRecentSubmission": "2017-09-09",\n "conditions": [\n {\n "type": "Disease",\n 
"contributesToAggregateClassification": true,\n "traits": [\n {\n "id": "820",\n "name": {\n "xRefs": [\n {\n "db": "Genetic Alliance",\n "id": "Cleidocranial+Dysplasia/1683"\n },\n {\n "db": "SNOMED CT",\n "id": "65976001"\n }\n ],\n "value": "Cleidocranial dysostosis"\n }\n }\n ]\n }\n ]\n }\n },\n "clinicalAssertions": [\n {\n "accession": "SCV000598565"\n }\n ]\n }\n ]\n}\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"large variants:")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "clinvar-preview": [\n {\n "chromosome": "17",\n "begin": 150732,\n "end": 14764202,\n "variantType": "copy_number_gain",\n "accession": "VCV000154089",\n "version": "2",\n "recordType": "classified",\n "dateLastUpdated": "2023-10-15",\n "rcvs": [\n {\n "accession": "RCV000142236",\n "version": "6",\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "descriptions": [\n {\n "dateLastEvaluated": "2014-03-10",\n "classification": "Pathogenic"\n }\n ]\n }\n },\n "classifiedConditions": [\n {\n "condition": "See cases"\n }\n ]\n }\n ],\n "classifications": {\n "germlineClassification": {\n "reviewStatus": "no assertion criteria provided",\n "classification": "Pathogenic",\n "dateLastEvaluated": "2014-03-10",\n "mostRecentSubmission": "2015-07-13",\n "conditions": [\n {\n "type": "PhenotypeInstruction",\n "contributesToAggregateClassification": true,\n "traits": [\n {\n "id": "18728",\n "name": {\n "value": "See cases"\n }\n }\n ]\n }\n ]\n }\n },\n "clinicalAssertions": [\n {\n "accession": "SCV000183512"\n }\n ]\n }\n 
]\n}\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Chromosome")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"begin"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"},"start position of variant")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"end"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"},"end position of variant")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"accession"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"ClinVar ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"version"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"ClinVar version")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"variant 
type")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"recordType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"record type")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"dateLastUpdated"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"rcvs"),(0,i.kt)("td",{parentName:"tr",align:"center"},"array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"RCV objects associated with this VCV")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"classifications"),(0,i.kt)("td",{parentName:"tr",align:"center"},"object"),(0,i.kt)("td",{parentName:"tr",align:"left"},"classifications for this VCV")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"clinicalAssertions"),(0,i.kt)("td",{parentName:"tr",align:"center"},"array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"SCV objects associated with this VCV")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,i.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,i.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the ClinVar alternate allele")))),(0,i.kt)("h4",{id:"variant-types"},"Variant Types"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"copy_number_gain"),(0,i.kt)("li",{parentName:"ul"},"copy_number_loss"),(0,i.kt)("li",{parentName:"ul"},"deletion"),(0,i.kt)("li",{parentName:"ul"},"delins"),(0,i.kt)("li",{parentName:"ul"},"duplication"),(0,i.kt)("li",{parentName:"ul"},"insertion"),(0,i.kt)("li",{parentName:"ul"},"inversion"),(0,i.kt)("li",{parentName:"ul"},"SNV"),(0,i.kt)("li",{parentName:"ul"},"tandem_duplication")),(0,i.kt)("h4",{id:"review-statuses"},"Review 
Statuses"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"criteria provided, conflicting classifications"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, multiple submitters, no conflicts"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, single submitter"),(0,i.kt)("li",{parentName:"ul"},"no assertion criteria provided"),(0,i.kt)("li",{parentName:"ul"},"no classification provided"),(0,i.kt)("li",{parentName:"ul"},"practice guideline"),(0,i.kt)("li",{parentName:"ul"},"reviewed by expert panel")),(0,i.kt)("h4",{id:"classification"},"classification"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Benign"),(0,i.kt)("li",{parentName:"ul"},"Likely benign"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign"),(0,i.kt)("li",{parentName:"ul"},"not provided"),(0,i.kt)("li",{parentName:"ul"},"conflicting data from submitters"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"association"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; risk factor"),(0,i.kt)("li",{parentName:"ul"},"risk factor"),(0,i.kt)("li",{parentName:"ul"},"other"),(0,i.kt)("li",{parentName:"ul"},"drug response"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; Pathogenic/Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; Affects"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"protective"),(0,i.kt)("li",{parentName:"ul"},"Affects"),(0,i.kt)("li",{parentName:"ul"},"Benign; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; 
association"),(0,i.kt)("li",{parentName:"ul"},"Uncertain risk allele"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; association"),(0,i.kt)("li",{parentName:"ul"},"Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic; other"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; other"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic/Pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance/Uncertain risk allele"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; association; protective"),(0,i.kt)("li",{parentName:"ul"},"protective; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; other; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; association"),(0,i.kt)("li",{parentName:"ul"},"Benign; association"),(0,i.kt)("li",{parentName:"ul"},"Affects; association; other"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; protective"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; drug response"),(0,i.kt)("li",{parentName:"ul"},"Benign; drug response"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; protective"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic; drug response"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; other"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic/Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Uncertain risk allele; protective"),(0,i.kt)("li",{parentName:"ul"},"association not 
found"),(0,i.kt)("li",{parentName:"ul"},"Affects; association"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; association"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; other"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; other"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; association; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; association"),(0,i.kt)("li",{parentName:"ul"},"Benign; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; other; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely risk allele; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance; drug response"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; risk factor"),(0,i.kt)("li",{parentName:"ul"},"other; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely pathogenic/Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; drug response"),(0,i.kt)("li",{parentName:"ul"},"Conflicting classifications of pathogenicity; Affects"),(0,i.kt)("li",{parentName:"ul"},"association; drug response; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; drug response"),(0,i.kt)("li",{parentName:"ul"},"Affects; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; protective"),(0,i.kt)("li",{parentName:"ul"},"confers sensitivity"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; association"),(0,i.kt)("li",{parentName:"ul"},"Benign; Affects"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic; Affects"),(0,i.kt)("li",{parentName:"ul"},"Uncertain risk allele; risk factor"),(0,i.kt)("li",{parentName:"ul"},"drug response; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Likely risk allele"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; 
drug response"),(0,i.kt)("li",{parentName:"ul"},"Benign/Likely benign; drug response; other"),(0,i.kt)("li",{parentName:"ul"},"drug response; other"),(0,i.kt)("li",{parentName:"ul"},"association; drug response"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic; confers sensitivity"),(0,i.kt)("li",{parentName:"ul"},"association; risk factor"),(0,i.kt)("li",{parentName:"ul"},"Pathogenic/Pathogenic, low penetrance; other"),(0,i.kt)("li",{parentName:"ul"},"Benign; confers sensitivity"),(0,i.kt)("li",{parentName:"ul"},"confers sensitivity; other"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic/Pathogenic, low penetrance"),(0,i.kt)("li",{parentName:"ul"},"Likely benign; risk factor")))}k.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/4523c0b8.b6e19eee.js b/assets/js/4523c0b8.b6e19eee.js new file mode 100644 index 00000000..6dece91c --- /dev/null +++ b/assets/js/4523c0b8.b6e19eee.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6510,6558,4388,9486,958,6274,9636,775,7491,7946,1090,9896,2612,7520,4241,131,1831,1541,3831,3555,1840,9050,7283,4335],{3905:(t,e,n)=>{n.d(e,{Zo:()=>m,kt:()=>g});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function i(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var p=a.createContext({}),d=function(t){var e=a.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},m=function(t){var e=d(t.components);return 
a.createElement(p.Provider,{value:e},t.children)},s="mdxType",c={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},u=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,m=o(t,["components","mdxType","originalType","parentName"]),s=d(n),u=r,g=s["".concat(p,".").concat(u)]||s[u]||c[u]||l;return n?a.createElement(g,i(i({ref:e},m),{},{components:n})):a.createElement(g,i({ref:e},m))}));function g(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,i=new Array(l);i[0]=u;var o={};for(var p in e)hasOwnProperty.call(e,p)&&(o[p]=e[p]);o.originalType=t,o[s]="string"==typeof t?t:r,i[1]=o;for(var d=2;d{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/1000Genomes-snv-json",id:"version-3.24/data-sources/1000Genomes-snv-json",title:"1000Genomes-snv-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes-snv-json.md",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes-snv-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-snv-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes-snv-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"oneKg":{\n "allAf":0.200879,\n "afrAf":0.210287,\n "amrAf":0.139769,\n "easAf":0.275794,\n "eurAf":0.181909,\n "sasAf":0.173824,\n "allAn":5008,\n "afrAn":1322,\n "amrAn":694,\n "easAn":1008,\n "eurAn":1006,\n "sasAn":978,\n "allAc":1006,\n "afrAc":278,\n "amrAc":97,\n "easAc":278,\n "eurAc":183,\n 
"sasAc":170\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"allAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for all populations. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"allAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for all populations. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"allAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for all populations. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the African super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"afrAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the African super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"afrAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the African super population. 
Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"amrAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the Ad Mixed American super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"amrAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the Ad Mixed American super population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"easAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"easAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the East Asian super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"easAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the East Asian super population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the European super population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"eurAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the European super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"eurAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the European super population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sasAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the South Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sasAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the South Asian super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sasAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the South Asian super population. 
Non-zero integer.")))))}s.isMDXComponent=!0},8866:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/1000Genomes-sv-json",id:"version-3.24/data-sources/1000Genomes-sv-json",title:"1000Genomes-sv-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes-sv-json.md",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes-sv-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-sv-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes-sv-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"oneKg":[\n {\n "chromosome":"1",\n "begin":1595369,\n "end":1612441,\n "variantType": "copy_number_variation",\n "id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",\n "allAn": 5008,\n "allAc": 2702,\n "allAf": 0.539537,\n "afrAf": 0.6052,\n "amrAf": 0.3675,\n "eurAf": 0.5357,\n "easAf": 0.5368,\n "sasAf": 0.5797,\n "reciprocalOverlap": 0.07555\n 
}\n],\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"id"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all populations. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the African super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the European super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian super population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"range: 0 - 1.")))))}s.isMDXComponent=!0},4567:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/amino-acid-conservation-json",id:"version-3.24/data-sources/amino-acid-conservation-json",title:"amino-acid-conservation-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/amino-acid-conservation-json.md",sourceDirName:"data-sources",slug:"/data-sources/amino-acid-conservation-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/amino-acid-conservation-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"aminoAcidConservation": {\n "scores": [0.34]\n} 
\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"aminoAcidConservation"),(0,r.kt)("td",{parentName:"tr",align:"center"},"object"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"scores"),(0,r.kt)("td",{parentName:"tr",align:"center"},"object array of doubles"),(0,r.kt)("td",{parentName:"tr",align:"left"},"percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00")))))}s.isMDXComponent=!0},4869:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/clingen-dosage-json",id:"version-3.24/data-sources/clingen-dosage-json",title:"clingen-dosage-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/clingen-dosage-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-dosage-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-dosage-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-dosage-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clingenDosageSensitivityMap": [{\n "chromosome": "15",\n "begin": 30900686,\n "end": 32153204,\n "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated 
with clinical phenotype",\n "triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "reciprocalOverlap": 0.00147,\n "annotationOverlap": 0.33994\n},\n{\n "chromosome": "15",\n "begin": 31727418,\n "end": 32153204,\n "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "triplosensitivity": "dosage sensitivity unlikely",\n "reciprocalOverlap": 0.00147,\n "annotationOverlap": 1\n}]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clingenDosageSensitivityMap"),(0,r.kt)("td",{parentName:"tr",align:null},"object array"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"haploinsufficiency"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"see possible values 
below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"triplosensitivity"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"(same as haploinsufficiency)\xa0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"haploinsufficiency and triplosensitivity")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"no evidence to suggest that dosage sensitivity is associated with clinical phenotype"),(0,r.kt)("li",{parentName:"ul"},"little evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,r.kt)("li",{parentName:"ul"},"emerging evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,r.kt)("li",{parentName:"ul"},"sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,r.kt)("li",{parentName:"ul"},"gene associated with autosomal recessive phenotype"),(0,r.kt)("li",{parentName:"ul"},"dosage sensitivity unlikely")))}s.isMDXComponent=!0},6361:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 
0,o={unversionedId:"data-sources/clingen-gene-validity-json",id:"version-3.24/data-sources/clingen-gene-validity-json",title:"clingen-gene-validity-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/clingen-gene-validity-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-gene-validity-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-gene-validity-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-gene-validity-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clingenGeneValidity":[\n {\n "diseaseId":"MONDO_0007893",\n "disease":"Noonan syndrome with multiple lentigines",\n "classification":"no reported evidence",\n "classificationDate":"2018-06-07"\n },\n {\n "diseaseId":"MONDO_0015280",\n "disease":"cardiofaciocutaneous syndrome",\n "classification":"no reported evidence",\n "classificationDate":"2018-06-07"\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clingenGeneValidity"),(0,r.kt)("td",{parentName:"tr",align:null},"object"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"diseaseId"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Monarch Disease Ontology ID 
(MONDO)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"disease"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"disease label")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"classification"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"see below for possible values")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"classificationDate"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"yyyy-MM-dd")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"classification")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"no reported evidence"),(0,r.kt)("li",{parentName:"ul"},"disputed"),(0,r.kt)("li",{parentName:"ul"},"limited"),(0,r.kt)("li",{parentName:"ul"},"moderate"),(0,r.kt)("li",{parentName:"ul"},"definitive"),(0,r.kt)("li",{parentName:"ul"},"strong"),(0,r.kt)("li",{parentName:"ul"},"refuted"),(0,r.kt)("li",{parentName:"ul"},"no known disease relationship")))}s.isMDXComponent=!0},6478:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/clingen-json",id:"version-3.24/data-sources/clingen-json",title:"clingen-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/clingen-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function 
s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clingen":[\n {\n "chromosome":"17",\n "begin":525,\n "end":14667519,\n "variantType":"copy_number_gain",\n "id":"nsv996083",\n "clinicalInterpretation":"pathogenic",\n "observedGains":1,\n "validated":true,\n "phenotypes":[\n "Intrauterine growth retardation"\n ],\n "phenotypeIds":[\n "HP:0001511",\n "MedGen:C1853481"\n ],\n "reciprocalOverlap":0.00131\n },\n {\n "chromosome":"17",\n "begin":45835,\n "end":7600330,\n "variantType":"copy_number_loss",\n "id":"nsv869419",\n "clinicalInterpretation":"pathogenic",\n "observedLosses":1,\n "validated":true,\n "phenotypes":[\n "Developmental delay AND/OR other significant developmental or morphological phenotypes"\n ],\n "reciprocalOverlap":0.00254\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clingen"),(0,r.kt)("td",{parentName:"tr",align:null},"object array"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based 
position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Any of the\xa0sequence alterations defined here.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"id"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Identifier from the data source. Alternatively a VID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clinicalInterpretation"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"see possible values below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"observedGains"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - (2",(0,r.kt)("sup",null,"31"),"\xa0- 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"observedLosses"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - (2",(0,r.kt)("sup",null,"31"),"\xa0- 1). 
Only used if copy_number_variation, copy_number_loss, or copy_number_gain.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"validated"),(0,r.kt)("td",{parentName:"tr",align:null},"boolean"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"phenotypes"),(0,r.kt)("td",{parentName:"tr",align:null},"string array"),(0,r.kt)("td",{parentName:"tr",align:null},"Description of the phenotype.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"phenotypeIds"),(0,r.kt)("td",{parentName:"tr",align:null},"string array"),(0,r.kt)("td",{parentName:"tr",align:null},"Description of the phenotype IDs.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"clinicalInterpretation")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"benign"),(0,r.kt)("li",{parentName:"ul"},"curated benign"),(0,r.kt)("li",{parentName:"ul"},"curated pathogenic"),(0,r.kt)("li",{parentName:"ul"},"likely benign"),(0,r.kt)("li",{parentName:"ul"},"likely pathogenic"),(0,r.kt)("li",{parentName:"ul"},"path gain"),(0,r.kt)("li",{parentName:"ul"},"path loss"),(0,r.kt)("li",{parentName:"ul"},"pathogenic"),(0,r.kt)("li",{parentName:"ul"},"uncertain")))}s.isMDXComponent=!0},5666:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/clinvar-json",id:"version-3.24/data-sources/clinvar-json",title:"clinvar-json",description:"small 
variants:",source:"@site/versioned_docs/version-3.24/data-sources/clinvar-json.md",sourceDirName:"data-sources",slug:"/data-sources/clinvar-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"small variants:")),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clinvar":[\n {\n "id":"VCV000036581.3",\n "reviewStatus":"reviewed by expert panel",\n "significance":[\n "benign"\n ],\n "refAllele":"G",\n "altAllele":"A",\n "lastUpdatedDate":"2020-03-01",\n "isAlleleSpecific":true\n },\n {\n "id":"RCV000030258.4",\n "variationId":"VCV000036581.3",\n "reviewStatus":"reviewed by expert panel",\n "alleleOrigins":[\n "germline"\n ],\n "refAllele":"G",\n "altAllele":"A",\n "phenotypes":[\n "Lynch syndrome"\n ],\n "medGenIds":[\n "C1333990"\n ],\n "omimIds":[\n "120435"\n ],\n "significance":[\n "benign"\n ],\n "lastUpdatedDate":"2017-05-01",\n "isAlleleSpecific":true\n }\n]\n')),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"large variants:")),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clinvar":[\n {\n "chromosome":"1", \n "begin":629025, \n "end":8537745, \n "variantType":"copy_number_loss", \n "id":"RCV000051993.4", \n "variationId":"VCV000058242.1", \n "reviewStatus":"criteria provided, single submitter", \n "alleleOrigins":[\n "not provided"\n ], \n "phenotypes":[\n "See cases"\n ], \n "significance":[\n "pathogenic"\n ], \n "lastUpdatedDate":"2022-04-21", \n "pubMedIds":[\n "21844811"\n ]\n },\n {\n "id":"VCV000058242.1",\n "reviewStatus":"criteria provided, single submitter",\n 
"significance":[\n "pathogenic"\n ],\n "lastUpdatedDate":"2022-04-21"\n },\n ......\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"id"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"ClinVar ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variationId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"ClinVar VCV ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"variant type")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"reviewStatus"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see possible values below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"alleleOrigins"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see possible values 
below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotypes"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"medGenIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"MedGen IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"omimIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"OMIM IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"orphanetIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Orphanet IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"significance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see possible values below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"lastUpdatedDate"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"PubMed 
IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the ClinVar alternate allele")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"reviewStatus:")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"no assertion provided"),(0,r.kt)("li",{parentName:"ul"},"no assertion criteria provided"),(0,r.kt)("li",{parentName:"ul"},"criteria provided, single submitter"),(0,r.kt)("li",{parentName:"ul"},"practice guideline"),(0,r.kt)("li",{parentName:"ul"},"classified by multiple submitters"),(0,r.kt)("li",{parentName:"ul"},"criteria provided, conflicting interpretations"),(0,r.kt)("li",{parentName:"ul"},"criteria provided, multiple submitters, no conflicts"),(0,r.kt)("li",{parentName:"ul"},"no interpretation for the single variant")),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"alleleOrigins:")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"unknown"),(0,r.kt)("li",{parentName:"ul"},"other"),(0,r.kt)("li",{parentName:"ul"},"germline"),(0,r.kt)("li",{parentName:"ul"},"somatic"),(0,r.kt)("li",{parentName:"ul"},"inherited"),(0,r.kt)("li",{parentName:"ul"},"paternal"),(0,r.kt)("li",{parentName:"ul"},"maternal"),(0,r.kt)("li",{parentName:"ul"},"de-novo"),(0,r.kt)("li",{parentName:"ul"},"biparental"),(0,r.kt)("li",{parentName:"ul"},"uniparental"),(0,r.kt)("li",{parentName:"ul"},"not-tested"),(0,r.kt)("li",{parentName:"ul"},"tested-inconclusive")),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"significance:")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"uncertain significance"),(0,r.kt)("li",{parentName:"ul"},"not provided"),(0,r.kt)("li",{parentName:"ul"},"benign"),(0,r.kt)("li",{parentName:"ul"},"likely benign"),(0,r.kt)("li",{parentName:"ul"},"likely 
pathogenic"),(0,r.kt)("li",{parentName:"ul"},"pathogenic"),(0,r.kt)("li",{parentName:"ul"},"drug response"),(0,r.kt)("li",{parentName:"ul"},"histocompatibility"),(0,r.kt)("li",{parentName:"ul"},"association"),(0,r.kt)("li",{parentName:"ul"},"risk factor"),(0,r.kt)("li",{parentName:"ul"},"protective"),(0,r.kt)("li",{parentName:"ul"},"affects"),(0,r.kt)("li",{parentName:"ul"},"conflicting data from submitters"),(0,r.kt)("li",{parentName:"ul"},"other"),(0,r.kt)("li",{parentName:"ul"},"no interpretation for the single variant"),(0,r.kt)("li",{parentName:"ul"},"conflicting interpretations of pathogenicity")))}s.isMDXComponent=!0},13:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/cosmic-cancer-gene-census",id:"version-3.24/data-sources/cosmic-cancer-gene-census",title:"cosmic-cancer-gene-census",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-cancer-gene-census.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-cancer-gene-census",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-cancer-gene-census",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-cancer-gene-census.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},' {\n "name": "PRDM16",\n "ensemblGeneId": "ENSG00000142611",\n "ncbiGeneId": "63976",\n "hgncId": 14000,\n "cosmic": {\n "tier": 1,\n "roleInCancer": [\n "oncogene",\n "fusion"\n ]\n 
}\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"roleInCancer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Possible roles in cancer")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"tier"),(0,r.kt)("td",{parentName:"tr",align:"center"},"number"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Cosmic tiers ","[1, 2]")))))}s.isMDXComponent=!0},7476:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/dann-json",id:"version-3.24/data-sources/dann-json",title:"dann-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/dann-json.md",sourceDirName:"data-sources",slug:"/data-sources/dann-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dann-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"dannScore": 
0.27\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"dannScore"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1.0")))))}s.isMDXComponent=!0},2379:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/dbsnp-json",id:"version-3.24/data-sources/dbsnp-json",title:"dbsnp-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/dbsnp-json.md",sourceDirName:"data-sources",slug:"/data-sources/dbsnp-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dbsnp-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"dbsnp":[\n "rs1042821"\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"dbsnp"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"dbSNP 
rsIDs")))))}s.isMDXComponent=!0},7927:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/decipher-json",id:"version-3.24/data-sources/decipher-json",title:"decipher-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/decipher-json.md",sourceDirName:"data-sources",slug:"/data-sources/decipher-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/decipher-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"decipher":[\n {\n "chromosome":"1",\n "begin":13516,\n "end":91073,\n "numDeletions":27,\n "deletionFrequency":0.675,\n "numDuplications":27,\n "duplicationFrequency":0.675,\n "sampleSize":40,\n "reciprocalOverlap": 0.27555,\n "annotationOverlap": 0.5901\n }\n],\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based 
position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"numDeletions"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"# of observed deletions")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"deletionFrequency"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion frequency")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"numDuplications"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"# of observed duplications")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"duplicationFrequency"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication frequency")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sampleSize"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"total # of samples")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 
0.57 would indicate a 57% annotation overlap")))))}s.isMDXComponent=!0},1399:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/gerp-json",id:"version-3.24/data-sources/gerp-json",title:"gerp-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gerp-json.md",sourceDirName:"data-sources",slug:"/data-sources/gerp-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gerp-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gerpScore": 1.27\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gerpScore"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: -\u221e to +\u221e")))))}s.isMDXComponent=!0},8615:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/gme-json",id:"version-3.24/data-sources/gme-json",title:"gme-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/gme-json.md",sourceDirName:"data-sources",slug:"/data-sources/gme-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gme-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gmeVariome":{\n "allAc":10,\n "allAn":202,\n "allAf":0.049504,\n "failedFilter":true\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"GME allele count")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"GME allele number")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"GME allele frequency")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"bool"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters")))))}s.isMDXComponent=!0},7510:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var 
a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/gnomad-lof-json",id:"version-3.24/data-sources/gnomad-lof-json",title:"gnomad-lof-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-lof-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-lof-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-lof-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-lof-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD":{ \n "pLi":1.00e0,\n "pNull":8.94e-40,\n "pRec":1.84e-16,\n "synZ":-8.44e-2,\n "misZ":5.96e-1,\n "loeuf":1.13e0\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"pLi"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"pNull"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"probability of being completely tolerant of loss of function variation (observed = 
expected)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"pRec"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"synZ"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"corrected synonymous Z score")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"misZ"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"corrected missense Z score")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"loeuf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"loss of function observed/expected upper bound fraction (LOEUF)")))))}s.isMDXComponent=!0},7811:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/gnomad-small-variants-json",id:"version-3.24/data-sources/gnomad-small-variants-json",title:"gnomad-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function 
s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad":{ \n "coverage":20,\n "allAf":0.190317,\n "maleAf":0.193,\n "femaleAf": 0.1935,\n "afrAf":0.222876,\n "amrAf":0.121394,\n "easAf":0.239802,\n "finAf":0.136833,\n "nfeAf":0.181282,\n "asjAf":0.258278,\n "othAf":0.186094,\n "allAn":30796,\n "maleAn":15096,\n "femaleAn":15700,\n "afrAn":8664,\n "amrAn":832,\n "easAn":1618,\n "finAn":3486,\n "nfeAn":14916,\n "asjAn":302,\n "othAn":978,\n "allAc":5861,\n "maleAc":2930,\n "femaleAc": 2931,\n "afrAc":1931,\n "amrAc":101,\n "easAc":388,\n "finAc":477,\n "nfeAc":2704,\n "asjAc":78,\n "othAc":182,\n "allHc":561,\n "afrHc":208,\n "amrHc":6,\n "easHc":42,\n "finHc":31,\n "nfeHc":242,\n "asjHc":13,\n "othHc":19,\n "maleHc":280,\n "femaleHc":281,\n "controlsAllAf":0.190317,\n "controlsAllAn":30796,\n "controlsAllAc":5861,\n "lowComplexityRegion":true,\n "failedFilter":true\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"coverage"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"average coverage (non-negative integer values)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"controlsAllAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the controls subset. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for male population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for female population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"controlsAllAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the controls subset. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all populations. 
Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for male population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for female population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"controlsAllAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the controls subset. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the African / African American population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the African / African American population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the African / African American population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for African / African American population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Latino population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Latino population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Latino population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Latino population. 
Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for East Asian population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"finAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Finnish population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"finAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Finnish population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"finAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Finnish population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"finHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Finnish population. 
Non-negative integer")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"nfeAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Non-Finnish European population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"nfeAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Non-Finnish European population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"nfeAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Non-Finnish European population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"nfeHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Non-Finnish European population. Non-negative integer")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Other population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Other population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Other population. 
Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Other population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"asjAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"asjAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Ashkenazi Jewish population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"asjAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Ashkenazi Jewish population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"asjHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the South Asian population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the South Asian population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the South Asian population. Non-negative integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"bool"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"lowComplexityRegion"),(0,r.kt)("td",{parentName:"tr",align:null},"bool"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant is located in a low complexity region.")))))}s.isMDXComponent=!0},1231:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/gnomad-structural-variants-json",id:"version-3.24/data-sources/gnomad-structural-variants-json",title:"gnomad-structural-variants-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD-preview": [\n {\n "chromosome": "1",\n "begin": 40001,\n "end": 47200,\n "variantId": "gnomAD-SV_v2.1_DUP_1_1",\n "variantType": "duplication",\n "failedFilter": true,\n "allAf": 0.068963,\n "afrAf": 0.135694,\n "amrAf": 0.022876,\n "easAf": 0.01101,\n "eurAf": 0.007846,\n "othAf": 0.017544,\n "femaleAf": 0.065288,\n "maleAf": 0.07255,\n "allAc": 943,\n "afrAc": 866,\n "amrAc": 21,\n "easAc": 17,\n "eurAc": 37,\n "othAc": 2,\n "femaleAc": 442,\n "maleAc": 499,\n "allAn": 13674,\n "afrAn": 6382,\n "amrAn": 918,\n "easAn": 1544,\n "eurAn": 4716,\n "othAn": 114,\n "femaleAn": 6770,\n "maleAn": 6878,\n "allHc": 91,\n "afrHc": 90,\n "amrHc": 1,\n "easHc": 0,\n "eurHc": 0,\n "othHc": 55,\n "femaleHc": 44,\n "maleHc": 47,\n "reciprocalOverlap": 0.01839,\n "annotationOverlap": 0.16667\n 
}\n]\n\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"chromosome number")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"position interval start")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"position interval end")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"structural variant type")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantId"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"gnomAD ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the African super population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the European super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all other populations. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the African super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Ad Mixed American super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the European super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all other populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for male population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for female 
population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the African super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Ad Mixed American super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the European super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all other populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for female population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for male 
population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the African / African American population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Latino population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the East Asian population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the European super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all other populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female 
population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"boolean"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Reciprocal overlap. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Annotation overlap. Range: 0 - 1.0")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"Note:")," The following fields are not available in ",(0,r.kt)("em",{parentName:"p"},"GRCh38")," because the source file does not contain this information:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAn")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAn")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrHc")),(0,r.kt)("tr",{parentNa
me:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter")))))}s.isMDXComponent=!0},3830:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/mitomap-small-variants-json",id:"version-3.24/data-sources/mitomap-small-variants-json",title:"mitomap-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/mitomap-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/mitomap-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"mitomap":[ \n { \n "refAllele":"G",\n "altAllele":"A",\n "diseases":[ \n "Bipolar disorder",\n "Melanoma"\n ],\n "hasHomoplasmy":false,\n "hasHeteroplasmy":true,\n "status":"Reported",\n "clinicalSignificance":"confirmed pathogenic",\n "scorePercentile":83.30,\n "numGenBankFullLengthSeqs":2,\n "pubMedIds":["2316527","6299878","6301949"],\n "isAlleleSpecific":true\n 
}\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"diseases"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"associated diseases")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hasHomoplasmy"),(0,r.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hasHeteroplasmy"),(0,r.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"status"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"record status")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"clinicalSignificance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"predicted pathogenicity")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"scorePercentile"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"MitoTIP 
score")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"numGenBankFullLengthSeqs"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"left"},"# of GenBank full-length sequences")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,r.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the MITOMAP alternate allele")))))}s.isMDXComponent=!0},3623:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/mitomap-structural-variants-json",id:"version-3.24/data-sources/mitomap-structural-variants-json",title:"mitomap-structural-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/mitomap-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/mitomap-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"mitomap":[ \n { \n "chromosome":"MT",\n "begin":3166,\n "end":14152,\n "variantType":"deletion",\n "reciprocalOverlap":0.18068,\n 
"annotationOverlap":0.42405\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"begin"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"end"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1. Specified up to 5 decimal places")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"annotationOverlap"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1. 
Specified up to 5 decimal places")))))}s.isMDXComponent=!0},2898:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/omim-json",id:"version-3.24/data-sources/omim-json",title:"omim-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/omim-json.md",sourceDirName:"data-sources",slug:"/data-sources/omim-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/omim-json.md",tags:[],version:"3.24",frontMatter:{}},p=[{value:"Phenotype",id:"phenotype",children:[],level:4},{value:"Mapping",id:"mapping",children:[],level:4},{value:"Inheritance",id:"inheritance",children:[],level:4},{value:"Comments",id:"comments",children:[],level:4}],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"omim":[ \n { \n "mimNumber":600678,\n "geneName":"MutS, E. coli, homolog of, 6",\n "description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. 
Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",\n "phenotypes":[ \n { \n "mimNumber":614350,\n "phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",\n "description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",\n "mapping":"molecular basis of the disorder is known",\n "inheritances":[ \n "Autosomal dominant"\n ]\n },\n { \n "mimNumber":608089,\n "phenotype":"Endometrial cancer, familial",\n "mapping":"molecular basis of the disorder is known"\n },\n { \n "mimNumber":276300,\n "phenotype":"Mismatch repair cancer syndrome",\n "description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",\n "mapping":"molecular basis of the disorder is known",\n "inheritances":[ \n "Autosomal recessive"\n ],\n "comments" : [\n "contribute to susceptibility to multifactorial disorders or to susceptibility to infection",\n "unconfirmed or possibly spurious mapping"\n ]\n }\n ]\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"mimNumber"),(0,r.kt)("td",{parentName:"tr",align:"left"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"OMIM ID for gene")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"geneName"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"gene 
name")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"description"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotypes"),(0,r.kt)("td",{parentName:"tr",align:"left"},"object array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#phenotype"},"Phenotype entry below"))))),(0,r.kt)("h4",{id:"phenotype"},"Phenotype"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"mimNumber"),(0,r.kt)("td",{parentName:"tr",align:"left"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotype"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"description"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"mapping"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#mapping"},"possible values below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"inheritances"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#inheritance"},"possible values 
below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"comments"),(0,r.kt)("td",{parentName:"tr",align:"left"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#comments"},"possible values below"))))),(0,r.kt)("h4",{id:"mapping"},"Mapping"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"disorder was positioned by mapping of the wild type gene"),(0,r.kt)("li",{parentName:"ol"},"disease phenotype itself was mapped"),(0,r.kt)("li",{parentName:"ol"},"molecular basis of the disorder is known"),(0,r.kt)("li",{parentName:"ol"},"disorder is a chromosome deletion or duplication syndrome")),(0,r.kt)("h4",{id:"inheritance"},"Inheritance"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"autosomal recessive"),(0,r.kt)("li",{parentName:"ul"},"autosomal dominant")),(0,r.kt)("h4",{id:"comments"},"Comments"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"contributes to the susceptibility to multifactorial disorders"),(0,r.kt)("li",{parentName:"ul"},"variations that lead to apparently abnormal laboratory test values"),(0,r.kt)("li",{parentName:"ul"},"unconfirmed mapping")))}s.isMDXComponent=!0},3962:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/primate-ai-json",id:"version-3.24/data-sources/primate-ai-json",title:"primate-ai-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/primate-ai-json.md",sourceDirName:"data-sources",slug:"/data-sources/primate-ai-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/primate-ai-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function 
s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"primateAI-3D": [\n {\n "aminoAcidPosition": 2,\n "refAminoAcid": "V",\n "altAminoAcid": "M",\n "score": 0.616944,\n "scorePercentile": 0.52,\n "classification": "pathogenic", \n "ensemblTranscriptId": "ENST00000335137.4",\n "refSeqTranscriptId": "NM_001005484.1"\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"aminoAcidPosition"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Amino Acid Position (1-based)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAminoAcid"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Reference Amino Acid")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAminoAcid"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Alternate Amino Acid")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ensemblTranscriptId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Transcript ID (Ensembl)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refSeqTranscriptId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Transcript ID 
(RefSeq)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"scorePercentile"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"score"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"classification"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"pathogenic or benign classification")))))}s.isMDXComponent=!0},4723:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/revel-json",id:"version-3.24/data-sources/revel-json",title:"revel-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/revel-json.md",sourceDirName:"data-sources",slug:"/data-sources/revel-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/revel-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"revel":{ \n 
"score":0.027\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"score"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1.0")))))}s.isMDXComponent=!0},3677:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/splice-ai-json",id:"version-3.24/data-sources/splice-ai-json",title:"splice-ai-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/splice-ai-json.md",sourceDirName:"data-sources",slug:"/data-sources/splice-ai-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/splice-ai-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"spliceAI":[ \n {\n "hgnc":"BLCAP",\n "acceptorGainDistance":-3,\n "acceptorGainScore":0.3,\n "donorLossDistance":7,\n "donorLossScore":0.9\n },\n { \n "hgnc":"NNAT",\n "acceptorGainDistance":-1,\n "acceptorGainScore":0.2,\n "donorGainDistance":-2,\n "donorGainScore":0.3\n 
}\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGNC gene symbol")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorGainDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorGainScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorLossDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"acceptorLossScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorGainDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorGainScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 
1 decimal place")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorLossDistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"donorLossScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")))))}s.isMDXComponent=!0},5023:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,o={unversionedId:"data-sources/topmed-json",id:"version-3.24/data-sources/topmed-json",title:"topmed-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/topmed-json.md",sourceDirName:"data-sources",slug:"/data-sources/topmed-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/topmed-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],d={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"topmed":{ \n "allAc":20,\n "allAn":125568,\n "allAf":0.000159,\n "allHc":0,\n 
"failedFilter":true\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed allele count")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed allele number. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed allele frequency (computed by Illumina Connected Annotations)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed homozygous count")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"bool"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters")))))}s.isMDXComponent=!0},7194:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>C,default:()=>x,frontMatter:()=>w,metadata:()=>S,toc:()=>F});var a=n(7462),r=(n(7294),n(3905)),l=n(4567),i=n(5666),o=n(6478),p=n(4869),d=n(6361),m=n(2379),s=n(3962),c=n(4723),u=n(7476),g=n(1399),k=n(3677),N=n(3830),f=n(3623),y=n(7811),h=n(7510),b=n(6380),v=n(8866),A=n(2898),I=n(5023),j=n(1231),D=n(8615),M=n(7927),T=n(13);const w={title:"Illumina Connected Annotations JSON File Format"},C=void 
0,S={unversionedId:"file-formats/illumina-annotator-json-file-format",id:"version-3.24/file-formats/illumina-annotator-json-file-format",title:"Illumina Connected Annotations JSON File Format",description:"Overview",source:"@site/versioned_docs/version-3.24/file-formats/illumina-annotator-json-file-format.mdx",sourceDirName:"file-formats",slug:"/file-formats/illumina-annotator-json-file-format",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-json-file-format",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/file-formats/illumina-annotator-json-file-format.mdx",tags:[],version:"3.24",frontMatter:{title:"Illumina Connected Annotations JSON File Format"},sidebar:"docs",previous:{title:"TOPMed",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed"},next:{title:"Illumina Connected Annotations VCF File Format",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-vcf-file-format"}},F=[{value:"Overview",id:"overview",children:[{value:"Conventions",id:"conventions",children:[],level:3},{value:"JSON Layout",id:"json-layout",children:[],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"Header",id:"header",children:[{value:"Data Source",id:"data-source",children:[],level:4},{value:"Genome Assemblies",id:"genome-assemblies",children:[],level:4}],level:2},{value:"Positions",id:"positions",children:[{value:"ClinGen",id:"clingen",children:[],level:3},{value:"1000 Genomes (SV)",id:"1000-genomes-sv",children:[],level:3},{value:"gnomAD (SV)",id:"gnomad-sv",children:[],level:3},{value:"MITOMAP (SV)",id:"mitomap-sv",children:[],level:3}],level:2},{value:"Samples",id:"samples",children:[],level:2},{value:"Variants",id:"variants",children:[{value:"Transcripts",id:"transcripts",children:[{value:"Amino Acid Conservation",id:"amino-acid-conservation",children:[],level:4},{value:"Gene 
Fusions",id:"gene-fusions",children:[],level:4},{value:"Fusion",id:"fusion",children:[],level:4},{value:"Cancer Hotspots",id:"cancer-hotspots",children:[],level:4}],level:3},{value:"Regulatory Regions",id:"regulatory-regions",children:[{value:"Regulatory Types",id:"regulatory-types",children:[],level:4},{value:"Regulatory Consequences",id:"regulatory-consequences",children:[],level:4}],level:3},{value:"ClinVar",id:"clinvar",children:[],level:3},{value:"1000 Genomes",id:"1000-genomes",children:[],level:3},{value:"DANN",id:"dann",children:[],level:3},{value:"dbSNP",id:"dbsnp",children:[],level:3},{value:"DECIPHER",id:"decipher",children:[],level:3},{value:"GERP",id:"gerp",children:[],level:3},{value:"GME Variome",id:"gme-variome",children:[],level:3},{value:"gnomAD",id:"gnomad",children:[],level:3},{value:"MITOMAP",id:"mitomap",children:[],level:3},{value:"Primate AI",id:"primate-ai",children:[],level:3},{value:"REVEL",id:"revel",children:[],level:3},{value:"Splice AI",id:"splice-ai",children:[],level:3},{value:"TOPMed",id:"topmed",children:[],level:3}],level:2},{value:"Genes",id:"genes",children:[{value:"OMIM",id:"omim",children:[],level:3},{value:"gnomAD LoF Gene Metrics",id:"gnomad-lof-gene-metrics",children:[],level:3},{value:"ClinGen Disease Validity",id:"clingen-disease-validity",children:[],level:3},{value:"COSMIC Cancer Gene Census",id:"cosmic-cancer-gene-census",children:[],level:3}],level:2}],R={toc:F},O="wrapper";function x(t){let{components:e,...w}=t;return(0,r.kt)(O,(0,a.Z)({},R,w,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("h3",{id:"conventions"},"Conventions"),(0,r.kt)("p",null,"In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. 
As such, we have several conventions that are useful to know about:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display ",(0,r.kt)("inlineCode",{parentName:"li"},'"isStructuralVariant":false')," a few million times when annotating a small variant VCF."),(0,r.kt)("li",{parentName:"ul"},"When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.")),(0,r.kt)("h3",{id:"json-layout"},"JSON Layout"),(0,r.kt)("p",null,(0,r.kt)("img",{src:n(3431).Z})),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"info")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"In general, each position corresponds to a row in the original VCF file."),(0,r.kt)("p",{parentName:"div"},"For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section."))),(0,r.kt)("h3",{id:"parsing"},"Parsing"),(0,r.kt)("div",{className:"admonition admonition-info alert 
alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"info")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"We've put together a ",(0,r.kt)("a",{parentName:"p",href:"../introduction/parsing-json"},"new section that discusses how to parse our JSON files")," easily using examples in a ",(0,r.kt)("a",{parentName:"p",href:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/blob/master/static/files/parse-json-python.ipynb"},"Python Jupyter notebook")," and a ",(0,r.kt)("a",{parentName:"p",href:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/blob/master/static/files/parse-json-r.ipynb"},"R version")," as well. 
In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX."))),(0,r.kt)("h2",{id:"header"},"Header"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'{\n "header":{\n "annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",\n "creationTime":"2017-06-14 15:53:13",\n "genomeAssembly":"GRCh37",\n "dataSources":[\n {\n "name":"OMIM",\n "version":"unknown",\n "description":"An Online Catalog of Human Genes and Genetic Disorders",\n "releaseDate":"2017-05-03"\n },\n {\n "name":"VEP",\n "version":"84",\n "description":"BothRefSeqAndEnsembl",\n "releaseDate":"2017-01-16"\n },\n {\n "name":"ClinVar",\n "version":"20170503",\n "description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",\n "releaseDate":"2017-05-03"\n },\n {\n "name":"phyloP",\n "version":"hg19",\n "description":"46 way conservation score between humans and 45 other vertebrates",\n "releaseDate":"2009-11-10"\n }\n ],\n "samples":[\n "NA12878",\n "NA12891",\n "NA12892"\n ]\n },\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"annotator"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"the name of the annotator and the current version")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"creationTime"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd 
hh:mm:ss")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"genomeAssembly"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#genome-assemblies"},"possible values below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"schemaVersion"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"left"},"incremented whenever the core structure of the JSON file introduces breaking changes")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"dataVersion"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"dataSources"),(0,r.kt)("td",{parentName:"tr",align:"center"},"object array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#data-source"},"Data Source entry below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"samples"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"the order of these sample names will be used throughout the JSON file when enumerating samples")))),(0,r.kt)("h4",{id:"data-source"},"Data 
Source"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"name"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"version"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"description"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"optional description of the data source")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"releaseDate"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd")))),(0,r.kt)("h4",{id:"genome-assemblies"},"Genome Assemblies"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"GRCh37"),(0,r.kt)("li",{parentName:"ul"},"GRCh38"),(0,r.kt)("li",{parentName:"ul"},"hg19"),(0,r.kt)("li",{parentName:"ul"},"SARSCoV2")),(0,r.kt)("h2",{id:"positions"},"Positions"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"positions":[\n {\n "chromosome":"chr2",\n "position":48010488,\n "id": "4",\n "repeatUnit":"GGCCCC",\n "refRepeatCount":3,\n "svEnd":48020488,\n "refAllele":"G",\n "altAlleles":[\n "A",\n "GT"\n ],\n "quality":461,\n "filters":[\n "PASS"\n ],\n "ciPos":[\n -170,\n 170\n ],\n "ciEnd":[\n -175,\n 175\n ],\n "svLength":1000,\n "strandBias":1.23,\n "jointSomaticNormalQuality":29,\n 
"cytogeneticBand":"2p16.3",\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Variant Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exactly as displayed in the vcf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"position"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exactly as displayed in the vcf (1-based notation). Range: 1 - 250 million")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"id"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},'provided from ID column in the VCF file, this field will be omitted if empty or has "." 
value')),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"repeatUnit"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"center"},"STR"),(0,r.kt)("td",{parentName:"tr",align:"left"},"provided by ExpansionHunter")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refRepeatCount"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"STR"),(0,r.kt)("td",{parentName:"tr",align:"left"},"provided by ExpansionHunter")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"svEnd"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SV"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exactly as displayed in the vcf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAlleles"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exactly as displayed in the vcf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"quality"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exactly as displayed in the vcf (Normally an integer, but some variant callers use floating point. 
Has been observed as high as 500k)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"filters"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exactly as displayed in the vcf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ciPos"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SV"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ciEnd"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SV"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"svLength"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SV"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"strandBias"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"center"},"small variant"),(0,r.kt)("td",{parentName:"tr",align:"left"},"provided by GATK (from SB)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"jointSomaticNormalQuality"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SV"),(0,r.kt)("td",{parentName:"tr",align:"left"},"provided by the Manta variant caller (SOMATICSCORE)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"cytogeneticBand"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"center"},"all"),(0,r.kt)("td",{parentName:"tr",align:"left"},"e.g. 
17p13.1")))),(0,r.kt)("h3",{id:"clingen"},"ClinGen"),(0,r.kt)(o.default,{mdxType:"ClinGen"}),(0,r.kt)(p.default,{mdxType:"ClinGenDosage"}),(0,r.kt)("h3",{id:"1000-genomes-sv"},"1000 Genomes (SV)"),(0,r.kt)(v.default,{mdxType:"ThousandGenomesSV"}),(0,r.kt)("h3",{id:"gnomad-sv"},"gnomAD (SV)"),(0,r.kt)(j.default,{mdxType:"GnomadSV"}),(0,r.kt)("h3",{id:"mitomap-sv"},"MITOMAP (SV)"),(0,r.kt)(f.default,{mdxType:"MitoMapSV"}),(0,r.kt)("h2",{id:"samples"},"Samples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"samples":[\n {\n "genotype":"0/1",\n "variantFrequencies":[\n 0.333,\n 0.5\n ],\n "totalDepth":57,\n "genotypeQuality":12,\n "copyNumber":3,\n "repeatUnitCounts":[\n 10,\n 20\n ],\n "alleleDepths":[\n 10,\n 20,\n 30\n ],\n "failedFilter":true,\n "splitReadCounts":[\n 10,\n 20\n ],\n "pairedEndReadCounts":[\n 10,\n 20\n ],\n "isDeNovo":true,\n "diseaseAffectedStatuses":[\n "-"\n ],\n "artifactAdjustedQualityScore":89.3,\n "likelihoodRatioQualityScore":78.2,\n "heteroplasmyPercentile":[\n 23.13,\n 12.65\n ]\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"center"},"VCF"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"genotype"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"center"},"GT"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variantFrequencies"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"VF, AD"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 
One value per alternate allele")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"totalDepth"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"DP"),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-negative integer values")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"genotypeQuality"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"GQ"),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-negative integer values. Typically maxes out at 99")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"copyNumber"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"CN"),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-negative integer values")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"minorHaplotypeCopyNumber"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"MCN"),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-negative integer values")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"repeatUnitCounts"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"REPCN"),(0,r.kt)("td",{parentName:"tr",align:"left"},"ExpansionHunter-specific")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"alleleDepths"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"AD"),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-negative integer 
values")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"center"},"FT"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"splitReadCounts"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SR"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Manta-specific")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"pairedEndReadCounts"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"PR"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Manta-specific")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isDeNovo"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"center"},"DN"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"deNovoQuality"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"center"},"DQ"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"diseaseAffectedStatuses"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"DST"),(0,r.kt)("td",{parentName:"tr",align:"left"},"ExpansionHunter-specific")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"artifactAdjustedQualityScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"center"},"AQ"),(0,r.kt)("td",{parentName:"tr",align:"left"},"PEPE-specific. 
Range: 0 - 100.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"likelihoodRatioQualityScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"center"},"LQ"),(0,r.kt)("td",{parentName:"tr",align:"left"},"PEPE-specific. Range: 0 - 100.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"lossOfHeterozygosity"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"center"},"CN, MCN"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"somaticQuality"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"center"},"SQ"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"heteroplasmyPercentile"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float array"),(0,r.kt)("td",{parentName:"tr",align:"center"},"VF"),(0,r.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 100. 2 decimal places. 
One value per alternate allele")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"binCount"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"center"},"BC"),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-negative integer values")))),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Empty Samples")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"If a sample does not contain any entries, we will create a sample object that contains the ",(0,r.kt)("inlineCode",{parentName:"p"},"isEmpty")," key. 
This ensures that sample ordering is preserved while indicating that a sample is intentionally empty."),(0,r.kt)("pre",{parentName:"div"},(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"samples":[\n {\n "isEmpty":true\n }\n],\n')))),(0,r.kt)("h2",{id:"variants"},"Variants"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"variants":[\n {\n "vid":"2-48010488-G-A",\n "chromosome":"chr2",\n "begin":48010488,\n "end":48010488,\n "isReferenceMinorAllele":true,\n "isStructuralVariant":true,\n "refAllele":"G",\n "altAllele":"A",\n "variantType":"SNV",\n "hgvsg":"NC_000002.11:g.48010488G>A",\n "phylopScore":0.459\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"vid"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"../core-functionality/variant-ids"},"Variant Identifiers"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"begin"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"1-based non-negative integer values. Range: 1 - 250 million")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"end"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"1-based non-negative integer values. 
Range: 1 - 250 million")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isReferenceMinorAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when this is a reference minor allele")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isStructuralVariant"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the variant is a structural variant")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"inLowComplexityRegion"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the variant lies in a low complexity region (gnomAD low complexity regions)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"parsimonious representation of the reference allele")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"parsimonious representation of the alternate allele.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"uses\xa0",(0,r.kt)("a",{parentName:"td",href:"http://www.sequenceontology.org/browser/current_svn/term/SO:0001059"},"Sequence Ontology sequence alterations"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgvsg"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGVS g. 
notation")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phylopScore"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"phyloP conservation score. Range: -14.08 to 6.424")))),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Reference Minor Alleles")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Illumina Connected Annotations supports annotating reference minor alleles. 
In such a case, ",(0,r.kt)("inlineCode",{parentName:"p"},"refAllele")," will be replaced by the global major allele and ",(0,r.kt)("inlineCode",{parentName:"p"},"altAllele")," will be replaced with the original reference allele."))),(0,r.kt)("h3",{id:"transcripts"},"Transcripts"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"transcripts":[\n {\n "transcript":"ENST00000445503.1",\n "source":"Ensembl",\n "bioType":"NMD_transcript_variant",\n "codons":"gGg/gAg",\n "aminoAcids":"G/E",\n "cdnaPos":"268/4158",\n "cdsPos":"116/483",\n "exons":"1/9",\n "introns":"1/8",\n "proteinPos":"39/160",\n "geneId":"ENSG00000116062",\n "hgnc":"MSH6",\n "consequence":[\n "missense_variant",\n "NMD_transcript_variant"\n ],\n "impact": "moderate",\n "hgvsc":"ENST00000445503.1:c.116G>A",\n "hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",\n "geneFusion":{\n "exon":6,\n "intron":5,\n "fusions":[\n {\n "hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",\n "exon":3,\n "intron":2\n },\n {\n "hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",\n "exon":2,\n "intron":1\n }\n ]\n },\n "isCanonical":true,\n "proteinId":"ENSP00000405294.1",\n "completeOverlap":true\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"transcript"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"transcript ID. e.g. 
ENST00000445503.1")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"source"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"RefSeq / Ensembl")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"bioType"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"descriptions of the ",(0,r.kt)("a",{parentName:"td",href:"https://uswest.ensembl.org/info/genome/genebuild/biotypes.html"},"biotypes from Ensembl"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"codons"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"aminoAcids"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"cdnaPos"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Format: start-end/Length")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"cdsPos"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Format: start-end/Length")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"exons"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"exons affected by the variant")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"introns"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"introns affected by the 
variant")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"proteinPos"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Format: start-end/Length")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"geneId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"gene ID. e.g. ENSG00000116062")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"gene symbol. e.g. MSH6")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"consequence"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"http://www.sequenceontology.org/browser/obob.cgi"},"Sequence Ontology Consequences"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"impact"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"See ",(0,r.kt)("a",{parentName:"td",href:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/transcript-consequence-impacts"},"Consequence Impact")," for details")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgvsc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGVS coding nomenclature")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgvsp"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGVS protein 
nomenclature")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"geneFusion"),(0,r.kt)("td",{parentName:"tr",align:"center"},"object"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#gene-fusions"},"Gene Fusions entry below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isCanonical"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when this is a canonical transcript")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isManeSelect"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when this is a MANE select transcript")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"proteinId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"protein ID. E.g. ENSP00000405294.1")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"completeOverlap"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when this transcript is completely overlapped by the variant")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"cancerHotspots"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#cancer-hotspots"},"Cancer Hotspots entry below"))))),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 
16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"MANE Select")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"MANE select tags are only available for RefSeq transcripts on GRCh38."))),(0,r.kt)("h4",{id:"amino-acid-conservation"},"Amino Acid Conservation"),(0,r.kt)(l.default,{mdxType:"AminoAcidConservation"}),(0,r.kt)("h4",{id:"gene-fusions"},"Gene Fusions"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"exon"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"actual exon where the breakpoint was located")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"intron"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"actual intron where the breakpoint was located")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"fusions"),(0,r.kt)("td",{parentName:"tr",align:"center"},"object array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see ",(0,r.kt)("a",{parentName:"td",href:"#fusion"},"Fusion entry 
below"))))),(0,r.kt)("h4",{id:"fusion"},"Fusion"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"exon"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"actual exon where the other breakpoint was located")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"intron"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"actual intron where the other breakpoint was located")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgvsc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGVS coding nomenclature describing the two genes and the transcripts that are fused along with")))),(0,r.kt)("h4",{id:"cancer-hotspots"},"Cancer Hotspots"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"residue"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"how many samples are associated with a variant at the same amino acid 
position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"numAltAminoAcidSamples"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"how many samples are associated with a variant with the same position and alternate amino acid position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"qValue"),(0,r.kt)("td",{parentName:"tr",align:"center"},"double"),(0,r.kt)("td",{parentName:"tr",align:"left"})))),(0,r.kt)("h3",{id:"regulatory-regions"},"Regulatory Regions"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"regulatoryRegions":[\n {\n "id":"ENSR00001542175",\n "type":"promoter",\n "consequence":[\n "regulatory_region_variant"\n ]\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"id"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"type"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"see ",(0,r.kt)("a",{parentName:"td",href:"#regulatory-types"},"possible values below"))),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"consequence"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:null},"see ",(0,r.kt)("a",{parentName:"td",href:"#regulatory-consequences"},"possible values below"))))),(0,r.kt)("h4",{id:"regulatory-types"},"Regulatory 
Types"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"CTCF_binding_site"),(0,r.kt)("li",{parentName:"ul"},"enhancer"),(0,r.kt)("li",{parentName:"ul"},"open_chromatin_region"),(0,r.kt)("li",{parentName:"ul"},"promoter"),(0,r.kt)("li",{parentName:"ul"},"promoter_flanking_region"),(0,r.kt)("li",{parentName:"ul"},"TF_binding_site")),(0,r.kt)("h4",{id:"regulatory-consequences"},"Regulatory Consequences"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"regulatory_region_variant"),(0,r.kt)("li",{parentName:"ul"},"regulatory_region_ablation"),(0,r.kt)("li",{parentName:"ul"},"regulatory_region_amplification"),(0,r.kt)("li",{parentName:"ul"},"regulatory_region_truncation")),(0,r.kt)("h3",{id:"clinvar"},"ClinVar"),(0,r.kt)(i.default,{mdxType:"ClinVar"}),(0,r.kt)("h3",{id:"1000-genomes"},"1000 Genomes"),(0,r.kt)(b.default,{mdxType:"ThousandGenomesSmall"}),(0,r.kt)("h3",{id:"dann"},"DANN"),(0,r.kt)(u.default,{mdxType:"DANN"}),(0,r.kt)("h3",{id:"dbsnp"},"dbSNP"),(0,r.kt)(m.default,{mdxType:"DbSNP"}),(0,r.kt)("h3",{id:"decipher"},"DECIPHER"),(0,r.kt)(M.default,{mdxType:"DECIPHER"}),(0,r.kt)("h3",{id:"gerp"},"GERP"),(0,r.kt)(g.default,{mdxType:"GERP"}),(0,r.kt)("h3",{id:"gme-variome"},"GME Variome"),(0,r.kt)(D.default,{mdxType:"GME"}),(0,r.kt)("h3",{id:"gnomad"},"gnomAD"),(0,r.kt)(y.default,{mdxType:"GnomadSmall"}),(0,r.kt)("h3",{id:"mitomap"},"MITOMAP"),(0,r.kt)(N.default,{mdxType:"MitoMapSmall"}),(0,r.kt)("h3",{id:"primate-ai"},"Primate AI"),(0,r.kt)(s.default,{mdxType:"PrimateAI"}),(0,r.kt)("h3",{id:"revel"},"REVEL"),(0,r.kt)(c.default,{mdxType:"REVEL"}),(0,r.kt)("h3",{id:"splice-ai"},"Splice AI"),(0,r.kt)(k.default,{mdxType:"SpliceAI"}),(0,r.kt)("h3",{id:"topmed"},"TOPMed"),(0,r.kt)(I.default,{mdxType:"TOPMed"}),(0,r.kt)("h2",{id:"genes"},"Genes"),(0,r.kt)("p",null,"Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant with the exception of flanking variants (i.e. 
variants that only cause upstream_gene_variant or downstream_gene_variant)."),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"genes":[\n {\n "name":"MSH6",\n "hgncId":7329,\n "summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",\n /* this is where gene-level data sources can be found e.g. 
OMIM */\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"name"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGNC gene symbol")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgncId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"HGNC ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"summary"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"short description of the gene from ",(0,r.kt)("a",{parentName:"td",href:"https://www.omim.org/"},"OMIM"))))),(0,r.kt)("h3",{id:"omim"},"OMIM"),(0,r.kt)(A.default,{mdxType:"Omim"}),(0,r.kt)("h3",{id:"gnomad-lof-gene-metrics"},"gnomAD LoF Gene Metrics"),(0,r.kt)(h.default,{mdxType:"GnomadGeneLof"}),(0,r.kt)("h3",{id:"clingen-disease-validity"},"ClinGen Disease Validity"),(0,r.kt)(d.default,{mdxType:"ClinGenDiseaseValidity"}),(0,r.kt)("h3",{id:"cosmic-cancer-gene-census"},"COSMIC Cancer Gene Census"),(0,r.kt)(T.default,{mdxType:"COSMICCGC"}))}x.isMDXComponent=!0},3431:(t,e,n)=>{n.d(e,{Z:()=>a});const a=n.p+"assets/images/JSON-Layout-fc8e5c0cf4c8428981cd206fe9b6feac.svg"}}]); \ No newline at end of file diff --git a/assets/js/463e69e4.1615fe13.js b/assets/js/463e69e4.1615fe13.js deleted file mode 100644 index e7ad82c7..00000000 --- a/assets/js/463e69e4.1615fe13.js +++ /dev/null @@ -1 +0,0 @@ -"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7278],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>m});var 
a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var c=a.createContext({}),s=function(e){var t=a.useContext(c),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=s(e.components);return a.createElement(c.Provider,{value:t},e.children)},u="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},h=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,i=e.originalType,c=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),u=s(n),h=r,m=u["".concat(c,".").concat(h)]||u[h]||d[h]||i;return n?a.createElement(m,o(o({ref:t},p),{},{components:n})):a.createElement(m,o({ref:t},p))}));function m(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,o=new Array(i);o[0]=h;var l={};for(var c in t)hasOwnProperty.call(t,c)&&(l[c]=t[c]);l.originalType=e,l[u]="string"==typeof e?e:r,o[1]=l;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>u,frontMatter:()=>i,metadata:()=>l,toc:()=>c});var a=n(7462),r=(n(7294),n(3905));const i={title:"Canonical Transcripts"},o=void 0,l={unversionedId:"core-functionality/canonical-transcripts",id:"core-functionality/canonical-transcripts",title:"Canonical 
Transcripts",description:"Overview",source:"@site/docs/core-functionality/canonical-transcripts.md",sourceDirName:"core-functionality",slug:"/core-functionality/canonical-transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/canonical-transcripts.md",tags:[],version:"current",frontMatter:{title:"Canonical Transcripts"},sidebar:"docs",previous:{title:"Custom Annotations",permalink:"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations"},next:{title:"Junction Preserving Annotation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving"}},c=[{value:"Overview",id:"overview",children:[],level:2},{value:"Known Algorithms",id:"known-algorithms",children:[{value:"UCSC",id:"ucsc",children:[],level:3},{value:"Ensembl",id:"ensembl",children:[],level:3},{value:"ACMG",id:"acmg",children:[],level:3},{value:"ClinVar",id:"clinvar",children:[],level:3}],level:2},{value:"Unified Approach",id:"unified-approach",children:[],level:2}],s={toc:c},p="wrapper";function u(e){let{components:t,...i}=e;return(0,r.kt)(p,(0,a.Z)({},s,i,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). 
As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation."),(0,r.kt)("p",null,(0,r.kt)("img",{src:n(3424).Z})),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Golden Helix Blog")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: ",(0,r.kt)("a",{parentName:"p",href:"https://blog.goldenhelix.com/whats-in-a-name-the-intricacies-of-identifying-variants/"},"What\u2019s in a Name: The Intricacies of Identifying Variants"),"."))),(0,r.kt)("p",null,"In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources."),(0,r.kt)("h2",{id:"known-algorithms"},"Known Algorithms"),(0,r.kt)("h3",{id:"ucsc"},"UCSC"),(0,r.kt)("p",null,"UCSC publishes a list of canonical transcripts in its ",(0,r.kt)("inlineCode",{parentName:"p"},"knownCanonical")," table which is available via the ",(0,r.kt)("a",{parentName:"p",href:"https://genome.ucsc.edu/cgi-bin/hgTables"},"TableBrowser"),". 
Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.")),(0,r.kt)("p",null,"If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule."),(0,r.kt)("h3",{id:"ensembl"},"Ensembl"),(0,r.kt)("p",null,"The ",(0,r.kt)("a",{parentName:"p",href:"http://uswest.ensembl.org/Help/Glossary"},"Ensembl glossary")," states:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:"),(0,r.kt)("ol",{parentName:"blockquote"},(0,r.kt)("li",{parentName:"ol"},"Longest CCDS translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no (1), choose the longest Ensembl/Havana merged translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no (2), choose the longest translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no translation, choose the longest non-protein-coding transcript."))),(0,r.kt)("h3",{id:"acmg"},"ACMG"),(0,r.kt)("p",null,"From the ACMG Guidelines for the Interpretation of Sequence Variants:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"A reference transcript for each gene should be used and provided in the report when describing coding variants. 
The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.")),(0,r.kt)("h3",{id:"clinvar"},"ClinVar"),(0,r.kt)("p",null,"From the ClinVar paper:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.")),(0,r.kt)("h2",{id:"unified-approach"},"Unified Approach"),(0,r.kt)("p",null,"Our approach is almost identical to the one Golden Helix discussed in their article:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts."),(0,r.kt)("li",{parentName:"ol"},"Sort the transcripts in the following order:",(0,r.kt)("ol",{parentName:"li"},(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("a",{parentName:"li",href:"https://www.lrg-sequence.org/"},"Locus Reference Genomic (LRG)")," entries occur before non-LRG entries"),(0,r.kt)("li",{parentName:"ol"},"Descending CDS length"),(0,r.kt)("li",{parentName:"ol"},"Descending transcript length"),(0,r.kt)("li",{parentName:"ol"},"Ascending accession number"))),(0,r.kt)("li",{parentName:"ol"},"Grab the first entry")))}u.isMDXComponent=!0},3424:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/hk1-transcripts-a5b85474d3b002553687715dbd004907.png"}}]); \ No newline at end of file diff --git a/assets/js/463e69e4.61efeb17.js b/assets/js/463e69e4.61efeb17.js new file mode 100644 index 00000000..e14eda44 --- /dev/null +++ b/assets/js/463e69e4.61efeb17.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7278],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>m});var a=n(7294);function r(e,t,n){return t in 
e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},u="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},h=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,i=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),u=c(n),h=r,m=u["".concat(s,".").concat(h)]||u[h]||d[h]||i;return n?a.createElement(m,o(o({ref:t},p),{},{components:n})):a.createElement(m,o({ref:t},p))}));function m(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,o=new Array(i);o[0]=h;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[u]="string"==typeof e?e:r,o[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>u,frontMatter:()=>i,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const i={title:"Canonical Transcripts"},o=void 0,l={unversionedId:"core-functionality/canonical-transcripts",id:"core-functionality/canonical-transcripts",title:"Canonical 
Transcripts",description:"Overview",source:"@site/docs/core-functionality/canonical-transcripts.md",sourceDirName:"core-functionality",slug:"/core-functionality/canonical-transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/canonical-transcripts.md",tags:[],version:"current",frontMatter:{title:"Canonical Transcripts"},sidebar:"docs",previous:{title:"Custom Annotations",permalink:"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations"},next:{title:"Gene Fusion Detection",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Known Algorithms",id:"known-algorithms",children:[{value:"UCSC",id:"ucsc",children:[],level:3},{value:"Ensembl",id:"ensembl",children:[],level:3},{value:"ACMG",id:"acmg",children:[],level:3},{value:"ClinVar",id:"clinvar",children:[],level:3}],level:2},{value:"Unified Approach",id:"unified-approach",children:[],level:2}],c={toc:s},p="wrapper";function u(e){let{components:t,...i}=e;return(0,r.kt)(p,(0,a.Z)({},c,i,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). 
As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation."),(0,r.kt)("p",null,(0,r.kt)("img",{src:n(3424).Z})),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Golden Helix Blog")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: ",(0,r.kt)("a",{parentName:"p",href:"https://blog.goldenhelix.com/whats-in-a-name-the-intricacies-of-identifying-variants/"},"What\u2019s in a Name: The Intricacies of Identifying Variants"),"."))),(0,r.kt)("p",null,"In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources."),(0,r.kt)("h2",{id:"known-algorithms"},"Known Algorithms"),(0,r.kt)("h3",{id:"ucsc"},"UCSC"),(0,r.kt)("p",null,"UCSC publishes a list of canonical transcripts in its ",(0,r.kt)("inlineCode",{parentName:"p"},"knownCanonical")," table which is available via the ",(0,r.kt)("a",{parentName:"p",href:"https://genome.ucsc.edu/cgi-bin/hgTables"},"TableBrowser"),". 
Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.")),(0,r.kt)("p",null,"If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule."),(0,r.kt)("h3",{id:"ensembl"},"Ensembl"),(0,r.kt)("p",null,"The ",(0,r.kt)("a",{parentName:"p",href:"http://uswest.ensembl.org/Help/Glossary"},"Ensembl glossary")," states:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:"),(0,r.kt)("ol",{parentName:"blockquote"},(0,r.kt)("li",{parentName:"ol"},"Longest CCDS translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no (1), choose the longest Ensembl/Havana merged translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no (2), choose the longest translation with no stop codons."),(0,r.kt)("li",{parentName:"ol"},"If no translation, choose the longest non-protein-coding transcript."))),(0,r.kt)("h3",{id:"acmg"},"ACMG"),(0,r.kt)("p",null,"From the ACMG Guidelines for the Interpretation of Sequence Variants:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"A reference transcript for each gene should be used and provided in the report when describing coding variants. 
The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.")),(0,r.kt)("h3",{id:"clinvar"},"ClinVar"),(0,r.kt)("p",null,"From the ClinVar paper:"),(0,r.kt)("blockquote",null,(0,r.kt)("p",{parentName:"blockquote"},"When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.")),(0,r.kt)("h2",{id:"unified-approach"},"Unified Approach"),(0,r.kt)("p",null,"Our approach is almost identical to the one Golden Helix discussed in their article:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts."),(0,r.kt)("li",{parentName:"ol"},"Sort the transcripts in the following order:",(0,r.kt)("ol",{parentName:"li"},(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("a",{parentName:"li",href:"https://www.lrg-sequence.org/"},"Locus Reference Genomic (LRG)")," entries occur before non-LRG entries"),(0,r.kt)("li",{parentName:"ol"},"Descending CDS length"),(0,r.kt)("li",{parentName:"ol"},"Descending transcript length"),(0,r.kt)("li",{parentName:"ol"},"Ascending accession number"))),(0,r.kt)("li",{parentName:"ol"},"Grab the first entry")))}u.isMDXComponent=!0},3424:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/hk1-transcripts-a5b85474d3b002553687715dbd004907.png"}}]); \ No newline at end of file diff --git a/assets/js/4b1283a1.3ef53153.js b/assets/js/4b1283a1.3ef53153.js new file mode 100644 index 00000000..165181e3 --- /dev/null +++ b/assets/js/4b1283a1.3ef53153.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9636],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>m});var a=n(7294);function i(e,t,n){return t in 
e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function l(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",u={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},g=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),d=c(n),g=i,m=d["".concat(s,".").concat(g)]||d[g]||u[g]||r;return n?a.createElement(m,l(l({ref:t},p),{},{components:n})):a.createElement(m,l({ref:t},p))}));function m(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,l=new Array(r);l[0]=g;var o={};for(var s in t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[d]="string"==typeof e?e:i,l[1]=o;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>d,frontMatter:()=>r,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={},l=void 0,o={unversionedId:"data-sources/clingen-dosage-json",id:"version-3.24/data-sources/clingen-dosage-json",title:"clingen-dosage-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/clingen-dosage-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-dosage-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-dosage-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-dosage-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],c={toc:s},p="wrapper";function d(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"clingenDosageSensitivityMap": [{\n "chromosome": "15",\n "begin": 30900686,\n "end": 32153204,\n "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "reciprocalOverlap": 0.00147,\n "annotationOverlap": 0.33994\n},\n{\n "chromosome": "15",\n "begin": 31727418,\n "end": 32153204,\n "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "triplosensitivity": "dosage sensitivity unlikely",\n "reciprocalOverlap": 0.00147,\n "annotationOverlap": 1\n}]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"Field"),(0,i.kt)("th",{parentName:"tr",align:null},"Type"),(0,i.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"clingenDosageSensitivityMap"),(0,i.kt)("td",{parentName:"tr",align:null},"object 
array"),(0,i.kt)("td",{parentName:"tr",align:null})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"begin"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"end"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"haploinsufficiency"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"see possible values below")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"triplosensitivity"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"(same as haploinsufficiency)\xa0")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,i.kt)("td",{parentName:"tr",align:null},"floating point"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,i.kt)("td",{parentName:"tr",align:null},"floating point"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. 
Specified up to 5 decimal places (Not reported for Insertions).")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"haploinsufficiency and triplosensitivity")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"no evidence to suggest that dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"little evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"emerging evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"gene associated with autosomal recessive phenotype"),(0,i.kt)("li",{parentName:"ul"},"dosage sensitivity unlikely")))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/4b2748e1.b6526a03.js b/assets/js/4b2748e1.b6526a03.js new file mode 100644 index 00000000..e9f5c8f9 --- /dev/null +++ b/assets/js/4b2748e1.b6526a03.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[775],{3905:(t,e,r)=>{r.d(e,{Zo:()=>m,kt:()=>f});var n=r(7294);function a(t,e,r){return e in t?Object.defineProperty(t,e,{value:r,enumerable:!0,configurable:!0,writable:!0}):t[e]=r,t}function o(t,e){var r=Object.keys(t);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(t);e&&(n=n.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),r.push.apply(r,n)}return r}function i(t){for(var e=1;e=0||(a[r]=t[r]);return a}(t,e);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(t);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(t,r)&&(a[r]=t[r])}return a}var p=n.createContext({}),c=function(t){var e=n.useContext(p),r=e;return t&&(r="function"==typeof t?t(e):i(i({},e),t)),r},m=function(t){var e=c(t.components);return 
n.createElement(p.Provider,{value:e},t.children)},s="mdxType",u={inlineCode:"code",wrapper:function(t){var e=t.children;return n.createElement(n.Fragment,{},e)}},d=n.forwardRef((function(t,e){var r=t.components,a=t.mdxType,o=t.originalType,p=t.parentName,m=l(t,["components","mdxType","originalType","parentName"]),s=c(r),d=a,f=s["".concat(p,".").concat(d)]||s[d]||u[d]||o;return r?n.createElement(f,i(i({ref:e},m),{},{components:r})):n.createElement(f,i({ref:e},m))}));function f(t,e){var r=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var o=r.length,i=new Array(o);i[0]=d;var l={};for(var p in e)hasOwnProperty.call(e,p)&&(l[p]=e[p]);l.originalType=t,l[s]="string"==typeof t?t:a,i[1]=l;for(var c=2;c{r.r(e),r.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>o,metadata:()=>l,toc:()=>p});var n=r(7462),a=(r(7294),r(3905));const o={},i=void 0,l={unversionedId:"data-sources/mitomap-structural-variants-json",id:"version-3.24/data-sources/mitomap-structural-variants-json",title:"mitomap-structural-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/mitomap-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/mitomap-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],c={toc:p},m="wrapper";function s(t){let{components:e,...r}=t;return(0,a.kt)(m,(0,n.Z)({},c,r,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"mitomap":[ \n { \n "chromosome":"MT",\n "begin":3166,\n "end":14152,\n "variantType":"deletion",\n "reciprocalOverlap":0.18068,\n "annotationOverlap":0.42405\n 
}\n]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"})),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"begin"),(0,a.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,a.kt)("td",{parentName:"tr",align:"left"})),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"end"),(0,a.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,a.kt)("td",{parentName:"tr",align:"left"})),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,a.kt)("td",{parentName:"tr",align:"left"})),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"reciprocalOverlap"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1. Specified up to 5 decimal places")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"annotationOverlap"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1. 
Specified up to 5 decimal places")))))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/4b69b274.89bfa8ee.js b/assets/js/4b69b274.89bfa8ee.js deleted file mode 100644 index 2aee6f54..00000000 --- a/assets/js/4b69b274.89bfa8ee.js +++ /dev/null @@ -1 +0,0 @@ -"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7106],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>f});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var l=a.createContext({}),s=function(e){var t=a.useContext(l),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},u=function(e){var t=s(e.components);return a.createElement(l.Provider,{value:t},e.children)},p="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,l=e.parentName,u=c(e,["components","mdxType","originalType","parentName"]),p=s(n),m=i,f=p["".concat(l,".").concat(m)]||p[m]||d[m]||r;return n?a.createElement(f,o(o({ref:t},u),{},{components:n})):a.createElement(f,o({ref:t},u))}));function f(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,o=new Array(r);o[0]=m;var c={};for(var l in t)hasOwnProperty.call(t,l)&&(c[l]=t[l]);c.originalType=e,c[p]="string"==typeof e?e:i,o[1]=c;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>p,frontMatter:()=>r,metadata:()=>c,toc:()=>l});var 
a=n(7462),i=(n(7294),n(3905));const r={title:"Junction Preserving Annotation"},o=void 0,c={unversionedId:"core-functionality/junction-preserving",id:"core-functionality/junction-preserving",title:"Junction Preserving Annotation",description:"Background",source:"@site/docs/core-functionality/junction-preserving.md",sourceDirName:"core-functionality",slug:"/core-functionality/junction-preserving",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/junction-preserving.md",tags:[],version:"current",frontMatter:{title:"Junction Preserving Annotation"},sidebar:"docs",previous:{title:"Canonical Transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts"},next:{title:"Transcript Consequence Impact",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts"}},l=[{value:"Background",id:"background",children:[],level:2},{value:"Implementation",id:"implementation",children:[],level:2}],s={toc:l},u="wrapper";function p(e){let{components:t,...r}=e;return(0,i.kt)(u,(0,a.Z)({},s,r,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"background"},"Background"),(0,i.kt)("p",null,"When a variant can be moved (due to alignment) across junctions (e.g. start, stop or splice site), the annotation may vary depending on which exact alignment was used. For example, a left-aligned deletion that effects the splice acceptor site, upon right-alignment, may become an exon variant. 
"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"Deletion at exon boundary",src:n(856).Z})),(0,i.kt)("p",null,"Note that: "),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"When right-aligned the variant starts at the first base of the exon (as pictured)."),(0,i.kt)("li",{parentName:"ul"},"When left-aligned the variant can be shifted two base pairs and starts at a splice acceptor site.")),(0,i.kt)("p",null,"From the point of view of the translation mechinary, the important question is whether the sequence that identifies a junction is preserved, regardless of the variant position. In the case of the deletion above, we believe that the variant is more accurately characterized as an ",(0,i.kt)("inlineCode",{parentName:"p"},"inframe_deletion")," not a ",(0,i.kt)("inlineCode",{parentName:"p"},"splice_acceptor_variant")," as splice acceptor sequence ",(0,i.kt)("inlineCode",{parentName:"p"},"AG")," is unaffected."),(0,i.kt)("p",null,"When faced with such variants, we will assign junction disrupting consequnces only if the variant cannot be shifted out of the junction."),(0,i.kt)("h2",{id:"implementation"},"Implementation"),(0,i.kt)("p",null,"By default and convention, the left-aligned variant is annotated. If the variant overlaps a junction (as judged by consequences), it is right-aligned and annotated. If both alignment produces junction disruption, the left-aligned annotation is reported. 
If however, only one of the alignment causes junction disruption but not the other, the non-junction-disrupting annotation is reported."),(0,i.kt)("div",{className:"admonition admonition-note alert alert--secondary"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"}))),"note")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"This only effects transcript annotations. 
Supplementary annotations are reported on the left-aligned variant and HGVS notations are calculated on right-aligned variant."))))}p.isMDXComponent=!0},856:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/27-nt-deletion-c733fc75acdc0ef64ac7d181d5ff81fe.png"}}]); \ No newline at end of file diff --git a/assets/js/4b69b274.dfa70220.js b/assets/js/4b69b274.dfa70220.js new file mode 100644 index 00000000..06fa8c1c --- /dev/null +++ b/assets/js/4b69b274.dfa70220.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7106],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>f});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function r(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var l=a.createContext({}),s=function(e){var t=a.useContext(l),n=t;return e&&(n="function"==typeof e?e(t):r(r({},t),e)),n},u=function(e){var t=s(e.components);return a.createElement(l.Provider,{value:t},e.children)},p="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,o=e.originalType,l=e.parentName,u=c(e,["components","mdxType","originalType","parentName"]),p=s(n),m=i,f=p["".concat(l,".").concat(m)]||p[m]||d[m]||o;return n?a.createElement(f,r(r({ref:t},u),{},{components:n})):a.createElement(f,r({ref:t},u))}));function f(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var o=n.length,r=new Array(o);r[0]=m;var c={};for(var l in 
t)hasOwnProperty.call(t,l)&&(c[l]=t[l]);c.originalType=e,c[p]="string"==typeof e?e:i,r[1]=c;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>r,default:()=>p,frontMatter:()=>o,metadata:()=>c,toc:()=>l});var a=n(7462),i=(n(7294),n(3905));const o={title:"Junction Preserving Annotation"},r=void 0,c={unversionedId:"core-functionality/junction-preserving",id:"core-functionality/junction-preserving",title:"Junction Preserving Annotation",description:"Background",source:"@site/docs/core-functionality/junction-preserving.md",sourceDirName:"core-functionality",slug:"/core-functionality/junction-preserving",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/junction-preserving.md",tags:[],version:"current",frontMatter:{title:"Junction Preserving Annotation"},sidebar:"docs",previous:{title:"ISCN Notation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/iscn-notation"},next:{title:"Transcript Consequence Impact",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts"}},l=[{value:"Background",id:"background",children:[],level:2},{value:"Implementation",id:"implementation",children:[],level:2}],s={toc:l},u="wrapper";function p(e){let{components:t,...o}=e;return(0,i.kt)(u,(0,a.Z)({},s,o,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"background"},"Background"),(0,i.kt)("p",null,"When a variant can be moved (due to alignment) across junctions (e.g. start, stop or splice site), the annotation may vary depending on which exact alignment was used. For example, a left-aligned deletion that effects the splice acceptor site, upon right-alignment, may become an exon variant. 
"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"Deletion at exon boundary",src:n(856).Z})),(0,i.kt)("p",null,"Note that: "),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"When right-aligned the variant starts at the first base of the exon (as pictured)."),(0,i.kt)("li",{parentName:"ul"},"When left-aligned the variant can be shifted two base pairs and starts at a splice acceptor site.")),(0,i.kt)("p",null,"From the point of view of the translation mechinary, the important question is whether the sequence that identifies a junction is preserved, regardless of the variant position. In the case of the deletion above, we believe that the variant is more accurately characterized as an ",(0,i.kt)("inlineCode",{parentName:"p"},"inframe_deletion")," not a ",(0,i.kt)("inlineCode",{parentName:"p"},"splice_acceptor_variant")," as splice acceptor sequence ",(0,i.kt)("inlineCode",{parentName:"p"},"AG")," is unaffected."),(0,i.kt)("p",null,"When faced with such variants, we will assign junction disrupting consequnces only if the variant cannot be shifted out of the junction."),(0,i.kt)("h2",{id:"implementation"},"Implementation"),(0,i.kt)("p",null,"By default and convention, the left-aligned variant is annotated. If the variant overlaps a junction (as judged by consequences), it is right-aligned and annotated. If both alignment produces junction disruption, the left-aligned annotation is reported. 
If, however, only one of the alignments causes junction disruption, the non-junction-disrupting annotation is reported."),(0,i.kt)("div",{className:"admonition admonition-note alert alert--secondary"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"}))),"note")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"This only affects transcript annotations. 
Supplementary annotations are reported on the left-aligned variant and HGVS notations are calculated on right-aligned variant."))))}p.isMDXComponent=!0},856:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/27-nt-deletion-c733fc75acdc0ef64ac7d181d5ff81fe.png"}}]); \ No newline at end of file diff --git a/assets/js/4f2cf309.32706599.js b/assets/js/4f2cf309.32706599.js new file mode 100644 index 00000000..ec7ce527 --- /dev/null +++ b/assets/js/4f2cf309.32706599.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7491],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function o(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var c=r.createContext({}),u=function(e){var t=r.useContext(c),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=u(e.components);return r.createElement(c.Provider,{value:t},e.children)},s="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},d=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,l=e.originalType,c=e.parentName,p=i(e,["components","mdxType","originalType","parentName"]),s=u(n),d=a,f=s["".concat(c,".").concat(d)]||s[d]||m[d]||l;return n?r.createElement(f,o(o({ref:t},p),{},{components:n})):r.createElement(f,o({ref:t},p))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var l=n.length,o=new Array(l);o[0]=d;var i={};for(var c in 
t)hasOwnProperty.call(t,c)&&(i[c]=t[c]);i.originalType=e,i[s]="string"==typeof e?e:a,o[1]=i;for(var u=2;u{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>s,frontMatter:()=>l,metadata:()=>i,toc:()=>c});var r=n(7462),a=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/gme-json",id:"version-3.24/data-sources/gme-json",title:"gme-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gme-json.md",sourceDirName:"data-sources",slug:"/data-sources/gme-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gme-json.md",tags:[],version:"3.24",frontMatter:{}},c=[],u={toc:c},p="wrapper";function s(e){let{components:t,...n}=e;return(0,a.kt)(p,(0,r.Z)({},u,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"gmeVariome":{\n "allAc":10,\n "allAn":202,\n "allAf":0.049504,\n "failedFilter":true\n}\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Field"),(0,a.kt)("th",{parentName:"tr",align:null},"Type"),(0,a.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allAc"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"GME allele count")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allAn"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"GME allele number")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allAf"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"GME allele 
frequency")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,a.kt)("td",{parentName:"tr",align:null},"bool"),(0,a.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters")))))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/51620725.673ad6b6.js b/assets/js/51620725.673ad6b6.js new file mode 100644 index 00000000..222d4e47 --- /dev/null +++ b/assets/js/51620725.673ad6b6.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7946],{3905:(e,t,n)=>{n.d(t,{Zo:()=>d,kt:()=>g});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function l(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},d=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},p="mdxType",u={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,i=e.originalType,s=e.parentName,d=o(e,["components","mdxType","originalType","parentName"]),p=c(n),m=r,g=p["".concat(s,".").concat(m)]||p[m]||u[m]||i;return n?a.createElement(g,l(l({ref:t},d),{},{components:n})):a.createElement(g,l({ref:t},d))}));function g(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,l=new Array(i);l[0]=m;var o={};for(var s in 
t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[p]="string"==typeof e?e:r,l[1]=o;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>p,frontMatter:()=>i,metadata:()=>o,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const i={},l=void 0,o={unversionedId:"data-sources/clingen-gene-validity-json",id:"version-3.24/data-sources/clingen-gene-validity-json",title:"clingen-gene-validity-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/clingen-gene-validity-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-gene-validity-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-gene-validity-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-gene-validity-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],c={toc:s},d="wrapper";function p(e){let{components:t,...n}=e;return(0,r.kt)(d,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clingenGeneValidity":[\n {\n "diseaseId":"MONDO_0007893",\n "disease":"Noonan syndrome with multiple lentigines",\n "classification":"no reported evidence",\n "classificationDate":"2018-06-07"\n },\n {\n "diseaseId":"MONDO_0015280",\n "disease":"cardiofaciocutaneous syndrome",\n "classification":"no reported evidence",\n "classificationDate":"2018-06-07"\n 
}\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clingenGeneValidity"),(0,r.kt)("td",{parentName:"tr",align:null},"object"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"diseaseId"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Monarch Disease Ontology ID (MONDO)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"disease"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"disease label")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"classification"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"see below for possible values")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"classificationDate"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"yyyy-MM-dd")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"classification")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"no reported evidence"),(0,r.kt)("li",{parentName:"ul"},"disputed"),(0,r.kt)("li",{parentName:"ul"},"limited"),(0,r.kt)("li",{parentName:"ul"},"moderate"),(0,r.kt)("li",{parentName:"ul"},"definitive"),(0,r.kt)("li",{parentName:"ul"},"strong"),(0,r.kt)("li",{parentName:"ul"},"refuted"),(0,r.kt)("li",{parentName:"ul"},"no known disease relationship")))}p.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/51833f15.28e8a08b.js b/assets/js/51833f15.28e8a08b.js new file mode 100644 
index 00000000..9a90e4f0 --- /dev/null +++ b/assets/js/51833f15.28e8a08b.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5278,6573,417],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>h});var a=n(7294);function o(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(o[n]=e[n]);return o}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(o[n]=e[n])}return o}var s=a.createContext({}),p=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},c=function(e){var t=p(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,o=e.mdxType,r=e.originalType,s=e.parentName,c=l(e,["components","mdxType","originalType","parentName"]),d=p(n),u=o,h=d["".concat(s,".").concat(u)]||d[u]||m[u]||r;return n?a.createElement(h,i(i({ref:t},c),{},{components:n})):a.createElement(h,i({ref:t},c))}));function h(e,t){var n=arguments,o=t&&t.mdxType;if("string"==typeof e||o){var r=n.length,i=new Array(r);i[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:o,i[1]=l;for(var p=2;p{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>d,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),o=(n(7294),n(3905));const r={},i=void 0,l={unversionedId:"data-sources/phylop-json",id:"version-3.24/data-sources/phylop-json",title:"phylop-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/phylop-json.md",sourceDirName:"data-sources",slug:"/data-sources/phylop-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/phylop-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],p={toc:s},c="wrapper";function d(e){let{components:t,...n}=e;return(0,o.kt)(c,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-json",metastring:"{10}","{10}":!0},'"variants":[\n {\n "vid":"2:48010488:A",\n "chromosome":"chr2",\n "begin":48010488,\n "end":48010488,\n "refAllele":"G",\n "altAllele":"A",\n "variantType":"SNV",\n "phylopScore":0.459\n }\n] \n')),(0,o.kt)("table",null,(0,o.kt)("thead",{parentName:"table"},(0,o.kt)("tr",{parentName:"thead"},(0,o.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,o.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,o.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,o.kt)("tbody",{parentName:"table"},(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:"left"},"phylopScore"),(0,o.kt)("td",{parentName:"tr",align:"center"},"float"),(0,o.kt)("td",{parentName:"tr",align:"left"},"range: -14.08 to 6.424")))))}d.isMDXComponent=!0},7204:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>s,default:()=>u,frontMatter:()=>l,metadata:()=>p,toc:()=>c});var a=n(7462),o=(n(7294),n(3905)),r=n(5354),i=n(2415);const l={title:"PhyloP"},s=void 
0,p={unversionedId:"data-sources/phylop",id:"version-3.24/data-sources/phylop",title:"PhyloP",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/phylop.mdx",sourceDirName:"data-sources",slug:"/data-sources/phylop",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/phylop.mdx",tags:[],version:"3.24",frontMatter:{title:"PhyloP"},sidebar:"docs",previous:{title:"OMIM",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim"},next:{title:"Primate AI-3D",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai"}},c=[{value:"Overview",id:"overview",children:[{value:"PhyloP Primate",id:"phylop-primate",children:[{value:"BigWig File",id:"bigwig-file",children:[],level:4},{value:"JSON Output",id:"json-output",children:[],level:4}],level:3},{value:"PhyloP",id:"phylop",children:[{value:"WigFix File",id:"wigfix-file",children:[],level:4},{value:"Download URL",id:"download-url",children:[],level:4}],level:3},{value:"JSON Output",id:"json-output-1",children:[],level:3}],level:2}],d={toc:c},m="wrapper";function u(e){let{components:t,...n}=e;return(0,o.kt)(m,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,o.kt)("h2",{id:"overview"},"Overview"),(0,o.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,o.kt)("div",{parentName:"div",className:"admonition-heading"},(0,o.kt)("h5",{parentName:"div"},(0,o.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,o.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,o.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 
6H6v2h2v-2z"}))),"Publication")),(0,o.kt)("div",{parentName:"div",className:"admonition-content"},(0,o.kt)("p",{parentName:"div"},"Kuderna, L.F.K., Ulirsch, J.C., Rashid, S. et al. Identification of constrained sequence elements across 239 primate genomes. ",(0,o.kt)("strong",{parentName:"p"},"Nature 2023"),". (",(0,o.kt)("a",{parentName:"p",href:"https://doi.org/10.1038/s41586-023-06798-8"},"https://doi.org/10.1038/s41586-023-06798-8"),")"),(0,o.kt)("p",{parentName:"div"},"Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. ",(0,o.kt)("strong",{parentName:"p"},"Genome Res. 2005")," Aug;15(8):1034-50. (",(0,o.kt)("a",{parentName:"p",href:"http://www.genome.org/cgi/doi/10.1101/gr.3715005"},"http://www.genome.org/cgi/doi/10.1101/gr.3715005"),")"))),(0,o.kt)("h3",{id:"phylop-primate"},"PhyloP Primate"),(0,o.kt)("p",null,"PhyloP Primate analyzes 239 primate species and identifies 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites constrained specifically in primates.\nThese elements are enriched for human genetic variants that influence gene expression and impact complex traits and diseases."),(0,o.kt)("p",null,"PhyloP Primate is only available for the GRCh38 assembly."),(0,o.kt)("h4",{id:"bigwig-file"},"BigWig File"),(0,o.kt)("p",null,"The original file is ",(0,o.kt)("inlineCode",{parentName:"p"},"primates_msa.phylop.conacc.lrt.bw"),", which is a bigWig file. 
This file was converted to a wig file using:\n(",(0,o.kt)("a",{parentName:"p",href:"https://genome.ucsc.edu/goldenPath/help/bigWig.html"},"https://genome.ucsc.edu/goldenPath/help/bigWig.html"),")\nAfter conversion, the wig file provides the scores in the following format:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-scss"},"0.14\n0.074\n-2.487\n0.073\n0.052\n0.073\nfixedStep chrom=chr1 start=10558 step=1 span=1\n-1.991\n0.052\n-2.047\n0.052\n0.052\n0.074\n-1.992\n0.074\n0.052\n0.073\n0.074\n0.052\n0.074\n-2.05\n-2.059\n0.074\n0.074\n0.074\n")),(0,o.kt)("h4",{id:"json-output"},"JSON Output"),(0,o.kt)("p",null,"Unlike other supplementary data sources, phyloP scores are reported in the variants section."),(0,o.kt)(i.default,{mdxType:"PHYLOPPRIMATEJSON"}),(0,o.kt)("h3",{id:"phylop"},"PhyloP"),(0,o.kt)("p",null,"PhyloP (phylogenetic p-values) conservation scores are obtained from the ","PHAST package"," (",(0,o.kt)("a",{parentName:"p",href:"http://compgen.bscb.cornell.edu/phast/"},"http://compgen.bscb.cornell.edu/phast/"),") for multiple alignments of vertebrate genomes to the human genome. For GRCh38, the multiple alignments are against 19 mammals; for GRCh37, they are against 45 vertebrate genomes."),(0,o.kt)("h4",{id:"wigfix-file"},"WigFix File"),(0,o.kt)("p",null,"The data is provided in WigFix files, which are text files that provide conservation scores for contiguous intervals in the following format:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-scss"},"fixedStep chrom=chr1 start=10918 step=1\n0.064\n0.058\n0.064\n0.058\n0.064\n0.064\nfixedStep chrom=chr1 start=34045 step=1\n0.111\n0.100\n0.111\n0.111\n0.100\n0.111\n0.111\n0.111\n0.100\n0.111\n-1.636\n")),(0,o.kt)("p",null,"We convert them to binary files with indexes for fast query. 
Note that these are scores for genomic positions and are reported only for SNVs."),(0,o.kt)("h4",{id:"download-url"},"Download URL"),(0,o.kt)("p",null,"GRCh37: ",(0,o.kt)("a",{parentName:"p",href:"http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/"},"http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/")),(0,o.kt)("p",null,"GRCh38: ",(0,o.kt)("a",{parentName:"p",href:"http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/"},"http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/")),(0,o.kt)("h3",{id:"json-output-1"},"JSON Output"),(0,o.kt)("p",null,"Unlike other supplementary data sources, phyloP scores are reported in the variants section."),(0,o.kt)(r.default,{mdxType:"PHYLOPJSON"}))}u.isMDXComponent=!0},2415:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>d,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),o=(n(7294),n(3905));const r={},i=void 0,l={unversionedId:"data-sources/phylopprimate-json",id:"version-3.24/data-sources/phylopprimate-json",title:"phylopprimate-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/phylopprimate-json.md",sourceDirName:"data-sources",slug:"/data-sources/phylopprimate-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylopprimate-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/phylopprimate-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],p={toc:s},c="wrapper";function d(e){let{components:t,...n}=e;return(0,o.kt)(c,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-json",metastring:"{11}","{11}":!0},' "variants": [\n {\n "vid": "1-64927-G-T",\n "chromosome": "chr1",\n "begin": 64927,\n "end": 64927,\n "refAllele": "G",\n "altAllele": "T",\n "variantType": "SNV",\n "hgvsg": "NC_000001.11:g.64927G>T",\n "phyloPPrimateScore": 
0.151\n }\n]\n')),(0,o.kt)("table",null,(0,o.kt)("thead",{parentName:"table"},(0,o.kt)("tr",{parentName:"thead"},(0,o.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,o.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,o.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,o.kt)("tbody",{parentName:"table"},(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:"left"},"phyloPPrimateScore"),(0,o.kt)("td",{parentName:"tr",align:"center"},"float"),(0,o.kt)("td",{parentName:"tr",align:"left"},"range: -20 to 1.951")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/51bfff8d.5473d8f7.js b/assets/js/51bfff8d.5473d8f7.js new file mode 100644 index 00000000..6a3e1541 --- /dev/null +++ b/assets/js/51bfff8d.5473d8f7.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[8308],{3905:(n,e,t)=>{t.d(e,{Zo:()=>u,kt:()=>d});var r=t(7294);function a(n,e,t){return e in n?Object.defineProperty(n,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):n[e]=t,n}function o(n,e){var t=Object.keys(n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(n);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(n,e).enumerable}))),t.push.apply(t,r)}return t}function c(n){for(var e=1;e=0||(a[t]=n[t]);return a}(n,e);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(n);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(n,t)&&(a[t]=n[t])}return a}var i=r.createContext({}),l=function(n){var e=r.useContext(i),t=e;return n&&(t="function"==typeof n?n(e):c(c({},e),n)),t},u=function(n){var e=l(n.components);return r.createElement(i.Provider,{value:e},n.children)},m="mdxType",f={inlineCode:"code",wrapper:function(n){var e=n.children;return r.createElement(r.Fragment,{},e)}},p=r.forwardRef((function(n,e){var 
t=n.components,a=n.mdxType,o=n.originalType,i=n.parentName,u=s(n,["components","mdxType","originalType","parentName"]),m=l(t),p=a,d=m["".concat(i,".").concat(p)]||m[p]||f[p]||o;return t?r.createElement(d,c(c({ref:e},u),{},{components:t})):r.createElement(d,c({ref:e},u))}));function d(n,e){var t=arguments,a=e&&e.mdxType;if("string"==typeof n||a){var o=t.length,c=new Array(o);c[0]=p;var s={};for(var i in e)hasOwnProperty.call(e,i)&&(s[i]=e[i]);s.originalType=n,s[m]="string"==typeof n?n:a,c[1]=s;for(var l=2;l{t.r(e),t.d(e,{contentTitle:()=>c,default:()=>m,frontMatter:()=>o,metadata:()=>s,toc:()=>i});var r=t(7462),a=(t(7294),t(3905));const o={},c=void 0,s={unversionedId:"data-sources/gnomad40-structural-variants-json",id:"version-3.24/data-sources/gnomad40-structural-variants-json",title:"gnomad40-structural-variants-json",description:"",source:"@site/versioned_docs/version-3.24/data-sources/gnomad40-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad40-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad40-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad40-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],l={toc:i},u="wrapper";function m(n){let{components:e,...t}=n;return(0,a.kt)(u,(0,r.Z)({},l,t,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad": [\n {\n "chromosome": "1",\n "begin": 1769047,\n "end": 78686496,\n "variantId": "gnomAD-SV_v3_CPX_chr1_4787cfba",\n "variantType": "complex_structural_alteration",\n "failedFilter": true,\n "allAf": 0.51192,\n "afrAf": 0.491986,\n "amiAf": 0.559382,\n "amrAf": 0.499444,\n "asjAf": 0.505975,\n "easAf": 0.51924,\n "midAf": 0.53125,\n "finAf": 0.542619,\n "nfeAf": 0.521916,\n "othAf": 0.492366,\n "sasAf": 0.516568,\n "femaleAf": 
0.509225,\n "maleAf": 0.514861,\n "allAc": 64549,\n "afrAc": 16637,\n "amiAc": 471,\n "amrAc": 6290,\n "asjAc": 1609,\n "easAc": 2105,\n "midAc": 34,\n "finAc": 3514,\n "nfeAc": 30839,\n "othAc": 774,\n "sasAc": 2276,\n "femaleAc": 33507,\n "maleAc": 31042,\n "allAn": 126092,\n "afrAn": 33816,\n "amiAn": 842,\n "amrAn": 12594,\n "asjAn": 3180,\n "easAn": 4054,\n "midAn": 64,\n "finAn": 6476,\n "nfeAn": 59088,\n "othAn": 1572,\n "sasAn": 4406,\n "femaleAn": 65800,\n "maleAn": 60292,\n "allHc": 3167,\n "afrHc": 413,\n "amiHc": 54,\n "amrHc": 238,\n "asjHc": 49,\n "easHc": 97,\n "midHc": 2,\n "finHc": 368,\n "nfeHc": 1807,\n "othHc": 23,\n "sasHc": 116,\n "femaleHc": 1407,\n "maleHc": 1760\n }\n]\n')))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/51f80da3.b8258f22.js b/assets/js/51f80da3.b8258f22.js new file mode 100644 index 00000000..93b24ff2 --- /dev/null +++ b/assets/js/51f80da3.b8258f22.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3002],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>h});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return 
a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),d=c(n),u=i,h=d["".concat(s,".").concat(u)]||d[u]||m[u]||r;return n?a.createElement(h,o(o({ref:t},p),{},{components:n})):a.createElement(h,o({ref:t},p))}));function h(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,o=new Array(r);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:i,o[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>d,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={title:"Gene Fusion Detection"},o=void 0,l={unversionedId:"core-functionality/gene-fusions",id:"version-3.24/core-functionality/gene-fusions",title:"Gene Fusion Detection",description:"Overview",source:"@site/versioned_docs/version-3.24/core-functionality/gene-fusions.md",sourceDirName:"core-functionality",slug:"/core-functionality/gene-fusions",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/gene-fusions",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/core-functionality/gene-fusions.md",tags:[],version:"3.24",frontMatter:{title:"Gene Fusion Detection"},sidebar:"docs",previous:{title:"Canonical Transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/canonical-transcripts"},next:{title:"ISCN Notation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/iscn-notation"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Approach",id:"approach",children:[{value:"Variant Types",id:"variant-types",children:[],level:3},{value:"Criteria",id:"criteria",children:[],level:3}],level:2},{value:"ETV6/RUNX1 Example",id:"etv6runx1-example",children:[{value:"VCF",id:"vcf",children:[],level:3},{value:"JSON 
Output",id:"json-output",children:[{value:"Gene Fusion Data Sources",id:"gene-fusion-data-sources",children:[],level:4},{value:"Consequences",id:"consequences",children:[],level:4},{value:"Gene Fusions Section",id:"gene-fusions-section",children:[],level:4}],level:3}],level:2}],c={toc:s},p="wrapper";function d(e){let{components:t,...r}=e;return(0,i.kt)(p,(0,a.Z)({},c,r,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed."),(0,i.kt)("p",null,"Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations."),(0,i.kt)("p",null,"The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:n(3397).Z})),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. 
",(0,i.kt)("a",{parentName:"p",href:"https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-015-0252-1"},"Landscape of gene fusions in epithelial cancers: seq and ye shall find"),". Genome Med 7, 129 (2015)"))),(0,i.kt)("h2",{id:"approach"},"Approach"),(0,i.kt)("p",null,"Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_014206.3")," (",(0,i.kt)("strong",{parentName:"p"},"TMEM258"),") and ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_013402.4")," (",(0,i.kt)("strong",{parentName:"p"},"FADS1"),"). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"TMEM258 & FADS1 transcripts",src:n(347).Z})),(0,i.kt)("p",null,"The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"TMEM258 & FADS1 gene fusions",src:n(4554).Z})),(0,i.kt)("p",null,"Only two of the combinations yields a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion.\nIf only unidirectional gene fusions are desired, only these two fusions can be detected. 
If ",(0,i.kt)("inlineCode",{parentName:"p"},"enable-bidirectional-fusions")," is enabled, all four cases can be identified."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Interpreting translocation breakends")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the ",(0,i.kt)("a",{parentName:"p",href:"https://samtools.github.io/hts-specs/VCFv4.2.pdf"},"VCF 4.2 specification"),"."),(0,i.kt)("table",{parentName:"div"},(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"REF"),(0,i.kt)("th",{parentName:"tr",align:"left"},"ALT"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Meaning"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"t[p["),(0,i.kt)("td",{parentName:"tr",align:"left"},"piece extending to the right of p is joined after t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"t]p]"),(0,i.kt)("td",{parentName:"tr",align:"left"},"reverse comp piece extending left of p is joined after 
t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"]p]t"),(0,i.kt)("td",{parentName:"tr",align:"left"},"piece extending to the left of p is joined before t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"[p[t"),(0,i.kt)("td",{parentName:"tr",align:"left"},"reverse comp piece extending right of p is joined before t")))))),(0,i.kt)("h3",{id:"variant-types"},"Variant Types"),(0,i.kt)("p",null,"Specifically we can identify gene fusions from the following structural variant types:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"deletions (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"tandem_duplications (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"inversions (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"translocation breakpoints (",(0,i.kt)("inlineCode",{parentName:"li"},"AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911["),") ")),(0,i.kt)("h3",{id:"criteria"},"Criteria"),(0,i.kt)("p",null,"The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},"After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if ",(0,i.kt)("inlineCode",{parentName:"li"},"enable-bidirectional-fusions")," is not enabled. They can have the same or different orientations if ",(0,i.kt)("inlineCode",{parentName:"li"},"enable-bidirectional-fusions")," is set."),(0,i.kt)("li",{parentName:"ol"},"Both transcripts must be from the same transcript source (i.e. 
we won't mix and match between RefSeq and Ensembl transcripts)"),(0,i.kt)("li",{parentName:"ol"},"Both transcripts must belong to different genes"),(0,i.kt)("li",{parentName:"ol"},"Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)")),(0,i.kt)("h2",{id:"etv6runx1-example"},"ETV6/RUNX1 Example"),(0,i.kt)("p",null,"ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Sun C., Chang L., Zhu X. ",(0,i.kt)("a",{parentName:"p",href:"https://www.oncotarget.com/article/16367/text/"},"Pathogenesis of ETV6/RUNX1-positive childhood acute lymphoblastic leukemia and mechanisms underlying its relapse"),". Oncotarget. 2017; 8: 35445-35459"))),(0,i.kt)("h3",{id:"vcf"},"VCF"),(0,i.kt)("p",null,"Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\nchr12 12026270 . C [chr21:36420865[C . 
PASS SVTYPE=BND\nchr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND\nchr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND\nchr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND\n")),(0,i.kt)("p",null,"When you put these calls together, the resulting genomic rearrangement looks something like this:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:n(7852).Z})),(0,i.kt)("h3",{id:"json-output"},"JSON Output"),(0,i.kt)("p",null,"The annotation for the first variant in the VCF looks like this:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{36-58}","{36-58}":!0},'{"positions":[\n{\n "chromosome": "12",\n "position": 12026270,\n "refAllele": "C",\n "altAlleles": [\n "[chr21:36420865[C"\n ],\n "filters": [\n "PASS"\n ],\n "cytogeneticBand": "12p13.2",\n "variants": [\n {\n "vid": "12-12026270-C-[chr21:36420865[C",\n "chromosome": "12",\n "begin": 12026270,\n "end": 12026270,\n "isStructuralVariant": true,\n "refAllele": "C",\n "altAllele": "[chr21:36420865[C",\n "variantType": "translocation",\n "transcripts": [\n {\n "transcript": "ENST00000396373.4",\n "source": "Ensembl",\n "bioType": "mRNA",\n "introns": "5/7",\n "geneId": "ENSG00000139083",\n "hgnc": "ETV6",\n "consequence": [\n "transcript_variant",\n "unidirectional_gene_fusion"\n ],\n "impact": "modifier",\n "geneFusions": [\n {\n "transcript": "ENST00000437180.1",\n "bioType": "mRNA",\n "source": "Ensembl",\n "geneId": "ENSG00000159216",\n "proteinId": "ENSP00000409227.1",\n "intron": 2,\n "hgnc": "RUNX1",\n "hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n },\n {\n "transcript": "ENST00000300305.3",\n "bioType": "mRNA",\n "source": "Ensembl",\n "isCanonical": true,\n "geneId": "ENSG00000159216",\n "proteinId": "ENSP00000300305.3",\n "intron": 1,\n "hgnc": "RUNX1",\n "hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n }\n 
],\n "isCanonical": true,\n "proteinId": "ENSP00000379658.3"\n },\n {\n "transcript": "NM_001987.5",\n "source": "RefSeq",\n "bioType": "mRNA",\n "introns": "5/7",\n "geneId": "2120",\n "hgnc": "ETV6",\n "consequence": [\n "transcript_variant",\n "unidirectional_gene_fusion"\n ],\n "impact": "modifier",\n "geneFusions": [\n {\n "transcript": "NM_001754.5",\n "bioType": "mRNA",\n "source": "RefSeq",\n "isCanonical": true,\n "geneId": "861",\n "proteinId": "NP_001745.2",\n "intron": 2,\n "hgnc": "RUNX1",\n "hgvsr": "NM_001754.5(RUNX1):r.?_58+274::NM_001987.5(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n }\n ],\n "isCanonical": true,\n "proteinId": "NP_001978.1"\n }\n ]\n }\n ]\n}\n]}\n\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"transcript"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"transcript ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"bioType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"descriptions of the ",(0,i.kt)("a",{parentName:"td",href:"https://uswest.ensembl.org/info/genome/genebuild/biotypes.html"},"biotypes from Ensembl"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"exon"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"exon that contained fusion breakpoint")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"intron"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"intron that 
contained fusion breakpoint")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"geneId"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene ID. e.g. ENSG00000116062")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene symbol. e.g. MSH6")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgvsr"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"HGVS RNA nomenclature")))),(0,i.kt)("h4",{id:"gene-fusion-data-sources"},"Gene Fusion Data Sources"),(0,i.kt)("p",null,"To provide more context to our gene fusions, we provide the following gene fusion data sources:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"../data-sources/cosmic"},"COSMIC")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"../data-sources/fusioncatcher"},"FusionCatcher"))),(0,i.kt)("h4",{id:"consequences"},"Consequences"),(0,i.kt)("p",null,"When a gene fusion is identified, we add the following Sequence Ontology consequence:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{3}","{3}":!0},' "consequence": [\n "transcript_variant",\n "gene_fusion"\n ],\n')),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"If both transcripts have the same orientation, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"unidirectional_gene_fusion"),", if they have different orientations, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"bidirectional_gene_fusion")),(0,i.kt)("li",{parentName:"ul"},"If both unidirectional and bidirectional ones are detected, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"gene_fusion"),".")),(0,i.kt)("h4",{id:"gene-fusions-section"},"Gene Fusions 
Section"),(0,i.kt)("p",null,"The ",(0,i.kt)("inlineCode",{parentName:"p"},"geneFusions")," section is contained within the object of the originating transcript. It will contain all the pairwise gene fusions that obey the criteria outlined above. In the case of ",(0,i.kt)("inlineCode",{parentName:"p"},"ENST00000396373.4"),", there are 7 other Ensembl transcripts that would produce a gene fusion. For ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001987.4"),", there was only one transcript (",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001754.4"),") that would produce a gene fusion."),(0,i.kt)("p",null,"For each originating transcript, we report the following for each partner transcript:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"transcript ID"),(0,i.kt)("li",{parentName:"ul"},"gene ID"),(0,i.kt)("li",{parentName:"ul"},"HGNC gene symbol"),(0,i.kt)("li",{parentName:"ul"},"transcript bio type (e.g. protein_coding)"),(0,i.kt)("li",{parentName:"ul"},"intron or exon number containing the breakpoint"),(0,i.kt)("li",{parentName:"ul"},"HGVS RNA notation"),(0,i.kt)("li",{parentName:"ul"},"gene fusion directionality")),(0,i.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 
2s-2.27-.86-2.5-2z"}))),"tip")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see ",(0,i.kt)("a",{parentName:"p",href:"https://varnomen.hgvs.org/bg-material/consultation/svd-wg007"},"HGVS SVD-WG007"),")."))),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{8}","{8}":!0},' "geneFusions": [\n {\n "transcript": "NM_001754.4",\n "bioType": "protein_coding",\n "intron": 2,\n "geneId": "861",\n "hgnc": "RUNX1",\n "hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",\n "directionality":"uniDirectional"\n }\n ],\n')),(0,i.kt)("p",null,"The HGVS RNA notation above indicates that the gene fusion starts with ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001754.4")," (RUNX1) until CDS position 58 and continues with ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001987.4")," (ETV6). 
",(0,i.kt)("inlineCode",{parentName:"p"},"1009+3367")," indicates that the fusion occurred 3367 bp within intron 2."))}d.isMDXComponent=!0},4554:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/TMEM258_FADS1_GeneFusions-e5e3758ea9d2c07d3591e3801b2bf7e3.svg"},347:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/TMEM258_FADS1_Transcripts-fe1b9c6be1f7cbfefbce887f8cec5d58.svg"},7852:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/etv6-runx1-fusion-ec8f4312c9aca496bde0d6e2b1bbd50d.svg"},3397:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/gene-fusions-fig2-1cce8ac31b00465c8d36bdc47ec3309e.svg"}}]); \ No newline at end of file diff --git a/assets/js/53062ee1.3f65d126.js b/assets/js/53062ee1.3f65d126.js new file mode 100644 index 00000000..006ef2db --- /dev/null +++ b/assets/js/53062ee1.3f65d126.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1090],{3905:(t,e,n)=>{n.d(e,{Zo:()=>m,kt:()=>N});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function o(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var p=a.createContext({}),u=function(t){var e=a.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},m=function(t){var e=u(t.components);return a.createElement(p.Provider,{value:e},t.children)},d="mdxType",g={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},k=a.forwardRef((function(t,e){var 
n=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,m=i(t,["components","mdxType","originalType","parentName"]),d=u(n),k=r,N=d["".concat(p,".").concat(k)]||d[k]||g[k]||l;return n?a.createElement(N,o(o({ref:e},m),{},{components:n})):a.createElement(N,o({ref:e},m))}));function N(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,o=new Array(l);o[0]=k;var i={};for(var p in e)hasOwnProperty.call(e,p)&&(i[p]=e[p]);i.originalType=t,i[d]="string"==typeof t?t:r,o[1]=i;for(var u=2;u{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>d,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/gnomad-structural-variants-json",id:"version-3.24/data-sources/gnomad-structural-variants-json",title:"gnomad-structural-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],u={toc:p},m="wrapper";function d(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},u,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD-preview": [\n {\n "chromosome": "1",\n "begin": 40001,\n "end": 47200,\n "variantId": "gnomAD-SV_v2.1_DUP_1_1",\n "variantType": "duplication",\n "failedFilter": true,\n "allAf": 0.068963,\n "afrAf": 0.135694,\n "amrAf": 0.022876,\n "easAf": 0.01101,\n "eurAf": 0.007846,\n "othAf": 0.017544,\n "femaleAf": 0.065288,\n "maleAf": 0.07255,\n "allAc": 943,\n "afrAc": 866,\n "amrAc": 21,\n "easAc": 17,\n "eurAc": 37,\n "othAc": 
2,\n "femaleAc": 442,\n "maleAc": 499,\n "allAn": 13674,\n "afrAn": 6382,\n "amrAn": 918,\n "easAn": 1544,\n "eurAn": 4716,\n "othAn": 114,\n "femaleAn": 6770,\n "maleAn": 6878,\n "allHc": 91,\n "afrHc": 90,\n "amrHc": 1,\n "easHc": 0,\n "eurHc": 0,\n "othHc": 55,\n "femaleHc": 44,\n "maleHc": 47,\n "reciprocalOverlap": 0.01839,\n "annotationOverlap": 0.16667\n }\n]\n\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"chromosome number")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"position interval start")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"position interval end")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"structural variant type")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantId"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"gnomAD ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the African super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the European super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all other populations. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the African super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the Ad Mixed American super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for the European super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all other populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for male population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for female 
population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the African super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the Ad Mixed American super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for the European super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all other populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for female population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for male 
population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the African / African American population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Latino population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the East Asian population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the European super population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all other populations.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female 
population.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"boolean"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Reciprocal overlap. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Annotation overlap. Range: 0 - 1.0")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"Note:")," The following fields are not available in ",(0,r.kt)("em",{parentName:"p"},"GRCh38")," because the source file does not contain this information:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAf")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleAn")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleAn")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"othHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"maleHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"femaleHc")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/53b9567b.f283f1b9.js b/assets/js/53b9567b.f283f1b9.js new file mode 100644 index 00000000..c97cc377 --- /dev/null +++ b/assets/js/53b9567b.f283f1b9.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[2522,7491],{3905:(e,t,n)=>{n.d(t,{Zo:()=>d,kt:()=>g});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},d=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},p=a.forwardRef((function(e,t){var 
n=e.components,r=e.mdxType,o=e.originalType,s=e.parentName,d=l(e,["components","mdxType","originalType","parentName"]),u=c(n),p=r,g=u["".concat(s,".").concat(p)]||u[p]||m[p]||o;return n?a.createElement(g,i(i({ref:t},d),{},{components:n})):a.createElement(g,i({ref:t},d))}));function g(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var o=n.length,i=new Array(o);i[0]=p;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[u]="string"==typeof e?e:r,i[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>u,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/gme-json",id:"version-3.24/data-sources/gme-json",title:"gme-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gme-json.md",sourceDirName:"data-sources",slug:"/data-sources/gme-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gme-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],c={toc:s},d="wrapper";function u(e){let{components:t,...n}=e;return(0,r.kt)(d,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gmeVariome":{\n "allAc":10,\n "allAn":202,\n "allAf":0.049504,\n "failedFilter":true\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"GME allele 
count")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"GME allele number")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"GME allele frequency")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"bool"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters")))))}u.isMDXComponent=!0},620:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>m,frontMatter:()=>i,metadata:()=>s,toc:()=>c});var a=n(7462),r=(n(7294),n(3905)),o=n(8615);const i={title:"GME Variome"},l=void 0,s={unversionedId:"data-sources/gme",id:"version-3.24/data-sources/gme",title:"GME Variome",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/gme.mdx",sourceDirName:"data-sources",slug:"/data-sources/gme",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gme.mdx",tags:[],version:"3.24",frontMatter:{title:"GME Variome"},sidebar:"docs",previous:{title:"GERP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp"},next:{title:"gnomAD",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad"}},c=[{value:"Overview",id:"overview",children:[{value:"TSV Extraction",id:"tsv-extraction",children:[{value:"Parsing",id:"parsing",children:[],level:4}],level:3}],level:2},{value:"GRCh37 liftover",id:"grch37-liftover",children:[],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON output",id:"json-output",children:[],level:2}],d={toc:c},u="wrapper";function 
m(e){let{components:t,...n}=e;return(0,r.kt)(u,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"The ",(0,r.kt)("a",{parentName:"p",href:"http://igm.ucsd.edu/gme/index.php"},"Greater Middle East (GME) Variome")," Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. ",(0,r.kt)("em",{parentName:"p"},"Nature genetics"),", 48(9), 1071\u20131076. 
",(0,r.kt)("a",{parentName:"p",href:"https://doi.org/10.1038/ng.3592"},"https://doi.org/10.1038/ng.3592")))),(0,r.kt)("h3",{id:"tsv-extraction"},"TSV Extraction"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chrom pos ref alt AA filter FunctionGVS geneFunction Gene GeneID SIFT_pred GERP++ AF GME_GC GME_AC GME_AF NWA NEA AP Israel SD TP CA FunctionGVS_new Priority Polyphen2_HVAR_pred LRT_pred MutationTaster_pred rsid OMIM_MIM OMIM_Disease AA_AC EA_AC rsid_link position_link\n1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133\n1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269\n1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427\n")),(0,r.kt)("h4",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"We parse the GME tsv file and extract the following 
columns:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"chrom"),(0,r.kt)("li",{parentName:"ul"},"pos"),(0,r.kt)("li",{parentName:"ul"},"ref"),(0,r.kt)("li",{parentName:"ul"},"alt"),(0,r.kt)("li",{parentName:"ul"},"filter"),(0,r.kt)("li",{parentName:"ul"},"GME_AC"),(0,r.kt)("li",{parentName:"ul"},"GME_AF")),(0,r.kt)("h2",{id:"grch37-liftover"},"GRCh37 liftover"),(0,r.kt)("p",null,"The data is not available for GRCh38 on GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap."),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"http://igm.ucsd.edu/gme/download.shtml"},"http://igm.ucsd.edu/gme/download.shtml")),(0,r.kt)("h2",{id:"json-output"},"JSON output"),(0,r.kt)(o.default,{mdxType:"JSON"}))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/5c18c143.7f61c089.js b/assets/js/5c18c143.7f61c089.js new file mode 100644 index 00000000..24236b4c --- /dev/null +++ b/assets/js/5c18c143.7f61c089.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[2703],{2194:e=>{e.exports=JSON.parse('{"pluginId":"default","version":"3.24","label":"3.24","banner":"unmaintained","badge":true,"className":"docs-version-3.24","isLast":false,"docsSidebars":{"docs":[{"type":"category","label":"Introduction","items":[{"type":"link","label":"Introduction","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/","docId":"introduction/introduction"},{"type":"link","label":"Licensed Content","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/licensedContent","docId":"introduction/licensedContent"},{"type":"link","label":"Dependencies","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/dependencies","docId":"introduction/dependencies"},{"type":"link","label":"Getting 
Started","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/getting-started","docId":"introduction/getting-started"},{"type":"link","label":"Parsing Illumina Connected Annotations JSON","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/parsing-json","docId":"introduction/parsing-json"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Data Sources","items":[{"type":"link","label":"1000 Genomes","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes","docId":"data-sources/1000Genomes"},{"type":"link","label":"Amino Acid Conservation","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation","docId":"data-sources/amino-acid-conservation"},{"type":"link","label":"Cancer Hotspots","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cancer-hotspots","docId":"data-sources/cancer-hotspots"},{"type":"link","label":"ClinGen","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen","docId":"data-sources/clingen"},{"type":"link","label":"ClinVar","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar","docId":"data-sources/clinvar"},{"type":"link","label":"ClinVar 
Preview","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview","docId":"data-sources/clinvar-preview"},{"type":"link","label":"COSMIC","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic","docId":"data-sources/cosmic"},{"type":"link","label":"DANN","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann","docId":"data-sources/dann"},{"type":"link","label":"dbSNP","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp","docId":"data-sources/dbsnp"},{"type":"link","label":"DECIPHER","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher","docId":"data-sources/decipher"},{"type":"link","label":"FusionCatcher","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher","docId":"data-sources/fusioncatcher"},{"type":"link","label":"GERP","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp","docId":"data-sources/gerp"},{"type":"link","label":"GME Variome","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme","docId":"data-sources/gme"},{"type":"link","label":"gnomAD","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad","docId":"data-sources/gnomad"},{"type":"link","label":"Mitochondrial Heteroplasmy","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mito-heteroplasmy","docId":"data-sources/mito-heteroplasmy"},{"type":"link","label":"MITOMAP","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap","docId":"data-sources/mitomap"},{"type":"link","label":"OMIM","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim","docId":"data-sources/omim"},{"type":"link","label":"PhyloP","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop","docId":"data-sources/phylop"},{"type":"link","label":"Primate 
AI-3D","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai","docId":"data-sources/primate-ai"},{"type":"link","label":"REVEL","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel","docId":"data-sources/revel"},{"type":"link","label":"Splice AI","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai","docId":"data-sources/splice-ai"},{"type":"link","label":"TOPMed","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed","docId":"data-sources/topmed"}],"collapsible":true,"collapsed":true},{"type":"category","label":"File Formats","items":[{"type":"link","label":"Illumina Connected Annotations JSON File Format","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-json-file-format","docId":"file-formats/illumina-annotator-json-file-format"},{"type":"link","label":"Illumina Connected Annotations VCF File Format","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-vcf-file-format","docId":"file-formats/illumina-annotator-vcf-file-format"},{"type":"link","label":"Custom Annotations","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/custom-annotations","docId":"file-formats/custom-annotations"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Core Functionality","items":[{"type":"link","label":"Canonical Transcripts","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/canonical-transcripts","docId":"core-functionality/canonical-transcripts"},{"type":"link","label":"Gene Fusion Detection","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/gene-fusions","docId":"core-functionality/gene-fusions"},{"type":"link","label":"ISCN Notation","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/iscn-notation","docId":"core-functionality/iscn-notation"},{"type":"link","label":"Junction Preserving 
Annotation","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/junction-preserving","docId":"core-functionality/junction-preserving"},{"type":"link","label":"Transcript Consequence Impact","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/transcript-consequence-impacts","docId":"core-functionality/transcript-consequence-impacts"},{"type":"link","label":"Variant IDs","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/variant-ids","docId":"core-functionality/variant-ids"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Utilities","items":[{"type":"link","label":"Jasix","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/jasix","docId":"utilities/jasix"},{"type":"link","label":"SAUtils","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/sautils","docId":"utilities/sautils"}],"collapsible":true,"collapsed":true},{"type":"category","label":"FAQs","items":[{"type":"link","label":"Annotation Engine vs Data update","href":"/IlluminaConnectedAnnotationsDocumentation/3.24/frequently-asked-questions/Annotator-vs-data-update","docId":"frequently-asked-questions/Annotator-vs-data-update"}],"collapsible":true,"collapsed":true}]},"docs":{"core-functionality/canonical-transcripts":{"id":"core-functionality/canonical-transcripts","title":"Canonical Transcripts","description":"Overview","sidebar":"docs"},"core-functionality/gene-fusions":{"id":"core-functionality/gene-fusions","title":"Gene Fusion Detection","description":"Overview","sidebar":"docs"},"core-functionality/iscn-notation":{"id":"core-functionality/iscn-notation","title":"ISCN Notation","description":"Introduction","sidebar":"docs"},"core-functionality/junction-preserving":{"id":"core-functionality/junction-preserving","title":"Junction Preserving 
Annotation","description":"Background","sidebar":"docs"},"core-functionality/transcript-consequence-impacts":{"id":"core-functionality/transcript-consequence-impacts","title":"Transcript Consequence Impact","description":"Overview","sidebar":"docs"},"core-functionality/variant-ids":{"id":"core-functionality/variant-ids","title":"Variant IDs","description":"Overview","sidebar":"docs"},"data-sources/1000Genomes":{"id":"data-sources/1000Genomes","title":"1000 Genomes","description":"Overview","sidebar":"docs"},"data-sources/1000Genomes-snv-json":{"id":"data-sources/1000Genomes-snv-json","title":"1000Genomes-snv-json","description":"| Field | Type | Notes |"},"data-sources/1000Genomes-sv-json":{"id":"data-sources/1000Genomes-sv-json","title":"1000Genomes-sv-json","description":"| Field | Type | Notes |"},"data-sources/amino-acid-conservation":{"id":"data-sources/amino-acid-conservation","title":"Amino Acid Conservation","description":"Overview","sidebar":"docs"},"data-sources/amino-acid-conservation-json":{"id":"data-sources/amino-acid-conservation-json","title":"amino-acid-conservation-json","description":"| Field | Type | Notes |"},"data-sources/cancer-hotspots":{"id":"data-sources/cancer-hotspots","title":"Cancer Hotspots","description":"Overview","sidebar":"docs"},"data-sources/clingen":{"id":"data-sources/clingen","title":"ClinGen","description":"Overview","sidebar":"docs"},"data-sources/clingen-dosage-json":{"id":"data-sources/clingen-dosage-json","title":"clingen-dosage-json","description":"| Field | Type | Notes |"},"data-sources/clingen-gene-validity-json":{"id":"data-sources/clingen-gene-validity-json","title":"clingen-gene-validity-json","description":"| Field | Type | Notes |"},"data-sources/clingen-json":{"id":"data-sources/clingen-json","title":"clingen-json","description":"| Field | Type | Notes 
|"},"data-sources/clinvar":{"id":"data-sources/clinvar","title":"ClinVar","description":"Overview","sidebar":"docs"},"data-sources/clinvar-json":{"id":"data-sources/clinvar-json","title":"clinvar-json","description":"small variants:"},"data-sources/clinvar-preview":{"id":"data-sources/clinvar-preview","title":"ClinVar Preview","description":"Overview","sidebar":"docs"},"data-sources/clinvar-preview-json":{"id":"data-sources/clinvar-preview-json","title":"clinvar-preview-json","description":"small variants:"},"data-sources/cosmic":{"id":"data-sources/cosmic","title":"COSMIC","description":"Overview","sidebar":"docs"},"data-sources/cosmic-cancer-gene-census":{"id":"data-sources/cosmic-cancer-gene-census","title":"cosmic-cancer-gene-census","description":"| Field | Type | Notes |"},"data-sources/cosmic-gene-fusion-json":{"id":"data-sources/cosmic-gene-fusion-json","title":"cosmic-gene-fusion-json","description":"| Field | Type | Notes |"},"data-sources/cosmic-json":{"id":"data-sources/cosmic-json","title":"cosmic-json","description":"| Field | Type | Notes |"},"data-sources/dann":{"id":"data-sources/dann","title":"DANN","description":"Overview","sidebar":"docs"},"data-sources/dann-json":{"id":"data-sources/dann-json","title":"dann-json","description":"| Field | Type | Notes |"},"data-sources/dbsnp":{"id":"data-sources/dbsnp","title":"dbSNP","description":"Overview","sidebar":"docs"},"data-sources/dbsnp-json":{"id":"data-sources/dbsnp-json","title":"dbsnp-json","description":"| Field | Type | Notes |"},"data-sources/decipher":{"id":"data-sources/decipher","title":"DECIPHER","description":"Overview","sidebar":"docs"},"data-sources/decipher-json":{"id":"data-sources/decipher-json","title":"decipher-json","description":"| Field | Type | Notes 
|"},"data-sources/fusioncatcher":{"id":"data-sources/fusioncatcher","title":"FusionCatcher","description":"Overview","sidebar":"docs"},"data-sources/fusioncatcher-json":{"id":"data-sources/fusioncatcher-json","title":"fusioncatcher-json","description":"| Field | Type | Notes |"},"data-sources/gerp":{"id":"data-sources/gerp","title":"GERP","description":"Overview","sidebar":"docs"},"data-sources/gerp-json":{"id":"data-sources/gerp-json","title":"gerp-json","description":"| Field | Type | Notes |"},"data-sources/gme":{"id":"data-sources/gme","title":"GME Variome","description":"Overview","sidebar":"docs"},"data-sources/gme-json":{"id":"data-sources/gme-json","title":"gme-json","description":"| Field | Type | Notes |"},"data-sources/gnomad":{"id":"data-sources/gnomad","title":"gnomAD","description":"Overview","sidebar":"docs"},"data-sources/gnomad-lof-json":{"id":"data-sources/gnomad-lof-json","title":"gnomad-lof-json","description":"| Field | Type | Notes |"},"data-sources/gnomad-small-variants-json":{"id":"data-sources/gnomad-small-variants-json","title":"gnomad-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad-structural-variants-data_description":{"id":"data-sources/gnomad-structural-variants-data_description","title":"gnomad-structural-variants-data_description","description":"Bed Example"},"data-sources/gnomad-structural-variants-json":{"id":"data-sources/gnomad-structural-variants-json","title":"gnomad-structural-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad4.0-lof-json":{"id":"data-sources/gnomad4.0-lof-json","title":"gnomad4.0-lof-json","description":""},"data-sources/gnomad4.0-small-variants-json":{"id":"data-sources/gnomad4.0-small-variants-json","title":"gnomad4.0-small-variants-json","description":"| Field | Type | Notes 
|"},"data-sources/gnomad40-structural-variants-json":{"id":"data-sources/gnomad40-structural-variants-json","title":"gnomad40-structural-variants-json","description":""},"data-sources/mito-heteroplasmy":{"id":"data-sources/mito-heteroplasmy","title":"Mitochondrial Heteroplasmy","description":"Overview","sidebar":"docs"},"data-sources/mitomap":{"id":"data-sources/mitomap","title":"MITOMAP","description":"Overview","sidebar":"docs"},"data-sources/mitomap-small-variants-json":{"id":"data-sources/mitomap-small-variants-json","title":"mitomap-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/mitomap-structural-variants-json":{"id":"data-sources/mitomap-structural-variants-json","title":"mitomap-structural-variants-json","description":"| Field | Type | Notes |"},"data-sources/omim":{"id":"data-sources/omim","title":"OMIM","description":"Overview","sidebar":"docs"},"data-sources/omim-json":{"id":"data-sources/omim-json","title":"omim-json","description":"| Field | Type | Notes |"},"data-sources/phylop":{"id":"data-sources/phylop","title":"PhyloP","description":"Overview","sidebar":"docs"},"data-sources/phylop-json":{"id":"data-sources/phylop-json","title":"phylop-json","description":"| Field | Type | Notes |"},"data-sources/phylopprimate-json":{"id":"data-sources/phylopprimate-json","title":"phylopprimate-json","description":"| Field | Type | Notes |"},"data-sources/primate-ai":{"id":"data-sources/primate-ai","title":"Primate AI-3D","description":"Overview","sidebar":"docs"},"data-sources/primate-ai-json":{"id":"data-sources/primate-ai-json","title":"primate-ai-json","description":"| Field | Type | Notes |"},"data-sources/revel":{"id":"data-sources/revel","title":"REVEL","description":"Overview","sidebar":"docs"},"data-sources/revel-json":{"id":"data-sources/revel-json","title":"revel-json","description":"| Field | Type | Notes |"},"data-sources/splice-ai":{"id":"data-sources/splice-ai","title":"Splice 
AI","description":"Overview","sidebar":"docs"},"data-sources/splice-ai-json":{"id":"data-sources/splice-ai-json","title":"splice-ai-json","description":"| Field | Type | Notes |"},"data-sources/topmed":{"id":"data-sources/topmed","title":"TOPMed","description":"Overview","sidebar":"docs"},"data-sources/topmed-json":{"id":"data-sources/topmed-json","title":"topmed-json","description":"| Field | Type | Notes |"},"file-formats/custom-annotations":{"id":"file-formats/custom-annotations","title":"Custom Annotations","description":"Overview","sidebar":"docs"},"file-formats/illumina-annotator-json-file-format":{"id":"file-formats/illumina-annotator-json-file-format","title":"Illumina Connected Annotations JSON File Format","description":"Overview","sidebar":"docs"},"file-formats/illumina-annotator-vcf-file-format":{"id":"file-formats/illumina-annotator-vcf-file-format","title":"Illumina Connected Annotations VCF File Format","description":"Overview","sidebar":"docs"},"frequently-asked-questions/Annotator-vs-data-update":{"id":"frequently-asked-questions/Annotator-vs-data-update","title":"Annotation Engine vs Data update","description":"Background","sidebar":"docs"},"introduction/dependencies":{"id":"introduction/dependencies","title":"Dependencies","description":"All of the following dependencies have been included in this repository.","sidebar":"docs"},"introduction/getting-started":{"id":"introduction/getting-started","title":"Getting Started","description":"Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). 
Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.","sidebar":"docs"},"introduction/introduction":{"id":"introduction/introduction","title":"Introduction","description":"Clinical-grade variant annotation","sidebar":"docs"},"introduction/licensedContent":{"id":"introduction/licensedContent","title":"Licensed Content","description":"Illumina Connected Annotations supports the following content, which is available through a license from Illumina.","sidebar":"docs"},"introduction/parsing-json":{"id":"introduction/parsing-json","title":"Parsing Illumina Connected Annotations JSON","description":"Parsing JSON","sidebar":"docs"},"utilities/jasix":{"id":"utilities/jasix","title":"Jasix","description":"Overview","sidebar":"docs"},"utilities/sautils":{"id":"utilities/sautils","title":"SAUtils","description":"Overview","sidebar":"docs"}}}')}}]); \ No newline at end of file diff --git a/assets/js/64bd7e9e.3c740e86.js b/assets/js/64bd7e9e.3c740e86.js new file mode 100644 index 00000000..7f8caa40 --- /dev/null +++ b/assets/js/64bd7e9e.3c740e86.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9525],{3905:(t,a,r)=>{r.d(a,{Zo:()=>p,kt:()=>l});var n=r(7294);function A(t,a,r){return a in t?Object.defineProperty(t,a,{value:r,enumerable:!0,configurable:!0,writable:!0}):t[a]=r,t}function i(t,a){var r=Object.keys(t);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(t);a&&(n=n.filter((function(a){return Object.getOwnPropertyDescriptor(t,a).enumerable}))),r.push.apply(r,n)}return r}function e(t){for(var a=1;a=0||(A[r]=t[r]);return A}(t,a);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(t);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(t,r)&&(A[r]=t[r])}return A}var E=n.createContext({}),c=function(t){var a=n.useContext(E),r=a;return t&&(r="function"==typeof t?t(a):e(e({},a),t)),r},p=function(t){var 
a=c(t.components);return n.createElement(E.Provider,{value:a},t.children)},G="mdxType",T={inlineCode:"code",wrapper:function(t){var a=t.children;return n.createElement(n.Fragment,{},a)}},o=n.forwardRef((function(t,a){var r=t.components,A=t.mdxType,i=t.originalType,E=t.parentName,p=s(t,["components","mdxType","originalType","parentName"]),G=c(r),o=A,l=G["".concat(E,".").concat(o)]||G[o]||T[o]||i;return r?n.createElement(l,e(e({ref:a},p),{},{components:r})):n.createElement(l,e({ref:a},p))}));function l(t,a){var r=arguments,A=a&&a.mdxType;if("string"==typeof t||A){var i=r.length,e=new Array(i);e[0]=o;var s={};for(var E in a)hasOwnProperty.call(a,E)&&(s[E]=a[E]);s.originalType=t,s[G]="string"==typeof t?t:A,e[1]=s;for(var c=2;c{r.r(a),r.d(a,{contentTitle:()=>e,default:()=>G,frontMatter:()=>i,metadata:()=>s,toc:()=>E});var n=r(7462),A=(r(7294),r(3905));const i={title:"Illumina Connected Annotations VCF File Format"},e=void 0,s={unversionedId:"file-formats/illumina-annotator-vcf-file-format",id:"version-3.24/file-formats/illumina-annotator-vcf-file-format",title:"Illumina Connected Annotations VCF File Format",description:"Overview",source:"@site/versioned_docs/version-3.24/file-formats/illumina-annotator-vcf-file-format.mdx",sourceDirName:"file-formats",slug:"/file-formats/illumina-annotator-vcf-file-format",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-vcf-file-format",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/file-formats/illumina-annotator-vcf-file-format.mdx",tags:[],version:"3.24",frontMatter:{title:"Illumina Connected Annotations VCF File Format"},sidebar:"docs",previous:{title:"Illumina Connected Annotations JSON File Format",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-json-file-format"},next:{title:"Custom 
Annotations",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/custom-annotations"}},E=[{value:"Overview",id:"overview",children:[{value:"VCF Output Format",id:"vcf-output-format",children:[{value:"Header",id:"header",children:[],level:4},{value:"VCF Lines",id:"vcf-lines",children:[],level:4}],level:3}],level:2}],c={toc:E},p="wrapper";function G(t){let{components:a,...r}=t;return(0,A.kt)(p,(0,n.Z)({},c,r,{components:a,mdxType:"MDXLayout"}),(0,A.kt)("h2",{id:"overview"},"Overview"),(0,A.kt)("p",null,"While JSON output format is the default option, we support VCF file as our output too. The VCF output mode can be enabled by ",(0,A.kt)("inlineCode",{parentName:"p"},"--output-mode vcf")," as shown below:"),(0,A.kt)("pre",null,(0,A.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet Annotator.dll \\\n -c Data/Cache \\\n --output-format vcf \\\n -r Data/References/Homo_sapiens.GRCh38.Nirvana.dat \\\n -i HiSeq.10000.vcf.gz \\\n -o HiSeq.10000.out\n# HiSeq.10000.out.vcf.gz file should be produced after processing.\n")),(0,A.kt)("h3",{id:"vcf-output-format"},"VCF Output Format"),(0,A.kt)("h4",{id:"header"},"Header"),(0,A.kt)("p",null,"The output VCF file should have headers similar as below, which indicates the IlluminaConnectedAnnotations's version, file creation time, assembly, and data sources used for producing the output:"),(0,A.kt)("pre",null,(0,A.kt)("code",{parentName:"pre",className:"language-tsv"},'##fileformat=VCFv4.2\n##IlluminaConnectedAnnotations="3.24.0" time="2024-03-22 07:02:13" assembly="GRCh38" Ensembl="110" RefSeq="GCF_000001405.40-RS_2023_03"\n##FILTER=\n##fileDate=20230110\n##INFO=\n...\n##INFO=\n...\n#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Novaseq_TSPF450-NA12878-1-HFHWJDMXX_S1_L001 Novaseq_TSPF450-NA12891-1-HFHWJDMXX_S3_L001\n')),(0,A.kt)("h4",{id:"vcf-lines"},"VCF Lines"),(0,A.kt)("p",null,"Core annotation for overlapping transcripts is enabled and no supplementary annotation is added in VCF mode. 
A CSQ field is added under INFO column with following format:"),(0,A.kt)("pre",null,(0,A.kt)("code",{parentName:"pre"},'##INFO=\n')),(0,A.kt)("p",null,"Multiple transcripts are separated with ",(0,A.kt)("inlineCode",{parentName:"p"},","),". An example of produced VCF lines as below:"),(0,A.kt)("pre",null,(0,A.kt)("code",{parentName:"pre"},"chr21 5316038 MantaDEL:1:11095:74644:0:4:0 G 999 MaxDepth END=7246574;SVTYPE=DEL;SVLEN=-1930536;SVINSLEN=4;SVINSSEQ=TTCT;CSQ=|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624261.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624859.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000623227.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000619252.4|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623449.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623436.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624627.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624368.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623914.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624516.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624412.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000622939.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623050.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Tra
nscript|ENST00000624444.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623887.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000611026.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000610788.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279784|Transcript|ENST00000623587.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279064|Transcript|ENST00000623723.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000288187|Transcript|ENST00000671789.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000616522.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000621924.4|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000619488.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000617746.4|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000624446.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623405.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623575.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623506.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280019|Transcript|ENST00000624484.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279709|Transcript|ENST00000623377.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Tra
nscript|ENST00000688828.2|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688458.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692898.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689306.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692318.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624576.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623738.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701070.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623989.4|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701260.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692046.2|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692237.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689354.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624165.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624847.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000615262.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623047.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript
_variant|ENSG00000280145|Transcript|ENST00000623106.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623950.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000623225.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623324.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000624181.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279788|Transcript|ENST00000624266.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279728|Transcript|ENST00000623809.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280164|Transcript|ENST00000623892.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279998|Transcript|ENST00000623678.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|hsa-mir-8069-1|Transcript|ENST00000616627.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279751|Transcript|ENST00000623720.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623165.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624519.1|False||||21-5316038-7246574-G--DEL,|
transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623347.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624728.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000279477|Transcript|ENST00000623518.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000278884|Transcript|ENST00000625184.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623095.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000622911.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000621909.4|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623394.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000624310.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000615804.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000617336.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|CTBP2P10|Transcript|ENST00000624153.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LINC03104|Transcript|NR_170984.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724354|Transcript|NR_136540.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|CH507-42P11.6|Transcript|NR_171776.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724428|Transcript|NM_001320643.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354012.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcr
ipt_variant|LOC102724560|Transcript|NM_001354009.3|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NR_148682.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354010.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354015.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354014.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001321073.3|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354008.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354007.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354006.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320646.2|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320650.2|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320648.2|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320651.2|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001314050.5|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC106780825|Transcript|NR_133678.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001320719.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146656.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146655.1|False||||21-5316038-7246574-
G--DEL,|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146657.1|False||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|MIR8069-1|Transcript|NR_107036.1|True||||21-5316038-7246574-G--DEL,|transcript_ablation&transcript_variant|LOC102724843|Transcript|NR_170986.1|True||||21-5316038-7246574-G--DEL GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:999:999,0,999:58,5:69,63:.:. 0/1:PASS:999:999,0,999:59,7:67,71:.:. 0/1:PASS:999:999,0,999:118,4:140,79:.:.\nchr21 6639699 MantaDEL:514264:0:0:0:7:0 AGAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG AAA 537 MaxMQ0Frac END=6639804;SVTYPE=DEL;SVLEN=-105;CIGAR=1M2I105D;CSQ=AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623047.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623106.3:n.223-5036_223-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True|NC_000021.9:g.6639700_6639804delinsAA|ENST00000625185.3:n.232-5036_232-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624846.3:n.130-5036_130-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623313.1:n.3
12-7367_312-7263delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623950.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624965.1:n.151-5036_151-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:8:205,0,4:1,0:11,5:.:. 0/1:PASS:86:431,0,83:0,0:16,13:.:. 0/0:HomRef:61:0,11,66:2,0:7,0:.:.\nchr21 8811598 MantaBND:514412:0:1:0:0:0:0 G G[chr21:8854301[ 999 NoPairSupport SVTYPE=BND;MATEID=MantaBND:514412:0:1:0:0:0:1;CIPOS=0,4;HOMLEN=4;HOMSEQ=TGCA;BND_DEPTH=300;MATE_BND_DEPTH=213;CSQ=G[chr21:8854301[|transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True||||21-8811598-G-G[chr21:8854301[ GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:253:303,0,999:9,0:89,12:.:. 0/1:PASS:999:999,0,999:0,0:99,39:.:. 0/0:HomRef:410:0,360,999:17,0:141,0:.:.\nchr21 8813774 MantaINS:514450:0:0:0:1:0 T TATATATACATATATATATATACATATATATATATGTATATATATATATATAC 487 MaxMQ0Frac END=8813774;SVTYPE=INS;SVLEN=52;CIGAR=1M52I;CIPOS=0,7;HOMLEN=7;HOMSEQ=ATATATA;CSQ=ATATATACATATATATATATACATATATATATATGTATATATATATATATAC|intron_variant&non_coding_transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True|NC_000021.9:g.8813781_8813782insCATATATATATATACATATATATATATGTATATATATATATATACATATATA|ENST00000651312.1:n.40-6603_40-6602insGTATATATATATATATACATATATATATATGTATATATATATATGTATATAT||21-8813774-T-TATATATACATATATATATATACATATATATATATGTATATATATATATATAC GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:29:128,0,26:0,0:8,4:.:. 1/1:PASS:6:335,8,0:0,0:6,8:.:. 
0/1:PASS:21:176,0,18:0,0:3,6:.:.\n")))}G.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/65232248.8ac53226.js b/assets/js/65232248.8ac53226.js new file mode 100644 index 00000000..885a60d2 --- /dev/null +++ b/assets/js/65232248.8ac53226.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5352,6599,4388,6274,8308,1090,5955,8053],{3905:(t,e,n)=>{n.d(e,{Zo:()=>s,kt:()=>c});var a=n(7294);function l(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function r(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function i(t){for(var e=1;e=0||(l[n]=t[n]);return l}(t,e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(l[n]=t[n])}return l}var m=a.createContext({}),p=function(t){var e=a.useContext(m),n=e;return t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},s=function(t){var e=p(t.components);return a.createElement(m.Provider,{value:e},t.children)},u="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},N=a.forwardRef((function(t,e){var n=t.components,l=t.mdxType,r=t.originalType,m=t.parentName,s=o(t,["components","mdxType","originalType","parentName"]),u=p(n),N=l,c=u["".concat(m,".").concat(N)]||u[N]||d[N]||r;return n?a.createElement(c,i(i({ref:e},s),{},{components:n})):a.createElement(c,i({ref:e},s))}));function c(t,e){var n=arguments,l=e&&e.mdxType;if("string"==typeof t||l){var r=n.length,i=new Array(r);i[0]=N;var o={};for(var m in e)hasOwnProperty.call(e,m)&&(o[m]=e[m]);o.originalType=t,o[u]="string"==typeof t?t:l,i[1]=o;for(var 
p=2;p{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad-lof-json",id:"version-3.24/data-sources/gnomad-lof-json",title:"gnomad-lof-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-lof-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-lof-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-lof-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-lof-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},s="wrapper";function u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD":{ \n "pLi":1.00e0,\n "pNull":8.94e-40,\n "pRec":1.84e-16,\n "synZ":-8.44e-2,\n "misZ":5.96e-1,\n "loeuf":1.13e0\n}\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"),(0,l.kt)("th",{parentName:"tr",align:null},"Type"),(0,l.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pLi"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pNull"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being completely tolerant of loss of function variation (observed = 
expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pRec"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"synZ"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"corrected synonymous Z score")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"misZ"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"corrected missense Z score")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"loeuf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"loss of function observed/expected upper bound fraction (LOEUF)")))))}u.isMDXComponent=!0},7811:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad-small-variants-json",id:"version-3.24/data-sources/gnomad-small-variants-json",title:"gnomad-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},s="wrapper";function 
u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad":{ \n "coverage":20,\n "allAf":0.190317,\n "maleAf":0.193,\n "femaleAf": 0.1935,\n "afrAf":0.222876,\n "amrAf":0.121394,\n "easAf":0.239802,\n "finAf":0.136833,\n "nfeAf":0.181282,\n "asjAf":0.258278,\n "othAf":0.186094,\n "allAn":30796,\n "maleAn":15096,\n "femaleAn":15700,\n "afrAn":8664,\n "amrAn":832,\n "easAn":1618,\n "finAn":3486,\n "nfeAn":14916,\n "asjAn":302,\n "othAn":978,\n "allAc":5861,\n "maleAc":2930,\n "femaleAc": 2931,\n "afrAc":1931,\n "amrAc":101,\n "easAc":388,\n "finAc":477,\n "nfeAc":2704,\n "asjAc":78,\n "othAc":182,\n "allHc":561,\n "afrHc":208,\n "amrHc":6,\n "easHc":42,\n "finHc":31,\n "nfeHc":242,\n "asjHc":13,\n "othHc":19,\n "maleHc":280,\n "femaleHc":281,\n "controlsAllAf":0.190317,\n "controlsAllAn":30796,\n "controlsAllAc":5861,\n "lowComplexityRegion":true,\n "failedFilter":true\n}\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"),(0,l.kt)("th",{parentName:"tr",align:null},"Type"),(0,l.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"coverage"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"average coverage (non-negative integer values)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"controlsAllAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the controls subset. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for male population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for female population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"controlsAllAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the controls subset. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for all populations. 
Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for male population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for female population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"controlsAllAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the controls subset. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the African / African American population. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the African / African American population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the African / African American population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for African / African American population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Latino population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Latino population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Latino population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Latino population. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for East Asian population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Finnish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Finnish population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Finnish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Finnish population. 
Non-negative integer")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Non-Finnish European population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Non-Finnish European population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Non-Finnish European population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Non-Finnish European population. Non-negative integer")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Other population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Other population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Other population. 
Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Other population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Ashkenazi Jewish population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Ashkenazi Jewish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian population. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the South Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the South Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the South Asian population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,l.kt)("td",{parentName:"tr",align:null},"bool"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"lowComplexityRegion"),(0,l.kt)("td",{parentName:"tr",align:null},"bool"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant is located in a low complexity region.")))))}u.isMDXComponent=!0},7962:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad-structural-variants-data_description",id:"version-3.24/data-sources/gnomad-structural-variants-data_description",title:"gnomad-structural-variants-data_description",description:"Bed 
Example",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-data_description.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-structural-variants-data_description",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-data_description",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-data_description.md",tags:[],version:"3.24",frontMatter:{}},m=[{value:"Bed Example",id:"bed-example",children:[],level:4},{value:"Structural Variant Type Mapping",id:"structural-variant-type-mapping",children:[],level:4}],p={toc:m},s="wrapper";function u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("h4",{id:"bed-example"},"Bed Example"),(0,l.kt)("p",null,"The BED file was obtained from the original source for GRCh37."),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"#chrom start end name svtype ALGORITHMS BOTHSIDES_SUPPORT CHR2 CPX_INTERVALS CPX_TYPE END2 ENDEVIDENCE HIGH_SR_BACKGROUND PCRPLUS_DEPLETED PESR_GT_OVERDISPERSION POS2 PROTEIN_CODING__COPY_GAIN PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC PROTEIN_CODING__INTRONIC PROTEIN_CODING__INV_SPAN PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER PROTEIN_CODING__UTR SOURCE STRANDS SVLEN SVTYPE UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN AC AF N_BI_GENOS N_HOMREF N_HET N_HOMALT FREQ_HOMREF FREQ_HET FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF MALE_N_HET MALE_N_HOMALT MALE_FREQ_HOMREF MALE_FREQ_HET MALE_FREQ_HOMALT MALE_N_HEMIREF MALE_N_HEMIALT MALE_FREQ_HEMIREF MALE_FREQ_HEMIALT PAR FEMALE_AN FEMALE_AC FEMALE_AF FEMALE_N_BI_GENOS FEMALE_N_HOMREF FEMALE_N_HET FEMALE_N_HOMALT FEMALE_FREQ_HOMREF FEMALE_FREQ_HET FEMALE_FREQ_HOMALT 
POPMAX_AF AFR_AN AFR_AC AFR_AF AFR_N_BI_GENOS AFR_N_HOMREF AFR_N_HET AFR_N_HOMALT AFR_FREQ_HOMREF AFR_FREQ_HET AFR_FREQ_HOMALT AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF AFR_MALE_N_HET AFR_MALE_N_HOMALT AFR_MALE_FREQ_HOMREF AFR_MALE_FREQ_HET AFR_MALE_FREQ_HOMALT AFR_MALE_N_HEMIREF AFR_MALE_N_HEMIALT AFR_MALE_FREQ_HEMIREF AFR_MALE_FREQ_HEMIALT AFR_FEMALE_AN AFR_FEMALE_AC AFR_FEMALE_AF AFR_FEMALE_N_BI_GENOS AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT AMR_AN AMR_AC AMR_AF AMR_N_BI_GENOS AMR_N_HOMREF AMR_N_HET AMR_N_HOMALT AMR_FREQ_HOMREF AMR_FREQ_HET AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF AMR_MALE_N_HET AMR_MALE_N_HOMALT AMR_MALE_FREQ_HOMREF AMR_MALE_FREQ_HET AMR_MALE_FREQ_HOMALT AMR_MALE_N_HEMIREF AMR_MALE_N_HEMIALT AMR_MALE_FREQ_HEMIREF AMR_MALE_FREQ_HEMIALT AMR_FEMALE_AN AMR_FEMALE_AC AMR_FEMALE_AF AMR_FEMALE_N_BI_GENOS AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT EAS_AN EAS_AC EAS_AF EAS_N_BI_GENOS EAS_N_HOMREF EAS_N_HET EAS_N_HOMALT EAS_FREQ_HOMREF EAS_FREQ_HET EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF EAS_MALE_N_HET EAS_MALE_N_HOMALT EAS_MALE_FREQ_HOMREF EAS_MALE_FREQ_HET EAS_MALE_FREQ_HOMALT EAS_MALE_N_HEMIREF EAS_MALE_N_HEMIALT EAS_MALE_FREQ_HEMIREF EAS_MALE_FREQ_HEMIALT EAS_FEMALE_AN EAS_FEMALE_AC EAS_FEMALE_AF EAS_FEMALE_N_BI_GENOS EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT EUR_AN EUR_AC EUR_AF EUR_N_BI_GENOS EUR_N_HOMREF EUR_N_HET EUR_N_HOMALT EUR_FREQ_HOMREF EUR_FREQ_HET EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF EUR_MALE_N_HET EUR_MALE_N_HOMALT EUR_MALE_FREQ_HOMREF EUR_MALE_FREQ_HET EUR_MALE_FREQ_HOMALT EUR_MALE_N_HEMIREF EUR_MALE_N_HEMIALT 
EUR_MALE_FREQ_HEMIREF EUR_MALE_FREQ_HEMIALT EUR_FEMALE_AN EUR_FEMALE_AC EUR_FEMALE_AF EUR_FEMALE_N_BI_GENOS EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT OTH_AN OTH_AC OTH_AF OTH_N_BI_GENOS OTH_N_HOMREF OTH_N_HET OTH_N_HOMALT OTH_FREQ_HOMREF OTH_FREQ_HET OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF OTH_MALE_N_HET OTH_MALE_N_HOMALT OTH_MALE_FREQ_HOMREF OTH_MALE_FREQ_HET OTH_MALE_FREQ_HOMALT OTH_MALE_N_HEMIREF OTH_MALE_N_HEMIALT OTH_MALE_FREQ_HEMIREF OTH_MALE_FREQ_HEMIALT OTH_FEMALE_AN OTH_FEMALE_AC OTH_FEMALE_AF OTH_FEMALE_N_BI_GENOS OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET OTH_FEMALE_N_HOMALT OTH_FEMALE_FREQ_HOMREF OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT FILTER\n1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 
0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED \n")),(0,l.kt)("h4",{id:"structural-variant-type-mapping"},"Structural Variant Type Mapping"),(0,l.kt)("p",null,"The source files represented the structural variants with keys using various naming conventions.\nIn the Illumina Connected Annotations JSON output, these keys will be mapped according to the following. 
"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Illumina Connected Annotations JSON SV Type Key"),(0,l.kt)("th",{parentName:"tr",align:null},"GRCh37 Source SV Type Key"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"copy_number_variation"),(0,l.kt)("td",{parentName:"tr",align:null})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"deletion"),(0,l.kt)("td",{parentName:"tr",align:null},"DEL, CN=0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"duplication"),(0,l.kt)("td",{parentName:"tr",align:null},"DUP")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"inversion"),(0,l.kt)("td",{parentName:"tr",align:null},"INV")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME:ALU")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME:LINE1")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME:SVA")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"structural 
alteration"),(0,l.kt)("td",{parentName:"tr",align:null})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"complex_structural_alteration"),(0,l.kt)("td",{parentName:"tr",align:null},"CPX")))))}u.isMDXComponent=!0},1231:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad-structural-variants-json",id:"version-3.24/data-sources/gnomad-structural-variants-json",title:"gnomad-structural-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},s="wrapper";function u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD-preview": [\n {\n "chromosome": "1",\n "begin": 40001,\n "end": 47200,\n "variantId": "gnomAD-SV_v2.1_DUP_1_1",\n "variantType": "duplication",\n "failedFilter": true,\n "allAf": 0.068963,\n "afrAf": 0.135694,\n "amrAf": 0.022876,\n "easAf": 0.01101,\n "eurAf": 0.007846,\n "othAf": 0.017544,\n "femaleAf": 0.065288,\n "maleAf": 0.07255,\n "allAc": 943,\n "afrAc": 866,\n "amrAc": 21,\n "easAc": 17,\n "eurAc": 37,\n "othAc": 2,\n "femaleAc": 442,\n "maleAc": 499,\n "allAn": 13674,\n "afrAn": 6382,\n "amrAn": 918,\n "easAn": 1544,\n "eurAn": 4716,\n "othAn": 114,\n "femaleAn": 6770,\n "maleAn": 6878,\n "allHc": 91,\n "afrHc": 90,\n "amrHc": 1,\n "easHc": 0,\n "eurHc": 0,\n 
"othHc": 55,\n "femaleHc": 44,\n "maleHc": 47,\n "reciprocalOverlap": 0.01839,\n "annotationOverlap": 0.16667\n }\n]\n\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"),(0,l.kt)("th",{parentName:"tr",align:null},"Type"),(0,l.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,l.kt)("td",{parentName:"tr",align:null},"string"),(0,l.kt)("td",{parentName:"tr",align:null},"chromosome number")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"begin"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"position interval start")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"end"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"position internal end")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"variantType"),(0,l.kt)("td",{parentName:"tr",align:null},"string"),(0,l.kt)("td",{parentName:"tr",align:null},"structural variant type")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"variantId"),(0,l.kt)("td",{parentName:"tr",align:null},"string"),(0,l.kt)("td",{parentName:"tr",align:null},"gnomAD ID")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the African super population. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"eurAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the European super population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for all other populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for all populations.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the African super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Ad Mixed American super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"eurAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the European super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for all other populations.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for male population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for female 
population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for all populations.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the African super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Ad Mixed American super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"eurAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the European super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for all other populations.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for female population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for male 
population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the African / African American population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Latino population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the East Asian population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"eurHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the European super population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all other populations.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"integer"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female 
population.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,l.kt)("td",{parentName:"tr",align:null},"boolean"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"Reciprocal overlap. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,l.kt)("td",{parentName:"tr",align:null},"floating point"),(0,l.kt)("td",{parentName:"tr",align:null},"Reciprocal overlap. Range: 0 - 1.0")))),(0,l.kt)("p",null,(0,l.kt)("strong",{parentName:"p"},"Note:")," Following fields are not available in ",(0,l.kt)("em",{parentName:"p"},"GRCh38")," because the source file does not contain this information:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAf")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAf")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAn")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAn")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allHc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrHc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrHc")),(0,l.kt)("tr",{parentNa
me:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easHc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"eurAc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"othHc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleHc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleHc")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"failedFilter")))))}u.isMDXComponent=!0},9028:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>N,default:()=>A,frontMatter:()=>d,metadata:()=>c,toc:()=>g});var a=n(7462),l=(n(7294),n(3905)),r=n(7811),i=n(9043),o=n(7510),m=n(3274),p=n(1231),s=n(4404),u=n(7962);const d={title:"gnomAD"},N=void 0,c={unversionedId:"data-sources/gnomad",id:"version-3.24/data-sources/gnomad",title:"gnomAD",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/gnomad.mdx",sourceDirName:"data-sources",slug:"/data-sources/gnomad",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad.mdx",tags:[],version:"3.24",frontMatter:{title:"gnomAD"},sidebar:"docs",previous:{title:"GME Variome",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme"},next:{title:"Mitochondrial Heteroplasmy",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mito-heteroplasmy"}},g=[{value:"Overview",id:"overview",children:[],level:2},{value:"gnomAD v4.0 (GRCh38)",id:"gnomad-v40-grch38",children:[{value:"Small Variants",id:"small-variants",children:[{value:"VCF extraction",id:"vcf-extraction",children:[],level:4},{value:"JSON output",id:"json-output",children:[],level:4},{value:"Calculation",id:"calculation",children:[],level:4}],level:3},{value:"LoF Gene Metrics",id:"lof-gene-metrics",children:[{value:"Tab 
delimited file example",id:"tab-delimited-file-example",children:[],level:4},{value:"JSON key to TSV column mapping",id:"json-key-to-tsv-column-mapping",children:[],level:4}],level:3},{value:"Structural Variants",id:"structural-variants",children:[{value:"Structural Variant Type Mapping",id:"structural-variant-type-mapping",children:[],level:4},{value:"JSON output",id:"json-output-1",children:[],level:4}],level:3}],level:2},{value:"gnomAD v2.1 (GRCh37)",id:"gnomad-v21-grch37",children:[{value:"Small Variants",id:"small-variants-1",children:[{value:"VCF extraction",id:"vcf-extraction-1",children:[],level:4},{value:"Computation",id:"computation",children:[],level:4},{value:"Merging genomes and exomes",id:"merging-genomes-and-exomes",children:[],level:4},{value:"Filters",id:"filters",children:[],level:4},{value:"VCF download instructions",id:"vcf-download-instructions",children:[],level:4},{value:"JSON output",id:"json-output-2",children:[{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[],level:5},{value:"Source data files",id:"source-data-files",children:[],level:5}],level:4}],level:3},{value:"LoF Gene Metrics",id:"lof-gene-metrics-1",children:[{value:"Tab delimited file example",id:"tab-delimited-file-example-1",children:[],level:4},{value:"JSON key to TSV column mapping",id:"json-key-to-tsv-column-mapping-1",children:[],level:4},{value:"Gene symbol update",id:"gene-symbol-update",children:[],level:4},{value:"Conflict resolution",id:"conflict-resolution",children:[],level:4},{value:"Download URL",id:"download-url",children:[],level:4},{value:"JSON output",id:"json-output-3",children:[],level:4}],level:3},{value:"Structural Variants",id:"structural-variants-1",children:[{value:"Source Files",id:"source-files",children:[],level:4},{value:"Download URLs",id:"download-urls",children:[{value:"GRCh37",id:"grch37",children:[],level:5}],level:4},{value:"JSON 
output",id:"json-output-4",children:[],level:4}],level:3}],level:2}],k={toc:g},f="wrapper";function A(t){let{components:e,...n}=t;return(0,l.kt)(f,(0,a.Z)({},k,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("h2",{id:"overview"},"Overview"),(0,l.kt)("p",null,"The Genome Aggregation Database (",(0,l.kt)("a",{parentName:"p",href:"https://gnomad.broadinstitute.org/"},"gnomAD"),") is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community."),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Koch, L., 2020. Exploring human genomic diversity with gnomAD. ",(0,l.kt)("em",{parentName:"p"},"Nature Reviews Genetics"),", ",(0,l.kt)("strong",{parentName:"p"},"21(8)"),", pp.448-448."))),(0,l.kt)("p",null,"Illumina Connected Analysis will support gnomAD v4.0 for GRCh38 assembly and gnomAD v2.1 for GRCh37."),(0,l.kt)("h2",{id:"gnomad-v40-grch38"},"gnomAD v4.0 (GRCh38)"),(0,l.kt)("h3",{id:"small-variants"},"Small Variants"),(0,l.kt)("p",null,"In gnomAD v4.0, like gnomAD v2.1, there are genome and exome data. 
Compare to gnomAD v2.1 which the data for genome and exome are merged, for gnomAD 4.0, Illumina Connected Annotation will separate them with different JSON output field.\nFor gnomAD genome, the field name would be ",(0,l.kt)("inlineCode",{parentName:"p"},"gnomad"),". For gnomAD exome, the field name would be ",(0,l.kt)("inlineCode",{parentName:"p"},"gnomad-exome"),".\nDespite this difference in the field name, the JSON data format would be identical for both genome and exome."),(0,l.kt)("h4",{id:"vcf-extraction"},"VCF extraction"),(0,l.kt)("p",null,"We currently extract the following info fields from both gnomAD genome and exome VCF files:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},'##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n')),(0,l.kt)("h4",{id:"json-output"},"JSON output"),(0,l.kt)(i.default,{mdxType:"JSONV40"}),(0,l.kt)("h4",{id:"calculation"},"Calculation"),(0,l.kt)("p",null,"To calculate allele frequency for each group, we divide the allele count with allele number for each group."),(0,l.kt)("h3",{id:"lof-gene-metrics"},"LoF Gene Metrics"),(0,l.kt)("p",null,"In gnomAD 4.0, the gene score data for LOF is given per transcript.\nSince this is gene level data, one of the transcripts need to be chosen and value reported.\nThe transcript ID of the selected transcript will be reported.\nTranscripts are prioritized (from higher to lower) as follows:"),(0,l.kt)("ol",null,(0,l.kt)("li",{parentName:"ol"},"Ensembl Transcript has mane_select column true from source (gnomAD)."),(0,l.kt)("li",{parentName:"ol"},"Transcript is marked as Ensembl canonical in Illumina Connected Annotation cache data."),(0,l.kt)("li",{parentName:"ol"},"RefSeq transcript has mane_select 
column true."),(0,l.kt)("li",{parentName:"ol"},"Transcript is marked as RefSeq canonical in Illumina Connected Annotation cache data."),(0,l.kt)("li",{parentName:"ol"},"Transcript has the lowest lof.oe_ci.upper value compare to other transcript for the same gene.")),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Differences with gnomAD browser")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Due to difference in Ensembl version between Illumina Connected Annotation and gnomAD, there are several transcript ID that are marked as canonical in gnomAD browser but not in Illumina Connected Analysis.\nIf this is the case, the gene score shown in Illumina Connected Annotation will be different compared to the gene score shown in the gnomAD browser.\nThe ",(0,l.kt)("inlineCode",{parentName:"p"},"transcriptId")," field in the JSON output will report which transcript was used by Illumina Connected Annotation."))),(0,l.kt)("h4",{id:"tab-delimited-file-example"},"Tab delimited file example"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"gene transcript mane_select lof_hc_lc.obs lof_hc_lc.exp lof_hc_lc.possible lof_hc_lc.oe lof_hc_lc.mu lof_hc_lc.pLI lof_hc_lc.pNull lof_hc_lc.pRec lof.obs lof.exp lof.possible lof.oe lof.mu lof.pLI lof.pNull lof.pRec lof.oe_ci.lower lof.oe_ci.upper lof.z_raw lof.z_score mis.obs mis.exp mis.possible mis.oe mis.mu mis.oe_ci.lower 
mis.oe_ci.upper mis.z_raw mis.z_score mis_pphen.obs mis_pphen.exp mis_pphen.possible mis_pphen.oe syn.obs syn.exp syn.possible syn.oe syn.mu syn.oe_ci.lower syn.oe_ci.upper syn.z_raw syn.z_score constraint_flags\nSCHIP1 ENST00000445224 false 8 3.0392e+01 157 2.6323e-01 3.5111e-07 9.9024e-01 5.8227e-06 9.7579e-03 8 3.0392e+01 157 2.6323e-01 3.5111e-07 9.9066e-01 5.3097e-06 9.3341e-03 1.5300e-01 4.7500e-01 4.0617e+00 3.4377e+00 193 3.0914e+02 1659 6.2431e-01 1.5780e-06 5.5400e-01 7.0300e-01 6.6055e+00 2.4115e+00 87 1.4959e+02 813 5.8160e-01 76 1.0011e+02 393 7.5914e-01 7.9269e-07 6.3000e-01 9.1900e-01 2.4099e+00 1.3142e+00 []\n")),(0,l.kt)("h4",{id:"json-key-to-tsv-column-mapping"},"JSON key to TSV column mapping"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"JSON key"),(0,l.kt)("th",{parentName:"tr",align:null},"TSV column"),(0,l.kt)("th",{parentName:"tr",align:null},"Description"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pLi"),(0,l.kt)("td",{parentName:"tr",align:null},"lof.pLI"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pNull"),(0,l.kt)("td",{parentName:"tr",align:null},"lof.pNull"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being completely tolerant of loss of function variation (observed = expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pRec"),(0,l.kt)("td",{parentName:"tr",align:null},"lof.pRec"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 
0.5*expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"synZ"),(0,l.kt)("td",{parentName:"tr",align:null},"syn.z_score"),(0,l.kt)("td",{parentName:"tr",align:null},"corrected synonymous Z score")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"misZ"),(0,l.kt)("td",{parentName:"tr",align:null},"mis.z_score"),(0,l.kt)("td",{parentName:"tr",align:null},"corrected missense Z score")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"loeuf"),(0,l.kt)("td",{parentName:"tr",align:null},"lof.oe_ci.upper"),(0,l.kt)("td",{parentName:"tr",align:null},"loss of function observed/expected upper bound fraction (LOEUF)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"transcriptId"),(0,l.kt)("td",{parentName:"tr",align:null},"transcript"),(0,l.kt)("td",{parentName:"tr",align:null},"transcript ID which the values we select")))),(0,l.kt)(m.default,{mdxType:"JSONG40"}),(0,l.kt)("h3",{id:"structural-variants"},"Structural Variants"),(0,l.kt)("p",null,"Structural variants in gnomAD 4.0 is available in VCF format and has the same population data as small variants."),(0,l.kt)("h4",{id:"structural-variant-type-mapping"},"Structural Variant Type Mapping"),(0,l.kt)("p",null,"The source files represented the structural variants with keys using various naming conventions.\nIn the Illumina Connected Annotations JSON output, these keys will be mapped according to the following."),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Illumina Connected Annotations JSON SV Type Key"),(0,l.kt)("th",{parentName:"tr",align:null},"GRCh37 Source SV Type Key"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"deletion"),(0,l.kt)("td",{parentName:"tr",align:null},"DEL, 
CN=0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"duplication"),(0,l.kt)("td",{parentName:"tr",align:null},"DUP")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"inversion"),(0,l.kt)("td",{parentName:"tr",align:null},"INV")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME:ALU")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME:LINE1")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion"),(0,l.kt)("td",{parentName:"tr",align:null},"INS:ME:SVA")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"complex_structural_alteration"),(0,l.kt)("td",{parentName:"tr",align:null},"CPX")))),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"gnomAD Copy Number Variation")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"In gnomAD 4.0 structural 
variants data, there are CNV data in the VCF file. Since it is not shown in the browser, we don't include CNV in our output.\nWe will evaluate in the future whether to include copy number variation from structural variation data together with the new rare CNV data that is available in gnomAD 4.0."))),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"gnomAD duplication variant type")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"In the gnomAD 4.0 structural variants VCF, only DUP is used as the symbolic allele for the duplication variant type.\nBased on the information in the gnomAD browser, a duplication variant that has split-read or paired-end read evidence can be inferred to be a tandem duplication.\nWe therefore check the evidence data in each DUP variant entry to decide whether it can be assigned tandem duplication as its variant type or is just a duplication."))),(0,l.kt)("h4",{id:"json-output-1"},"JSON output"),(0,l.kt)(s.default,{mdxType:"JSONSV40"}),(0,l.kt)("h2",{id:"gnomad-v21-grch37"},"gnomAD v2.1 (GRCh37)"),(0,l.kt)("h3",{id:"small-variants-1"},"Small Variants"),(0,l.kt)("h4",{id:"vcf-extraction-1"},"VCF extraction"),(0,l.kt)("p",null,"We currently extract the following info fields from gnomAD genome and exome VCF 
files:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},'##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n')),(0,l.kt)("p",null,"We also extract the following extra fields from gnomAD exome VCF file:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},'##INFO=\n##INFO=\n##INFO=\n')),(0,l.kt)("h4",{id:"computation"},"Computation"),(0,l.kt)("p",null,"Using these, we compute the following:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"Coverage"),(0,l.kt)("li",{parentName:"ul"},"Allele count, Homozygous count, allele number and allele frequencies for:"),(0,l.kt)("li",{parentName:"ul"},"Global population"),(0,l.kt)("li",{parentName:"ul"},"African/African Americans"),(0,l.kt)("li",{parentName:"ul"},"Admixed Americans"),(0,l.kt)("li",{parentName:"ul"},"Ashkenazi Jews"),(0,l.kt)("li",{parentName:"ul"},"East Asians"),(0,l.kt)("li",{parentName:"ul"},"Finnish"),(0,l.kt)("li",{parentName:"ul"},"Non-Finnish Europeans"),(0,l.kt)("li",{parentName:"ul"},"South Asian"),(0,l.kt)("li",{parentName:"ul"},"Others (population not assigned)"),(0,l.kt)("li",{parentName:"ul"},"Male"),(0,l.kt)("li",{parentName:"ul"},"Female"),(0,l.kt)("li",{parentName:"ul"},"Controls")),(0,l.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 
2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"Note")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("ul",{parentName:"div"},(0,l.kt)("li",{parentName:"ul"},"Coverage = DP / AN. Frequencies are computed using AC/AN for each population."),(0,l.kt)("li",{parentName:"ul"},"Please note that there is currently no genome sequencing data for the South Asian (SAS) population available in gnomAD."),(0,l.kt)("li",{parentName:"ul"},"Allele count, homozygous count, allele number and allele frequencies for control groups are also provided for the global population.")))),(0,l.kt)("h4",{id:"merging-genomes-and-exomes"},"Merging genomes and exomes"),(0,l.kt)("p",null,"When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets."),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"info")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("ul",{parentName:"div"},(0,l.kt)("li",{parentName:"ul"},"For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. 
Genomes and exomes data are merged in the output.")))),(0,l.kt)("h4",{id:"filters"},"Filters"),(0,l.kt)("p",null,"The following strategy will be used when there's a conflict in filter status:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"center"}),(0,l.kt)("th",{parentName:"tr",align:"center"},(0,l.kt)("strong",{parentName:"th"},"Genomes PASS")),(0,l.kt)("th",{parentName:"tr",align:"center"},(0,l.kt)("strong",{parentName:"th"},"Genomes Filtered")))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"center"},(0,l.kt)("strong",{parentName:"td"},"Exomes PASS")),(0,l.kt)("td",{parentName:"tr",align:"center"},"PASS"),(0,l.kt)("td",{parentName:"tr",align:"center"},"Only use exome data")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"center"},(0,l.kt)("strong",{parentName:"td"},"Exomes Filtered")),(0,l.kt)("td",{parentName:"tr",align:"center"},"Only use genome data"),(0,l.kt)("td",{parentName:"tr",align:"center"},"Filtered")))),(0,l.kt)("h4",{id:"vcf-download-instructions"},"VCF download instructions"),(0,l.kt)("p",null,(0,l.kt)("a",{parentName:"p",href:"https://gnomad.broadinstitute.org/downloads"},"https://gnomad.broadinstitute.org/downloads")),(0,l.kt)("h4",{id:"json-output-2"},"JSON output"),(0,l.kt)(r.default,{mdxType:"JSONV"}),(0,l.kt)("h5",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,l.kt)("p",null,"The gnomAD ",(0,l.kt)("inlineCode",{parentName:"p"},".nsa")," for Illumina Connected Annotations can be built using the ",(0,l.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,l.kt)("inlineCode",{parentName:"p"},"gnomad")," subcommand. 
We will describe building gnomAD version 3.1 here."),(0,l.kt)("h5",{id:"source-data-files"},"Source data files"),(0,l.kt)("p",null,"Input VCF files (one per chromosome) and a ",(0,l.kt)("inlineCode",{parentName:"p"},".version")," file are required in a folder to build the ",(0,l.kt)("inlineCode",{parentName:"p"},".nsa")," file. For example, my directory contains:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"chr10.vcf.bgz chr22.vcf.bgz\nchr11.vcf.bgz chr2.vcf.bgz\nchr12.vcf.bgz chr3.vcf.bgz\nchr13.vcf.bgz chr4.vcf.bgz\nchr14.vcf.bgz chr5.vcf.bgz\nchr15.vcf.bgz chr6.vcf.bgz\nchr16.vcf.bgz chr7.vcf.bgz\nchr17.vcf.bgz chr8.vcf.bgz\nchr18.vcf.bgz chr9.vcf.bgz\nchr19.vcf.bgz chrM.vcf.bgz\nchr1.vcf.bgz chrX.vcf.bgz\nchr20.vcf.bgz chrY.vcf.bgz\nchr21.vcf.bgz gnomad.r3.1.version\n")),(0,l.kt)("p",null,"The version file is a text file with the following content."),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"NAME=gnomAD\nVERSION=3.1\nDATE=2020-10-29\nDESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)\n")),(0,l.kt)("p",null,"The help menu for the utility is as follows:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"SAUtils.dll gnomad\n---------------------------------------------------------------------------\nSAUtils (c) 2021 Illumina, Inc.\nStromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll gnomad [options]\nReads provided supplementary data files and populates tsv files\n\nOPTIONS:\n --ref, -r compressed reference sequence file\n --genome, -g input directory containing VCF (and .version)\n files with genomic frequencies\n --exome, -e input directory containing VCF (and .version)\n files with exomic frequencies\n --temp, -t output temp directory for intermediate (per chrom)\n NSA files\n --out, -o output directory for NSA file\n --help, 
-h displays the help menu\n --version, -v displays the version\n")),(0,l.kt)("p",null,"Here is a sample execution:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll Gnomad \\\\\n--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \\\\\n--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp\n")),(0,l.kt)("h3",{id:"lof-gene-metrics-1"},"LoF Gene Metrics"),(0,l.kt)("h4",{id:"tab-delimited-file-example-1"},"Tab delimited file example"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_zmis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfep_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_positionend_position\nMED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 
0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643\n")),(0,l.kt)("h4",{id:"json-key-to-tsv-column-mapping-1"},"JSON key to TSV column mapping"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"JSON key"),(0,l.kt)("th",{parentName:"tr",align:null},"TSV column"),(0,l.kt)("th",{parentName:"tr",align:null},"Description"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pLi"),(0,l.kt)("td",{parentName:"tr",align:null},"pLI"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pNull"),(0,l.kt)("td",{parentName:"tr",align:null},"pNull"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being completely tolerant of loss of function variation (observed = expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"pRec"),(0,l.kt)("td",{parentName:"tr",align:null},"pRec"),(0,l.kt)("td",{parentName:"tr",align:null},"probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"synZ"),(0,l.kt)("td",{parentName:"tr",align:null},"syn_z"),(0,l.kt)("td",{parentName:"tr",align:null},"corrected synonymous Z score")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"misZ"),(0,l.kt)("td",{parentName:"tr",align:null},"mis_z"),(0,l.kt)("td",{parentName:"tr",align:null},"corrected missense Z 
score")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"loeuf"),(0,l.kt)("td",{parentName:"tr",align:null},"oe_lof_upper"),(0,l.kt)("td",{parentName:"tr",align:null},"loss of function observed/expected upper bound fraction (LOEUF)")))),(0,l.kt)("h4",{id:"gene-symbol-update"},"Gene symbol update"),(0,l.kt)("p",null,"The input file provides Ensembl gene IDs for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache data contains Ensembl gene IDs, we use these IDs to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry."),(0,l.kt)("h4",{id:"conflict-resolution"},"Conflict resolution"),(0,l.kt)("p",null,"gnomAD uses Ensembl gene IDs as unique identifiers in the ",(0,l.kt)("a",{parentName:"p",href:"https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz"},"source file")," but Illumina Connected Annotations uses HGNC gene symbols. 
Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in a conflict."),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"MDGA2 ENST00000426342 306 4.0043e+02 7.6419e-01 2.1096e-05 4724 78 1.6525e+02 4.7202e-01 1923 125 1.3737e+02 9.0993e-01 7.1973e-06 1413 4 2.0926e-06 453 3.8316e+01 9.9922e-01 8.6490e-12 7.8128e-04 1.0440e-01 7.8600e-01 1.0560e+00 6.9500e-01 8.4000e-01 5.0000e-02 2.3900e-01 8.2988e-01 1.6769e+00 5.1372e+00 1529 0 0 7 2.8103e-05 4.0317e-06 124784 7 0 124791 2.8047e-05 9.8167e-05 0.0000e+00 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 3.5391e-05 1.6672e-04 3.2680e-05 0.0000e+00 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 3.5308e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000139915 2 2181 13 protein_coding 835332 9.9322e-01 3 2.7833e+01 1.0779e-01 NA 14 47308826 48144157\nMDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999\n")),(0,l.kt)("p",null,'In such cases, Illumina Connected Annotations chooses the entry with the smallest "LOEUF" value. 
The reason for choosing this value can be highlighted by the following table:'),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"right"},"LOEUF decile"),(0,l.kt)("th",{parentName:"tr",align:"right"},"Haplo-insufficient"),(0,l.kt)("th",{parentName:"tr",align:"right"},"Autosomal Dominant"),(0,l.kt)("th",{parentName:"tr",align:"right"},"Autosomal Recessive"),(0,l.kt)("th",{parentName:"tr",align:"right"},"Olfactory Genes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"0-10%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"104"),(0,l.kt)("td",{parentName:"tr",align:"right"},"140"),(0,l.kt)("td",{parentName:"tr",align:"right"},"36"),(0,l.kt)("td",{parentName:"tr",align:"right"},"0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"10-20%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"47"),(0,l.kt)("td",{parentName:"tr",align:"right"},"128"),(0,l.kt)("td",{parentName:"tr",align:"right"},"72"),(0,l.kt)("td",{parentName:"tr",align:"right"},"1")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"20-30%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"17"),(0,l.kt)("td",{parentName:"tr",align:"right"},"86"),(0,l.kt)("td",{parentName:"tr",align:"right"},"112"),(0,l.kt)("td",{parentName:"tr",align:"right"},"0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"30-40%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"8"),(0,l.kt)("td",{parentName:"tr",align:"right"},"80"),(0,l.kt)("td",{parentName:"tr",align:"right"},"173"),(0,l.kt)("td",{parentName:"tr",align:"right"},"4")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"40-50%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"7"),(0,l.kt)("td",{parentName:"tr",align:"right"},"65"),(0,l.kt)("td",{parentName:"tr",align:"right"},"206"),(0,l.kt
)("td",{parentName:"tr",align:"right"},"8")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"50-60%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"4"),(0,l.kt)("td",{parentName:"tr",align:"right"},"54"),(0,l.kt)("td",{parentName:"tr",align:"right"},"207"),(0,l.kt)("td",{parentName:"tr",align:"right"},"6")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"60-70%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"0"),(0,l.kt)("td",{parentName:"tr",align:"right"},"46"),(0,l.kt)("td",{parentName:"tr",align:"right"},"154"),(0,l.kt)("td",{parentName:"tr",align:"right"},"18")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"70-80%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"2"),(0,l.kt)("td",{parentName:"tr",align:"right"},"49"),(0,l.kt)("td",{parentName:"tr",align:"right"},"120"),(0,l.kt)("td",{parentName:"tr",align:"right"},"49")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"80-90%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"0"),(0,l.kt)("td",{parentName:"tr",align:"right"},"34"),(0,l.kt)("td",{parentName:"tr",align:"right"},"58"),(0,l.kt)("td",{parentName:"tr",align:"right"},"96")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"right"},"90-100%"),(0,l.kt)("td",{parentName:"tr",align:"right"},"0"),(0,l.kt)("td",{parentName:"tr",align:"right"},"26"),(0,l.kt)("td",{parentName:"tr",align:"right"},"40"),(0,l.kt)("td",{parentName:"tr",align:"right"},"174")))),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 
5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Note")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("ul",{parentName:"div"},(0,l.kt)("li",{parentName:"ul"},"Table source: ",(0,l.kt)("a",{parentName:"li",href:"https://www.biorxiv.org/content/biorxiv/early/2019/01/28/531210.full-text.pdf"},"https://www.biorxiv.org/content/biorxiv/early/2019/01/28/531210.full-text.pdf")),(0,l.kt)("li",{parentName:"ul"},"This table indicates that lower LOEUF scores have more deleterious effect on genes."),(0,l.kt)("li",{parentName:"ul"},"Only 15 out of 19685 genes have conflicting entries.")))),(0,l.kt)("p",null,(0,l.kt)("strong",{parentName:"p"},"List of genes with conflicting entries")),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},'MDGA2:\n {"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}\n {"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}\nCRYBG3:\n {"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}\n {"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}\nCHTF8:\n {"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}\n {"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}\nSEPT1:\n {"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}\n {"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}\nARL14EPL:\n {"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}\n {"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}\nUGT2A1:\n {"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}\n 
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}\nLTB4R2:\n {"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}\n {"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}\nCDRT1:\n {"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}\n {"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}\nMUC3A:\n {"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}\n {"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}\nCOG8:\n {"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}\n {"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}\nAC006486.1:\n {"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}\n {"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}\nAL645922.1:\n {"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}\n {"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}\nNBPF20:\n {"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}\n {"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}\nPRAMEF11:\n {"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}\n {"synZ":-3.33e0,"misZ":-2.59e0}\nFAM231D:\n {"synZ":-1.98e0,"misZ":-1.44e0}\n {"synZ":1.07e0,"misZ":3.13e-1}\n')),(0,l.kt)("p",null,(0,l.kt)("strong",{parentName:"p"},"Conflict resolution")),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"Pick the entry with the lowest LOEUF score"),(0,l.kt)("li",{parentName:"ul"},"If the same, pick the lowest pLI"),(0,l.kt)("li",{parentName:"ul"},"Otherwise pick 
the entry with the max absolute value of synZ + misZ")),(0,l.kt)("h4",{id:"download-url"},"Download URL"),(0,l.kt)("p",null,(0,l.kt)("a",{parentName:"p",href:"https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz"},"https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz")),(0,l.kt)("h4",{id:"json-output-3"},"JSON output"),(0,l.kt)(o.default,{mdxType:"JSONG"}),(0,l.kt)("h3",{id:"structural-variants-1"},"Structural Variants"),(0,l.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. ",(0,l.kt)("em",{parentName:"p"},"Nature")," ",(0,l.kt)("strong",{parentName:"p"},"581"),", pp.444\u2013451. 
",(0,l.kt)("a",{parentName:"p",href:"https://doi.org/10.1038/s41586-020-2287-8"},"https://doi.org/10.1038/s41586-020-2287-8")))),(0,l.kt)("p",null,(0,l.kt)("strong",{parentName:"p"},"Note"),"\nThe gnomAD structural variant annotations are in a preview stage at the moment.\nCurrently, the annotations do not include translocation breakends.\nFuture updates will include a better way of annotating the structural variants."),(0,l.kt)("h4",{id:"source-files"},"Source Files"),(0,l.kt)(u.default,{mdxType:"SVDATADESCRIPTION"}),(0,l.kt)("h4",{id:"download-urls"},"Download URLs"),(0,l.kt)("h5",{id:"grch37"},"GRCh37"),(0,l.kt)("p",null,"The GRCh37 file was downloaded from the original source. Following table gives some essential data metrics:"),(0,l.kt)("p",null,(0,l.kt)("a",{parentName:"p",href:"https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz"},"https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz")),(0,l.kt)("h4",{id:"json-output-4"},"JSON output"),(0,l.kt)(p.default,{mdxType:"JSONSV"}))}A.isMDXComponent=!0},3274:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad4.0-lof-json",id:"version-3.24/data-sources/gnomad4.0-lof-json",title:"gnomad4.0-lof-json",description:"",source:"@site/versioned_docs/version-3.24/data-sources/gnomad4.0-lof-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad4.0-lof-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad4.0-lof-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad4.0-lof-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},s="wrapper";function 
u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD": {\n "pLi": 0.00000122,\n "pRec": 0.32,\n "pNull": 0.68,\n "synZ": 0.0117,\n "misZ": 0.162,\n "loeuf": 1.94,\n "transcriptId": "ENST00000360525"\n}\n')))}u.isMDXComponent=!0},9043:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad4.0-small-variants-json",id:"version-3.24/data-sources/gnomad4.0-small-variants-json",title:"gnomad4.0-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad4.0-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad4.0-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad4.0-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad4.0-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},s="wrapper";function u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad": {\n "coverage": 154,\n "failedFilter": true,\n "allAf": 0.5,\n "allAn": 152428,\n "allAc": 76214,\n "allHc": 0,\n "afrAf": 0.5,\n "afrAn": 41608,\n "afrAc": 20804,\n "afrHc": 0,\n "amiAf": 0.5,\n "amiAn": 912,\n "amiAc": 456,\n "amiHc": 0,\n "amrAf": 0.5,\n "amrAn": 15314,\n "amrAc": 7657,\n "amrHc": 0,\n "easAf": 0.5,\n "easAn": 5196,\n "easAc": 2598,\n "easHc": 0,\n "finAf": 0.5,\n "finAn": 10632,\n "finAc": 5316,\n "finHc": 0,\n "nfeAf": 0.5,\n "nfeAn": 68050,\n "nfeAc": 34025,\n "nfeHc": 0,\n "asjAf": 0.5,\n "asjAn": 3472,\n "asjAc": 1736,\n "asjHc": 0,\n 
"sasAf": 0.5,\n "sasAn": 4834,\n "sasAc": 2417,\n "sasHc": 0,\n "midAf": 0.5,\n "midAn": 294,\n "midAc": 147,\n "midHc": 0,\n "remainingAf": 0.5,\n "remainingAn": 2116,\n "remainingAc": 1058,\n "remainingHc": 0,\n "maleAf": 0.5,\n "maleAn": 74544,\n "maleAc": 37272,\n "maleHc": 0,\n "femaleAf": 0.5,\n "femaleAn": 77884,\n "femaleAc": 38942,\n "femaleHc": 0\n}\n')),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad-exome": {\n "coverage": 53,\n "allAf": 0.495074,\n "allAn": 4060,\n "allAc": 2010,\n "allHc": 11,\n "afrAf": 0.5,\n "afrAn": 86,\n "afrAc": 43,\n "afrHc": 0,\n "amrAf": 0.5,\n "amrAn": 46,\n "amrAc": 23,\n "amrHc": 0,\n "easAf": 0.491071,\n "easAn": 112,\n "easAc": 55,\n "easHc": 0,\n "finAf": 0.5,\n "finAn": 306,\n "finAc": 153,\n "finHc": 0,\n "nfeAf": 0.49503,\n "nfeAn": 3018,\n "nfeAc": 1494,\n "nfeHc": 11,\n "asjAf": 0.461538,\n "asjAn": 26,\n "asjAc": 12,\n "asjHc": 0,\n "sasAf": 0.486111,\n "sasAn": 72,\n "sasAc": 35,\n "sasHc": 0,\n "midAf": 0.5,\n "midAn": 68,\n "midAc": 34,\n "midHc": 0,\n "remainingAf": 0.493865,\n "remainingAn": 326,\n "remainingAc": 161,\n "remainingHc": 0,\n "maleAf": 0.495212,\n "maleAn": 2924,\n "maleAc": 1448,\n "maleHc": 9,\n "femaleAf": 0.494718,\n "femaleAn": 1136,\n "femaleAc": 562,\n "femaleHc": 2\n}\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"),(0,l.kt)("th",{parentName:"tr",align:null},"Type"),(0,l.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"coverage"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"average coverage (non-negative integer 
values)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for male population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for male population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for female population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for female population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female population. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Other population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Other population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Other population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Other population. Non-negative integer")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for all populations. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the African / African American population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the African / African American population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the African / African American population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for African / African American population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for Amish populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for Amish populations. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for Amish populations. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Amish populations. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Latino population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Latino population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Latino population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Latino population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian population. 
Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for East Asian population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Finnish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Finnish population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Finnish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Finnish population. Non-negative integer")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Non-Finnish European population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Non-Finnish European population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Non-Finnish European population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Non-Finnish European population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Ashkenazi Jewish population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Ashkenazi Jewish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian population. 
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the South Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the South Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the South Asian population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Middle Eastern population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Middle Eastern population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Middle Eastern population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Middle Eastern population. 
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,l.kt)("td",{parentName:"tr",align:null},"bool"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")))))}u.isMDXComponent=!0},4404:(t,e,n)=>{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=n(7462),l=(n(7294),n(3905));const r={},i=void 0,o={unversionedId:"data-sources/gnomad40-structural-variants-json",id:"version-3.24/data-sources/gnomad40-structural-variants-json",title:"gnomad40-structural-variants-json",description:"",source:"@site/versioned_docs/version-3.24/data-sources/gnomad40-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad40-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad40-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad40-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},s="wrapper";function u(t){let{components:e,...n}=t;return(0,l.kt)(s,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad": [\n {\n "chromosome": "1",\n "begin": 1769047,\n "end": 78686496,\n "variantId": "gnomAD-SV_v3_CPX_chr1_4787cfba",\n "variantType": "complex_structural_alteration",\n "failedFilter": true,\n "allAf": 0.51192,\n "afrAf": 0.491986,\n "amiAf": 0.559382,\n "amrAf": 0.499444,\n "asjAf": 0.505975,\n "easAf": 0.51924,\n "midAf": 0.53125,\n "finAf": 0.542619,\n "nfeAf": 0.521916,\n "othAf": 0.492366,\n "sasAf": 0.516568,\n "femaleAf": 0.509225,\n "maleAf": 0.514861,\n "allAc": 64549,\n "afrAc": 16637,\n "amiAc": 471,\n "amrAc": 6290,\n "asjAc": 1609,\n "easAc": 2105,\n "midAc": 34,\n "finAc": 3514,\n "nfeAc": 
30839,\n "othAc": 774,\n "sasAc": 2276,\n "femaleAc": 33507,\n "maleAc": 31042,\n "allAn": 126092,\n "afrAn": 33816,\n "amiAn": 842,\n "amrAn": 12594,\n "asjAn": 3180,\n "easAn": 4054,\n "midAn": 64,\n "finAn": 6476,\n "nfeAn": 59088,\n "othAn": 1572,\n "sasAn": 4406,\n "femaleAn": 65800,\n "maleAn": 60292,\n "allHc": 3167,\n "afrHc": 413,\n "amiHc": 54,\n "amrHc": 238,\n "asjHc": 49,\n "easHc": 97,\n "midHc": 2,\n "finHc": 368,\n "nfeHc": 1807,\n "othHc": 23,\n "sasHc": 116,\n "femaleHc": 1407,\n "maleHc": 1760\n }\n]\n')))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/65781eeb.2f6b3880.js b/assets/js/65781eeb.2f6b3880.js new file mode 100644 index 00000000..de31e887 --- /dev/null +++ b/assets/js/65781eeb.2f6b3880.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3459],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>f});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var d=a.createContext({}),p=function(e){var t=a.useContext(d),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},u=function(e){var t=p(e.components);return a.createElement(d.Provider,{value:t},e.children)},s="mdxType",c={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var 
n=e.components,r=e.mdxType,o=e.originalType,d=e.parentName,u=l(e,["components","mdxType","originalType","parentName"]),s=p(n),m=r,f=s["".concat(d,".").concat(m)]||s[m]||c[m]||o;return n?a.createElement(f,i(i({ref:t},u),{},{components:n})):a.createElement(f,i({ref:t},u))}));function f(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var o=n.length,i=new Array(o);i[0]=m;var l={};for(var d in t)hasOwnProperty.call(t,d)&&(l[d]=t[d]);l.originalType=e,l[s]="string"==typeof e?e:r,i[1]=l;for(var p=2;p{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>s,frontMatter:()=>o,metadata:()=>l,toc:()=>d});var a=n(7462),r=(n(7294),n(3905));const o={title:"Annotation Engine vs Data update"},i=void 0,l={unversionedId:"frequently-asked-questions/Annotator-vs-data-update",id:"version-3.24/frequently-asked-questions/Annotator-vs-data-update",title:"Annotation Engine vs Data update",description:"Background",source:"@site/versioned_docs/version-3.24/frequently-asked-questions/Annotator-vs-data-update.md",sourceDirName:"frequently-asked-questions",slug:"/frequently-asked-questions/Annotator-vs-data-update",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/frequently-asked-questions/Annotator-vs-data-update",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/frequently-asked-questions/Annotator-vs-data-update.md",tags:[],version:"3.24",frontMatter:{title:"Annotation Engine vs Data update"},sidebar:"docs",previous:{title:"SAUtils",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/sautils"}},d=[{value:"Background",id:"background",children:[],level:2},{value:"Annotator update",id:"annotator-update",children:[],level:2},{value:"Data update",id:"data-update",children:[],level:2},{value:"Update scenarios",id:"update-scenarios",children:[],level:2}],p={toc:d},u="wrapper";function 
s(e){let{components:t,...n}=e;return(0,r.kt)(u,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"background"},"Background"),(0,r.kt)("p",null,"Updates to annotations fall into two broad categories:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"Annotation engine (Annotator) update."),(0,r.kt)("li",{parentName:"ul"},"Annotation data update.")),(0,r.kt)("p",null,"Understanding the nature of these two types of updates is key when updating annotations."),(0,r.kt)("h2",{id:"annotator-update"},"Annotator update"),(0,r.kt)("p",null,"The annotator is the engine that contains logic for core annotations such as computing ",(0,r.kt)("inlineCode",{parentName:"p"},"variant consequences"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"HGVS")," notations, mapped positions (e.g. ",(0,r.kt)("inlineCode",{parentName:"p"},"CDNA"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"CDS"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"protein")," positions), detecting ",(0,r.kt)("inlineCode",{parentName:"p"},"gene fusions"),", etc., and performs annotation lookups from external data sources such as ",(0,r.kt)("inlineCode",{parentName:"p"},"dbSNP"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"gnomAD"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"ClinVar"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"OMIM"),", etc., also known as supplementary annotations (SA). An update to the annotator entails new features or bug fixes to the compute or lookup mechanism. This is completely independent of a data update, such as updating ",(0,r.kt)("inlineCode",{parentName:"p"},"dbSNP")," from v154 to v155. 
In other words, the same annotator can annotate with ",(0,r.kt)("inlineCode",{parentName:"p"},"dbSNP v154")," and ",(0,r.kt)("inlineCode",{parentName:"p"},"dbSNP v155")," when provided with the appropriate data files."),(0,r.kt)("h2",{id:"data-update"},"Data update"),(0,r.kt)("p",null,"The annotator uses data from various sources (listed in ",(0,r.kt)("a",{parentName:"p",href:"/IlluminaConnectedAnnotationsDocumentation/3.24/"},"Introduction"),"). For example, gene models used for core annotations are obtained from ",(0,r.kt)("inlineCode",{parentName:"p"},"RefSeq")," and ",(0,r.kt)("inlineCode",{parentName:"p"},"Ensembl"),". Supplementary annotations come from various sources such as ",(0,r.kt)("inlineCode",{parentName:"p"},"dbSNP"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"gnomAD"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"ClinVar"),", ",(0,r.kt)("inlineCode",{parentName:"p"},"OMIM"),", etc. Any of these data sources can be updated without updating the annotator, as long as the file formats are compatible."),(0,r.kt)("h2",{id:"update-scenarios"},"Update scenarios"),(0,r.kt)("p",null,"Let us look at a few update scenarios."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Requirement"),(0,r.kt)("th",{parentName:"tr",align:null},"What needs to be updated/added"),(0,r.kt)("th",{parentName:"tr",align:null},"Suggested action"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"New transcripts and gene symbols"),(0,r.kt)("td",{parentName:"tr",align:null},"Cache files from RefSeq and Ensembl"),(0,r.kt)("td",{parentName:"tr",align:null},"Run Downloader")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Update ClinVar"),(0,r.kt)("td",{parentName:"tr",align:null},"ClinVar SA files"),(0,r.kt)("td",{parentName:"tr",align:null},"Run 
Downloader")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"New external annotation"),(0,r.kt)("td",{parentName:"tr",align:null},"New SA files required"),(0,r.kt)("td",{parentName:"tr",align:null},"Submit feature request")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"New annotation feature"),(0,r.kt)("td",{parentName:"tr",align:null},"Annotator"),(0,r.kt)("td",{parentName:"tr",align:null},"Submit feature request")))))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/669469dc.451ddf08.js b/assets/js/669469dc.451ddf08.js new file mode 100644 index 00000000..bc219587 --- /dev/null +++ b/assets/js/669469dc.451ddf08.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3413,3555],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>v});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},c=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},u="mdxType",p={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var 
n=e.components,r=e.mdxType,o=e.originalType,s=e.parentName,c=l(e,["components","mdxType","originalType","parentName"]),u=d(n),m=r,v=u["".concat(s,".").concat(m)]||u[m]||p[m]||o;return n?a.createElement(v,i(i({ref:t},c),{},{components:n})):a.createElement(v,i({ref:t},c))}));function v(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var o=n.length,i=new Array(o);i[0]=m;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[u]="string"==typeof e?e:r,i[1]=l;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>u,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/dann-json",id:"version-3.24/data-sources/dann-json",title:"dann-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/dann-json.md",sourceDirName:"data-sources",slug:"/data-sources/dann-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dann-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},c="wrapper";function u(e){let{components:t,...n}=e;return(0,r.kt)(c,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"dannScore": 0.27\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"dannScore"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 
1.0")))))}u.isMDXComponent=!0},3612:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>p,frontMatter:()=>i,metadata:()=>s,toc:()=>d});var a=n(7462),r=(n(7294),n(3905)),o=n(7476);const i={title:"DANN"},l=void 0,s={unversionedId:"data-sources/dann",id:"version-3.24/data-sources/dann",title:"DANN",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/dann.mdx",sourceDirName:"data-sources",slug:"/data-sources/dann",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dann.mdx",tags:[],version:"3.24",frontMatter:{title:"DANN"},sidebar:"docs",previous:{title:"COSMIC",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic"},next:{title:"dbSNP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"TSV File",id:"tsv-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"GRCh38 liftover",id:"grch38-liftover",children:[],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],c={toc:d},u="wrapper";function p(e){let{components:t,...n}=e;return(0,r.kt)(u,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN).\nCADD is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms.\nDANN improves on CADD (which uses Support Vector Machines (SVMs)) by capturing non-linear relationships by using a deep neural network 
instead of SVMs.\nDANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD\u2019s SVM methodology."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Quang, Daniel, Yifei Chen, and Xiaohui Xie. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. ",(0,r.kt)("em",{parentName:"p"},"Bioinformatics")," ",(0,r.kt)("strong",{parentName:"p"},"31.5")," 761-763 (2015). 
",(0,r.kt)("a",{parentName:"p",href:"https://doi.org/10.1093/bioinformatics/btu703"},"https://doi.org/10.1093/bioinformatics/btu703")))),(0,r.kt)("h2",{id:"tsv-file"},"TSV File"),(0,r.kt)("h3",{id:"example"},"Example"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-tsv"},"chr grch37_pos ref alt DANN\n1 10001 T A 0.16461391399220135\n1 10001 T C 0.4396994049749739\n1 10001 T G 0.38108629377072734\n1 10002 A C 0.36182020272810128\n1 10002 A G 0.44413258111779291\n1 10002 A T 0.16812846819989813\n")),(0,r.kt)("h3",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"From the TSV file, we are interested in all columns:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"chr")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"grch37_pos")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"ref")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"alt")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"DANN"))),(0,r.kt)("h2",{id:"grch38-liftover"},"GRCh38 liftover"),(0,r.kt)("p",null,"The data is not available for GRCh38 on the DANN website. 
We performed a liftover from GRCh37 to GRCh38 using crossmap."),(0,r.kt)("h2",{id:"known-issues"},"Known Issues"),(0,r.kt)("p",null,"None"),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://cbcl.ics.uci.edu/public_data/DANN/"},"https://cbcl.ics.uci.edu/public_data/DANN/")),(0,r.kt)("h2",{id:"json-output"},"JSON Output"),(0,r.kt)(o.default,{mdxType:"JSON"}))}p.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/67233f3f.7ade2c7c.js b/assets/js/67233f3f.7ade2c7c.js new file mode 100644 index 00000000..f8fe5dbd --- /dev/null +++ b/assets/js/67233f3f.7ade2c7c.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9896],{3905:(e,t,n)=>{n.d(t,{Zo:()=>s,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function l(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var i=r.createContext({}),p=function(e){var t=r.useContext(i),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},s=function(e){var t=p(e.components);return r.createElement(i.Provider,{value:t},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},d=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,i=e.parentName,s=c(e,["components","mdxType","originalType","parentName"]),u=p(n),d=a,f=u["".concat(i,".").concat(d)]||u[d]||m[d]||o;return 
n?r.createElement(f,l(l({ref:t},s),{},{components:n})):r.createElement(f,l({ref:t},s))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,l=new Array(o);l[0]=d;var c={};for(var i in t)hasOwnProperty.call(t,i)&&(c[i]=t[i]);c.originalType=e,c[u]="string"==typeof e?e:a,l[1]=c;for(var p=2;p{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>u,frontMatter:()=>o,metadata:()=>c,toc:()=>i});var r=n(7462),a=(n(7294),n(3905));const o={},l=void 0,c={unversionedId:"data-sources/gerp-json",id:"version-3.24/data-sources/gerp-json",title:"gerp-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gerp-json.md",sourceDirName:"data-sources",slug:"/data-sources/gerp-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gerp-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],p={toc:i},s="wrapper";function u(e){let{components:t,...n}=e;return(0,a.kt)(s,(0,r.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"gerpScore": 1.27\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Field"),(0,a.kt)("th",{parentName:"tr",align:null},"Type"),(0,a.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"gerpScore"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"Range: -\u221e to +\u221e")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/68ae1648.dd12f9da.js b/assets/js/68ae1648.dd12f9da.js new file mode 100644 index 00000000..e49a8bf0 --- /dev/null +++ b/assets/js/68ae1648.dd12f9da.js @@ -0,0 +1 @@ 
+"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6573],{3905:(e,t,n)=>{n.d(t,{Zo:()=>s,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function i(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var p=r.createContext({}),c=function(e){var t=r.useContext(p),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},s=function(e){var t=c(e.components);return r.createElement(p.Provider,{value:t},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},d=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,p=e.parentName,s=l(e,["components","mdxType","originalType","parentName"]),u=c(n),d=a,f=u["".concat(p,".").concat(d)]||u[d]||m[d]||o;return n?r.createElement(f,i(i({ref:t},s),{},{components:n})):r.createElement(f,i({ref:t},s))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,i=new Array(o);i[0]=d;var l={};for(var p in t)hasOwnProperty.call(t,p)&&(l[p]=t[p]);l.originalType=e,l[u]="string"==typeof e?e:a,i[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>u,frontMatter:()=>o,metadata:()=>l,toc:()=>p});var r=n(7462),a=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/phylopprimate-json",id:"version-3.24/data-sources/phylopprimate-json",title:"phylopprimate-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/phylopprimate-json.md",sourceDirName:"data-sources",slug:"/data-sources/phylopprimate-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylopprimate-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/phylopprimate-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],c={toc:p},s="wrapper";function u(e){let{components:t,...n}=e;return(0,a.kt)(s,(0,r.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json",metastring:"{11}","{11}":!0},' "variants": [\n {\n "vid": "1-64927-G-T",\n "chromosome": "chr1",\n "begin": 64927,\n "end": 64927,\n "refAllele": "G",\n "altAllele": "T",\n "variantType": "SNV",\n "hgvsg": "NC_000001.11:g.64927G>T",\n "phyloPPrimateScore": 0.151\n }\n]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"phyloPPrimateScore"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: -20 to 1.951")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/6bb3eb16.e2fa8230.js b/assets/js/6bb3eb16.e2fa8230.js new file mode 100644 index 00000000..06cc2c41 --- /dev/null +++ b/assets/js/6bb3eb16.e2fa8230.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[2612],{3905:(t,e,n)=>{n.d(e,{Zo:()=>c,kt:()=>g});var r=n(7294);function a(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function 
l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,r)}return n}function o(t){for(var e=1;e=0||(a[n]=t[n]);return a}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(a[n]=t[n])}return a}var p=r.createContext({}),u=function(t){var e=r.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},c=function(t){var e=u(t.components);return r.createElement(p.Provider,{value:e},t.children)},d="mdxType",m={inlineCode:"code",wrapper:function(t){var e=t.children;return r.createElement(r.Fragment,{},e)}},s=r.forwardRef((function(t,e){var n=t.components,a=t.mdxType,l=t.originalType,p=t.parentName,c=i(t,["components","mdxType","originalType","parentName"]),d=u(n),s=a,g=d["".concat(p,".").concat(s)]||d[s]||m[s]||l;return n?r.createElement(g,o(o({ref:e},c),{},{components:n})):r.createElement(g,o({ref:e},c))}));function g(t,e){var n=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var l=n.length,o=new Array(l);o[0]=s;var i={};for(var p in e)hasOwnProperty.call(e,p)&&(i[p]=e[p]);i.originalType=t,i[d]="string"==typeof t?t:a,o[1]=i;for(var u=2;u{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>d,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var r=n(7462),a=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/decipher-json",id:"version-3.24/data-sources/decipher-json",title:"decipher-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/decipher-json.md",sourceDirName:"data-sources",slug:"/data-sources/decipher-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/decipher-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],u={toc:p},c="wrapper";function d(t){let{components:e,...n}=t;return(0,a.kt)(c,(0,r.Z)({},u,n,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"decipher":[\n {\n "chromosome":"1",\n "begin":13516,\n "end":91073,\n "numDeletions":27,\n "deletionFrequency":0.675,\n "numDuplications":27,\n "duplicationFrequency":0.675,\n "sampleSize":40,\n "reciprocalOverlap": 0.27555,\n "annotationOverlap": 0.5901\n }\n],\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Field"),(0,a.kt)("th",{parentName:"tr",align:null},"Type"),(0,a.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"begin"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"end"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"1-based 
position")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"numDeletions"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"# of observed deletions")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"deletionFrequency"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"deletion frequency")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"numDuplications"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"# of observed duplications")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"duplicationFrequency"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"duplication frequency")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"sampleSize"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"total # of samples")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 
0.57 would indicate a 57% annotation overlap")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/6da9a512.2a69647b.js b/assets/js/6da9a512.2a69647b.js new file mode 100644 index 00000000..05b0df01 --- /dev/null +++ b/assets/js/6da9a512.2a69647b.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[2453,7283],{3905:(e,n,t)=>{t.d(n,{Zo:()=>c,kt:()=>g});var a=t(7294);function i(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function r(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,a)}return t}function l(e){for(var n=1;n=0||(i[t]=e[t]);return i}(e,n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(i[t]=e[t])}return i}var o=a.createContext({}),p=function(e){var n=a.useContext(o),t=n;return e&&(t="function"==typeof e?e(n):l(l({},n),e)),t},c=function(e){var n=p(e.components);return a.createElement(o.Provider,{value:n},e.children)},m="mdxType",d={inlineCode:"code",wrapper:function(e){var n=e.children;return a.createElement(a.Fragment,{},n)}},u=a.forwardRef((function(e,n){var t=e.components,i=e.mdxType,r=e.originalType,o=e.parentName,c=s(e,["components","mdxType","originalType","parentName"]),m=p(t),u=i,g=m["".concat(o,".").concat(u)]||m[u]||d[u]||r;return t?a.createElement(g,l(l({ref:n},c),{},{components:t})):a.createElement(g,l({ref:n},c))}));function g(e,n){var t=arguments,i=n&&n.mdxType;if("string"==typeof e||i){var r=t.length,l=new Array(r);l[0]=u;var s={};for(var o in n)hasOwnProperty.call(n,o)&&(s[o]=n[o]);s.originalType=e,s[m]="string"==typeof e?e:i,l[1]=s;for(var p=2;p{t.r(n),t.d(n,{contentTitle:()=>l,default:()=>m,frontMatter:()=>r,metadata:()=>s,toc:()=>o});var 
a=t(7462),i=(t(7294),t(3905));const r={},l=void 0,s={unversionedId:"data-sources/clinvar-json",id:"version-3.24/data-sources/clinvar-json",title:"clinvar-json",description:"small variants:",source:"@site/versioned_docs/version-3.24/data-sources/clinvar-json.md",sourceDirName:"data-sources",slug:"/data-sources/clinvar-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar-json.md",tags:[],version:"3.24",frontMatter:{}},o=[],p={toc:o},c="wrapper";function m(e){let{components:n,...t}=e;return(0,i.kt)(c,(0,a.Z)({},p,t,{components:n,mdxType:"MDXLayout"}),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"small variants:")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"clinvar":[\n {\n "id":"VCV000036581.3",\n "reviewStatus":"reviewed by expert panel",\n "significance":[\n "benign"\n ],\n "refAllele":"G",\n "altAllele":"A",\n "lastUpdatedDate":"2020-03-01",\n "isAlleleSpecific":true\n },\n {\n "id":"RCV000030258.4",\n "variationId":"VCV000036581.3",\n "reviewStatus":"reviewed by expert panel",\n "alleleOrigins":[\n "germline"\n ],\n "refAllele":"G",\n "altAllele":"A",\n "phenotypes":[\n "Lynch syndrome"\n ],\n "medGenIds":[\n "C1333990"\n ],\n "omimIds":[\n "120435"\n ],\n "significance":[\n "benign"\n ],\n "lastUpdatedDate":"2017-05-01",\n "isAlleleSpecific":true\n }\n]\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"large variants:")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"clinvar":[\n {\n "chromosome":"1", \n "begin":629025, \n "end":8537745, \n "variantType":"copy_number_loss", \n "id":"RCV000051993.4", \n "variationId":"VCV000058242.1", \n "reviewStatus":"criteria provided, single submitter", \n "alleleOrigins":[\n "not provided"\n ], \n "phenotypes":[\n "See cases"\n ], \n "significance":[\n 
"pathogenic"\n ], \n "lastUpdatedDate":"2022-04-21", \n "pubMedIds":[\n "21844811"\n ]\n },\n {\n "id":"VCV000058242.1",\n "reviewStatus":"criteria provided, single submitter",\n "significance":[\n "pathogenic"\n ],\n "lastUpdatedDate":"2022-04-21"\n },\n ......\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"id"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"ClinVar ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"variationId"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"ClinVar VCV ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"variant type")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"reviewStatus"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see possible values below")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"alleleOrigins"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see possible values 
below")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"phenotypes"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"medGenIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"MedGen IDs")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"omimIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"OMIM IDs")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"orphanetIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Orphanet IDs")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"significance"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"see possible values below")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"lastUpdatedDate"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"PubMed 
IDs")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,i.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,i.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the ClinVar alternate allele")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"reviewStatus:")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"no assertion provided"),(0,i.kt)("li",{parentName:"ul"},"no assertion criteria provided"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, single submitter"),(0,i.kt)("li",{parentName:"ul"},"practice guideline"),(0,i.kt)("li",{parentName:"ul"},"classified by multiple submitters"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, conflicting interpretations"),(0,i.kt)("li",{parentName:"ul"},"criteria provided, multiple submitters, no conflicts"),(0,i.kt)("li",{parentName:"ul"},"no interpretation for the single variant")),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"alleleOrigins:")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"unknown"),(0,i.kt)("li",{parentName:"ul"},"other"),(0,i.kt)("li",{parentName:"ul"},"germline"),(0,i.kt)("li",{parentName:"ul"},"somatic"),(0,i.kt)("li",{parentName:"ul"},"inherited"),(0,i.kt)("li",{parentName:"ul"},"paternal"),(0,i.kt)("li",{parentName:"ul"},"maternal"),(0,i.kt)("li",{parentName:"ul"},"de-novo"),(0,i.kt)("li",{parentName:"ul"},"biparental"),(0,i.kt)("li",{parentName:"ul"},"uniparental"),(0,i.kt)("li",{parentName:"ul"},"not-tested"),(0,i.kt)("li",{parentName:"ul"},"tested-inconclusive")),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"significance:")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"uncertain significance"),(0,i.kt)("li",{parentName:"ul"},"not provided"),(0,i.kt)("li",{parentName:"ul"},"benign"),(0,i.kt)("li",{parentName:"ul"},"likely benign"),(0,i.kt)("li",{parentName:"ul"},"likely 
pathogenic"),(0,i.kt)("li",{parentName:"ul"},"pathogenic"),(0,i.kt)("li",{parentName:"ul"},"drug response"),(0,i.kt)("li",{parentName:"ul"},"histocompatibility"),(0,i.kt)("li",{parentName:"ul"},"association"),(0,i.kt)("li",{parentName:"ul"},"risk factor"),(0,i.kt)("li",{parentName:"ul"},"protective"),(0,i.kt)("li",{parentName:"ul"},"affects"),(0,i.kt)("li",{parentName:"ul"},"conflicting data from submitters"),(0,i.kt)("li",{parentName:"ul"},"other"),(0,i.kt)("li",{parentName:"ul"},"no interpretation for the single variant"),(0,i.kt)("li",{parentName:"ul"},"conflicting interpretations of pathogenicity")))}m.isMDXComponent=!0},3697:(e,n,t)=>{t.r(n),t.d(n,{contentTitle:()=>s,default:()=>d,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=t(7462),i=(t(7294),t(3905)),r=t(5666);const l={title:"ClinVar"},s=void 0,o={unversionedId:"data-sources/clinvar",id:"version-3.24/data-sources/clinvar",title:"ClinVar",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/clinvar.mdx",sourceDirName:"data-sources",slug:"/data-sources/clinvar",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar.mdx",tags:[],version:"3.24",frontMatter:{title:"ClinVar"},sidebar:"docs",previous:{title:"ClinGen",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen"},next:{title:"ClinVar Preview",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview"}},p=[{value:"Overview",id:"overview",children:[],level:2},{value:"RCV File",id:"rcv-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[{value:"Parsing Significance",id:"parsing-significance",children:[],level:4}],level:3}],level:2},{value:"VCV 
File",id:"vcv-file",children:[{value:"Example",id:"example-1",children:[],level:3},{value:"Parsing",id:"parsing-1",children:[],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URLs",id:"download-urls",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2},{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[{value:"Using clinvar subcommands and source data files",id:"using-clinvar-subcommands-and-source-data-files",children:[],level:3}],level:2}],c={toc:p},m="wrapper";function d(e){let{components:n,...l}=e;return(0,i.kt)(m,(0,a.Z)({},c,l,{components:n,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Deprecated")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"ClinVar has changed to a ",(0,i.kt)("a",{parentName:"p",href:"https://github.com/ncbi/clinvar/blob/master/ClassificationOnClinVar.md"},"new XML format"),"\nUse ",(0,i.kt)("a",{parentName:"p",href:"./clinvar-preview"},"CliVarPreview")," for latest ClinVar entries."))),(0,i.kt)("p",null,"ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. 
ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, ",(0,i.kt)("em",{parentName:"p"},"Nucleic Acids Research"),", ",(0,i.kt)("strong",{parentName:"p"},"46"),", Issue D1, 4 January 2018, Pages D1062\u2013D1067, ",(0,i.kt)("a",{parentName:"p",href:"https://doi.org/10.1093/nar/gkx1153"},"https://doi.org/10.1093/nar/gkx1153")))),(0,i.kt)("h2",{id:"rcv-file"},"RCV File"),(0,i.kt)("h3",{id:"example"},"Example"),(0,i.kt)("p",null,"Here's ",(0,i.kt)("a",{target:"_blank",href:t(4311).Z},"a full RCV entry"),"."),(0,i.kt)("h3",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON 
output."),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"ID")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{3}","{3}":!0},'\n \n \n\n')),(0,i.kt)("p",null,"The Acc and Version fields are merged to form the ID (RCV000000001.2)"),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"LastUpdatedDate")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{2}","{2}":!0},'\n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Significance")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{5}","{5}":!0},'\n \n \n no assertion criteria provided\n Pathogenic\n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"ReviewStatus")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{4}","{4}":!0},'\n \n \n no assertion criteria provided\n Pathogenic\n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Phenotypes")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{2-8}","{2-8}":!0},'\n \n \n \n Joubert syndrome 9\n \n \n \n\n')),(0,i.kt)("p",null,'We only use the field with Type="Preferred". 
Multiple phenotypes may be reported'),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Location, Variant Type and Variant Id")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{3-12}","{3-12}":!0},'\n\n \n \n \n \n \n \n \n\n')),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"The variant position is extracted from the fields for their respective assemblies."),(0,i.kt)("li",{parentName:"ul"},"Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant."),(0,i.kt)("li",{parentName:"ul"},'For older records, since "start\' and "stop" fields are not always available, we use the "display_start" and "display_end" fields.'),(0,i.kt)("li",{parentName:"ul"},"If a required allele is not available, we extract it from the reference sequence."),(0,i.kt)("li",{parentName:"ul"},"Only variants having a dbSNP id are extracted."),(0,i.kt)("li",{parentName:"ul"},"Note that a ClinVar accession may have multiple variants associated with it (possible in different locations)"),(0,i.kt)("li",{parentName:"ul"},"VariantId is extracted from the MeasureSet attributes."),(0,i.kt)("li",{parentName:"ul"},"VariantType is extracted from the Measure attributes.",(0,i.kt)("div",{parentName:"li",className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"unsupported variant 
types")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"We currently don't support the following variant types:"),(0,i.kt)("ul",{parentName:"div"},(0,i.kt)("li",{parentName:"ul"},"Microsatellite"),(0,i.kt)("li",{parentName:"ul"},"protein only"),(0,i.kt)("li",{parentName:"ul"},"fusion"),(0,i.kt)("li",{parentName:"ul"},"Complex"),(0,i.kt)("li",{parentName:"ul"},"Variation"),(0,i.kt)("li",{parentName:"ul"},"Translocation ")))))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"MedGen, OMIM, Orphanet IDs")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{4-7}","{4-7}":!0},'\n \n \n \n \n \n \n \n \n\n')),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"AlleleOrigins")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{2}","{2}":!0},"\n germline\n\n")),(0,i.kt)("p",null,"We only extract all Allele Origins from Submissions (SCV) entries."),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"PubMedIds")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{4,10,16,21}","{4,10,16,21}":!0},'\n \n \n 12114475\n \n \n \n LMM Criteria\n \n 24033266\n \n \n \n \n \n 9113933\n \n \n \n \n 23757202\n \n\n')),(0,i.kt)("p",null,"We only extract all Pubmed Ids from Submissions (SCV) entries."),(0,i.kt)("h4",{id:"parsing-significance"},"Parsing Significance"),(0,i.kt)("p",null,"Extracting significance(s) may involve parsing multiple fields. 
Take the following snippets into consideration."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{3,8,13-14}","{3,8,13-14}":!0},'\n no assertion criteria provided\n Pathogenic\n\n\n\n criteria provided, multiple submitters, no conflicts\n Pathogenic/Likely pathogenic\n\n\n\n no assertion criteria provided\n Conflicting interpretations of pathogenicity\n Pathogenic(1);Uncertain significance(1)\n\n')),(0,i.kt)("p",null,"Given the evidence, we converted the significance field into an array of strings which may be parsed out of the ",(0,i.kt)("inlineCode",{parentName:"p"},"Descriptions")," or ",(0,i.kt)("inlineCode",{parentName:"p"},"Explanation")," fields."),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Varying Delimiters")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"The delimiters in each field may vary. Currently, the delimiters for ",(0,i.kt)("inlineCode",{parentName:"p"},"Description")," are ",(0,i.kt)("inlineCode",{parentName:"p"},",")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"/"),". 
The delimiters for ",(0,i.kt)("inlineCode",{parentName:"p"},"Explanation")," are ",(0,i.kt)("inlineCode",{parentName:"p"},";")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"/"),"."))),(0,i.kt)("h2",{id:"vcv-file"},"VCV File"),(0,i.kt)("h3",{id:"example-1"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n\n\n current\n Homo sapiens\n \n \n \n \n \n 1p36.31\n \n \n \n 601142\n \n \n \n 1p36.31\n \n \n \n 607215\n \n \n GRCh37/hg19 1p36.31(chr1:6051187-6158763)\n copy number gain\n \n 1p36.31\n \n \n \n no interpretation for the single variant\n \n \n \n \n \n \n no interpretation for the single variant\n \n \n no interpretation for the single variant\n \n \n \n \n \n \n \n \n \n\n\n')),(0,i.kt)("h3",{id:"parsing-1"},"Parsing"),(0,i.kt)("p",null,"In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output."),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"id")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml"},'\n')),(0,i.kt)("p",null,"The Acc and Version fields are merged to form the ID (RCV000000001.2)"),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"significance")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{7}","{7}":!0},'\n \n \n \n \n \n no interpretation for the single variant\n \n \n \n \n \n\n')),(0,i.kt)("p",null,"May have multiple significances listed."),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"reviewStatus")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-xml",metastring:"{4}","{4}":!0},"\n \n \n no interpretation for the single variant\n \n \n\n")),(0,i.kt)("h2",{id:"known-issues"},"Known Issues"),(0,i.kt)("div",{className:"admonition admonition-caution alert 
alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known Issues")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("ul",{parentName:"div"},(0,i.kt)("li",{parentName:"ul"},"The XML file contains ~1k more entries (out of 162K) than the VCF file"),(0,i.kt)("li",{parentName:"ul"},"The XML file does not have a field indicating that a record is associated with the reference base - something that was present in VCF"),(0,i.kt)("li",{parentName:"ul"},'The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H",\netc.) 
as their alternate allele')))),(0,i.kt)("h2",{id:"download-urls"},"Download URLs"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz"},"ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz")),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz"},"https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz")),(0,i.kt)("h2",{id:"json-output"},"JSON Output"),(0,i.kt)(r.default,{mdxType:"JSON"}),(0,i.kt)("h2",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,i.kt)("p",null,"There are 2 ways of building your own ClinVar supplementary files using ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils"),"."),(0,i.kt)("p",null,"The first way is to use ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommand ",(0,i.kt)("inlineCode",{parentName:"p"},"clinvar"),".\nThe ClinVar ",(0,i.kt)("inlineCode",{parentName:"p"},".nsa")," and ",(0,i.kt)("inlineCode",{parentName:"p"},".nsi")," for Illumina Connected Annotations can be built using the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,i.kt)("inlineCode",{parentName:"p"},"clinvar")," subcommand."),(0,i.kt)("p",null,"The second way is to use ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommand ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),". 
To use ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),", read more in the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," section."),(0,i.kt)("h3",{id:"using-clinvar-subcommands-and-source-data-files"},"Using ",(0,i.kt)("inlineCode",{parentName:"h3"},"clinvar")," subcommands and source data files"),(0,i.kt)("p",null,"Two input ",(0,i.kt)("inlineCode",{parentName:"p"},".xml")," files and a ",(0,i.kt)("inlineCode",{parentName:"p"},".version")," file are required in order to build the ",(0,i.kt)("inlineCode",{parentName:"p"},".nsa")," and ",(0,i.kt)("inlineCode",{parentName:"p"},".nsi")," files. You should have the following files:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"ClinVarFullRelease_00-latest.xml.gz ClinVarVariationRelease_00-latest.xml.gz\nClinVarFullRelease_00-latest.xml.gz.version\n")),(0,i.kt)("p",null,"The version file is a JSON file with the following format."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},'{\n "name": "ClinVar",\n "version": "20231230",\n "description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",\n "releaseDate": "2024-01-10"\n}\n')),(0,i.kt)("p",null,"You have to adjust the version and release date to match the ClinVar release you are using."),(0,i.kt)("p",null,"The help menu for the utility is as follows:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll clinvar\n---------------------------------------------------------------------------\nSAUtils (c) 2022 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll clinvar [options]\nCreates a supplementary database with ClinVar annotations\n\nOPTIONS:\n --ref, -r compressed reference sequence file\n --rcv, -i ClinVar Full release 
XML file\n --vcv, -c ClinVar Variation release XML file\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n\ndotnet SAUtils.dll clinvar\n")),(0,i.kt)("p",null,"Here is a sample execution:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll clinvar \\\\\n--ref ~/development/References/7/Homo_sapiens.GRCh38.Nirvana.dat --rcv ClinVarFullRelease_00-latest.xml.gz \\\\\n--vcv ClinVarVariationRelease_00-latest.xml.gz --out ~/development/SupplementaryDatabase/63/GRCh38\n---------------------------------------------------------------------------\nSAUtils (c) 2022 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1\n---------------------------------------------------------------------------\n\nFound 1535677 VCV records\nUnknown vcv id:225946 found in RCV000211201.2\nUnknown vcv id:225946 found in RCV000211253.2\nUnknown vcv id:225946 found in RCV000211375.2\nUnknown vcv id:976117 found in RCV001253316.1\nUnknown vcv id:1321016 found in RCV001776995.2\n3 unknown VCVs found in RCVs.\n225946,976117,1321016\n0 unknown VCVs found in RCVs.\nChromosome 1 completed in 00:00:15.1\nChromosome 2 completed in 00:00:20.0\nChromosome 3 completed in 00:00:09.7\nChromosome 4 completed in 00:00:05.9\nChromosome 5 completed in 00:00:09.8\nChromosome 6 completed in 00:00:08.3\nChromosome 7 completed in 00:00:08.7\nChromosome 8 completed in 00:00:06.2\nChromosome 9 completed in 00:00:08.6\nChromosome 10 completed in 00:00:07.0\nChromosome 11 completed in 00:00:11.7\nChromosome 12 completed in 00:00:08.0\nChromosome 13 completed in 00:00:06.3\nChromosome 14 completed in 00:00:06.0\nChromosome 15 completed in 00:00:06.6\nChromosome 16 completed in 00:00:10.8\nChromosome 17 completed in 00:00:13.8\nChromosome 18 completed in 00:00:02.9\nChromosome 19 completed in 00:00:08.7\nChromosome 20 completed in 00:00:03.6\nChromosome 21 completed in 00:00:02.4\nChromosome 22 completed 
in 00:00:03.6\nChromosome MT completed in 00:00:00.2\nChromosome X completed in 00:00:07.5\nChromosome Y completed in 00:00:00.0\nMaximum bp shifted for any variant:2\nWriting 37097 intervals to database...\n\nTime: 00:13:26.9\n\n")))}d.isMDXComponent=!0},4311:(e,n,t)=>{t.d(n,{Z:()=>a});const a=t.p+"assets/files/clinvar-rcv-example-4e0a2f2ac6c70acd0ce41410690b683b.xml"}}]); \ No newline at end of file diff --git a/assets/js/70f7faf9.a369deca.js b/assets/js/70f7faf9.a369deca.js new file mode 100644 index 00000000..7533c710 --- /dev/null +++ b/assets/js/70f7faf9.a369deca.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9861],{3905:(e,n,t)=>{t.d(n,{Zo:()=>u,kt:()=>f});var a=t(7294);function i(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function o(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,a)}return t}function r(e){for(var n=1;n=0||(i[t]=e[t]);return i}(e,n);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(i[t]=e[t])}return i}var l=a.createContext({}),s=function(e){var n=a.useContext(l),t=n;return e&&(t="function"==typeof e?e(n):r(r({},n),e)),t},u=function(e){var n=s(e.components);return a.createElement(l.Provider,{value:n},e.children)},p="mdxType",d={inlineCode:"code",wrapper:function(e){var n=e.children;return a.createElement(a.Fragment,{},n)}},m=a.forwardRef((function(e,n){var t=e.components,i=e.mdxType,o=e.originalType,l=e.parentName,u=c(e,["components","mdxType","originalType","parentName"]),p=s(t),m=i,f=p["".concat(l,".").concat(m)]||p[m]||d[m]||o;return t?a.createElement(f,r(r({ref:n},u),{},{components:t})):a.createElement(f,r({ref:n},u))}));function f(e,n){var 
t=arguments,i=n&&n.mdxType;if("string"==typeof e||i){var o=t.length,r=new Array(o);r[0]=m;var c={};for(var l in n)hasOwnProperty.call(n,l)&&(c[l]=n[l]);c.originalType=e,c[p]="string"==typeof e?e:i,r[1]=c;for(var s=2;s{t.r(n),t.d(n,{contentTitle:()=>r,default:()=>p,frontMatter:()=>o,metadata:()=>c,toc:()=>l});var a=t(7462),i=(t(7294),t(3905));const o={title:"Junction Preserving Annotation"},r=void 0,c={unversionedId:"core-functionality/junction-preserving",id:"version-3.24/core-functionality/junction-preserving",title:"Junction Preserving Annotation",description:"Background",source:"@site/versioned_docs/version-3.24/core-functionality/junction-preserving.md",sourceDirName:"core-functionality",slug:"/core-functionality/junction-preserving",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/junction-preserving",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/core-functionality/junction-preserving.md",tags:[],version:"3.24",frontMatter:{title:"Junction Preserving Annotation"},sidebar:"docs",previous:{title:"ISCN Notation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/iscn-notation"},next:{title:"Transcript Consequence Impact",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/transcript-consequence-impacts"}},l=[{value:"Background",id:"background",children:[],level:2},{value:"Implementation",id:"implementation",children:[],level:2}],s={toc:l},u="wrapper";function p(e){let{components:n,...o}=e;return(0,i.kt)(u,(0,a.Z)({},s,o,{components:n,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"background"},"Background"),(0,i.kt)("p",null,"When a variant can be moved (due to alignment) across junctions (e.g. start, stop or splice site), the annotation may vary depending on which exact alignment was used. For example, a left-aligned deletion that affects the splice acceptor site, upon right-alignment, may become an exon variant. 
"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"Deletion at exon boundary",src:t(4467).Z})),(0,i.kt)("p",null,"Note that: "),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"When right-aligned, the variant starts at the first base of the exon (as pictured)."),(0,i.kt)("li",{parentName:"ul"},"When left-aligned, the variant can be shifted two base pairs and starts at a splice acceptor site.")),(0,i.kt)("p",null,"From the point of view of the translation machinery, the important question is whether the sequence that identifies a junction is preserved, regardless of the variant position. In the case of the deletion above, we believe that the variant is more accurately characterized as an ",(0,i.kt)("inlineCode",{parentName:"p"},"inframe_deletion"),", not a ",(0,i.kt)("inlineCode",{parentName:"p"},"splice_acceptor_variant"),", as the splice acceptor sequence ",(0,i.kt)("inlineCode",{parentName:"p"},"AG")," is unaffected."),(0,i.kt)("p",null,"When faced with such variants, we will assign junction-disrupting consequences only if the variant cannot be shifted out of the junction."),(0,i.kt)("h2",{id:"implementation"},"Implementation"),(0,i.kt)("p",null,"By default and convention, the left-aligned variant is annotated. If the variant overlaps a junction (as judged by consequences), it is right-aligned and annotated. If both alignments produce junction disruption, the left-aligned annotation is reported. 
If, however, only one of the alignments causes junction disruption, the non-junction-disrupting annotation is reported."),(0,i.kt)("div",{className:"admonition admonition-note alert alert--secondary"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"}))),"note")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"This only affects transcript annotations. 
Supplementary annotations are reported on the left-aligned variant and HGVS notations are calculated on right-aligned variant."))))}p.isMDXComponent=!0},4467:(e,n,t)=>{t.d(n,{Z:()=>a});const a=t.p+"assets/images/27-nt-deletion-c733fc75acdc0ef64ac7d181d5ff81fe.png"}}]); \ No newline at end of file diff --git a/assets/js/730b3355.53e6d777.js b/assets/js/730b3355.53e6d777.js new file mode 100644 index 00000000..a74a0879 --- /dev/null +++ b/assets/js/730b3355.53e6d777.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6386],{3905:(t,e,a)=>{a.d(e,{Zo:()=>p,kt:()=>g});var n=a(7294);function l(t,e,a){return e in t?Object.defineProperty(t,e,{value:a,enumerable:!0,configurable:!0,writable:!0}):t[e]=a,t}function i(t,e){var a=Object.keys(t);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(t);e&&(n=n.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),a.push.apply(a,n)}return a}function r(t){for(var e=1;e=0||(l[a]=t[a]);return l}(t,e);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(t);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(t,a)&&(l[a]=t[a])}return l}var s=n.createContext({}),m=function(t){var e=n.useContext(s),a=e;return t&&(a="function"==typeof t?t(e):r(r({},e),t)),a},p=function(t){var e=m(t.components);return n.createElement(s.Provider,{value:e},t.children)},d="mdxType",k={inlineCode:"code",wrapper:function(t){var e=t.children;return n.createElement(n.Fragment,{},e)}},c=n.forwardRef((function(t,e){var a=t.components,l=t.mdxType,i=t.originalType,s=t.parentName,p=o(t,["components","mdxType","originalType","parentName"]),d=m(a),c=l,g=d["".concat(s,".").concat(c)]||d[c]||k[c]||i;return a?n.createElement(g,r(r({ref:e},p),{},{components:a})):n.createElement(g,r({ref:e},p))}));function g(t,e){var a=arguments,l=e&&e.mdxType;if("string"==typeof t||l){var i=a.length,r=new Array(i);r[0]=c;var o={};for(var s in 
e)hasOwnProperty.call(e,s)&&(o[s]=e[s]);o.originalType=t,o[d]="string"==typeof t?t:l,r[1]=o;for(var m=2;m{a.r(e),a.d(e,{contentTitle:()=>r,default:()=>d,frontMatter:()=>i,metadata:()=>o,toc:()=>s});var n=a(7462),l=(a(7294),a(3905));const i={title:"Custom Annotations"},r=void 0,o={unversionedId:"file-formats/custom-annotations",id:"version-3.24/file-formats/custom-annotations",title:"Custom Annotations",description:"Overview",source:"@site/versioned_docs/version-3.24/file-formats/custom-annotations.md",sourceDirName:"file-formats",slug:"/file-formats/custom-annotations",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/custom-annotations",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/file-formats/custom-annotations.md",tags:[],version:"3.24",frontMatter:{title:"Custom Annotations"},sidebar:"docs",previous:{title:"Illumina Connected Annotations VCF File Format",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-vcf-file-format"},next:{title:"Canonical Transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/canonical-transcripts"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Variant File Format",id:"variant-file-format",children:[{value:"Basic Allele Frequency Example",id:"basic-allele-frequency-example",children:[{value:"Create the Custom Annotation TSV",id:"create-the-custom-annotation-tsv",children:[],level:4},{value:"Convert to Illumina Connected Annotations Format",id:"convert-to-illumina-connected-annotations-format",children:[],level:4},{value:"Annotate with Illumina Connected Annotations",id:"annotate-with-illumina-connected-annotations",children:[],level:4},{value:"Investigate the Results",id:"investigate-the-results",children:[],level:4}],level:3},{value:"Categories & Descriptions Example",id:"categories--descriptions-example",children:[{value:"Create the Custom 
Annotation TSV",id:"create-the-custom-annotation-tsv-1",children:[],level:4},{value:"Annotate with Illumina Connected Annotations",id:"annotate-with-illumina-connected-annotations-1",children:[],level:4},{value:"Investigate the Results",id:"investigate-the-results-1",children:[],level:4},{value:"Using Positional Matches",id:"using-positional-matches",children:[],level:4}],level:3},{value:"Genomic Region Example",id:"genomic-region-example",children:[{value:"Create the Custom Annotation TSV",id:"create-the-custom-annotation-tsv-2",children:[],level:4},{value:"Annotate with Illumina Connected Annotations",id:"annotate-with-illumina-connected-annotations-2",children:[],level:4},{value:"Investigate the Results",id:"investigate-the-results-2",children:[],level:4}],level:3},{value:"Genomic Regions for Structural Variants Example",id:"genomic-regions-for-structural-variants-example",children:[{value:"Create the Custom Annotation TSV",id:"create-the-custom-annotation-tsv-3",children:[],level:4},{value:"Annotate with Illumina Connected Annotations",id:"annotate-with-illumina-connected-annotations-3",children:[],level:4},{value:"Investigate the Results",id:"investigate-the-results-3",children:[],level:4}],level:3},{value:"Mixing Small Variants and Genomic Regions",id:"mixing-small-variants-and-genomic-regions",children:[{value:"Create the Custom Annotation TSV",id:"create-the-custom-annotation-tsv-4",children:[],level:4},{value:"Annotate with Illumina Connected Annotations",id:"annotate-with-illumina-connected-annotations-4",children:[],level:4},{value:"Investigate the Results",id:"investigate-the-results-4",children:[],level:4}],level:3}],level:2},{value:"Gene File Format",id:"gene-file-format",children:[{value:"Basic Gene Example",id:"basic-gene-example",children:[{value:"Create the Custom Annotation TSV",id:"create-the-custom-annotation-tsv-5",children:[],level:4},{value:"Annotate with Illumina Connected 
Annotations",id:"annotate-with-illumina-connected-annotations-5",children:[],level:4},{value:"Investigate the Results",id:"investigate-the-results-5",children:[],level:4}],level:3}],level:2},{value:"Customizing the Header",id:"customizing-the-header",children:[{value:"Title",id:"title",children:[],level:3},{value:"Genome Assemblies",id:"genome-assemblies",children:[],level:3},{value:"Matching Criteria",id:"matching-criteria",children:[],level:3},{value:"Categories",id:"categories",children:[],level:3},{value:"Descriptions",id:"descriptions",children:[{value:"Populations",id:"populations",children:[],level:4}],level:3},{value:"Data Types",id:"data-types",children:[],level:3}],level:2},{value:"Using SAUtils",id:"using-sautils",children:[{value:"Convert Variant File",id:"convert-variant-file",children:[],level:3},{value:"Convert Gene File",id:"convert-gene-file",children:[],level:3}],level:2}],m={toc:s},p="wrapper";function d(t){let{components:e,...a}=t;return(0,l.kt)(p,(0,n.Z)({},m,a,{components:e,mdxType:"MDXLayout"}),(0,l.kt)("h2",{id:"overview"},"Overview"),(0,l.kt)("p",null,"While the team tries to keep data sources up to date, you might want to start incorporating new annotations ahead of our update cycle. Another\ncommon use case involves protected health information (PHI). Custom annotations are a mechanism that enables both use cases."),(0,l.kt)("p",null,"Here are some examples of how our collaborators use custom annotations:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"associating context from both the sample level and the sample cohort level with the variant annotations"),(0,l.kt)("li",{parentName:"ul"},"adding content that is licensed (e.g. HGMD) to the variant annotations")),(0,l.kt)("p",null,"At the moment, we have two different custom annotation file formats. 
One provides additional annotations to variants (both small variants and SVs)\nwhile the other caters to gene annotations."),(0,l.kt)("p",null,"In both cases, the custom annotation file format is a tab-delimited file that is separated into two parts: the header & the data."),(0,l.kt)("p",null,"The header is where you can customize how you want the data to appear in the JSON file and provide context about the genome assembly and how\nIllumina Connected Annotations should match the variants."),(0,l.kt)("p",null,"At Illumina, there are usually many components downstream of Illumina Connected Annotations that have to parse our annotations. If a customer provides a custom\nannotation, those downstream tools need to understand more about the data such as:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"data type (e.g. number, boolean, or a string)"),(0,l.kt)("li",{parentName:"ul"},"data category (e.g. is this an allele count, allele number, allele frequency, etc.)"),(0,l.kt)("li",{parentName:"ul"},"associated population (i.e. if this is an allele frequency)")),(0,l.kt)("p",null,"For each custom annotation, Illumina Connected Annotations uses this context to create a ",(0,l.kt)("a",{parentName:"p",href:"https://json-schema.org/"},"JSON schema")," that can be sent to downstream tools. 
If\na tool knows that this is an allele frequency, it can validate user input to ensure that it's in the range of ","[0, 1]","."),(0,l.kt)("h2",{id:"variant-file-format"},"Variant File Format"),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"File Format")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Illumina Connected Annotations expects plain text (or gzipped text) files. Tools like Excel can add extra characters that break parsing. We highly recommend creating and modifying these files with a plain text editor such as Notepad, Notepad++, or Atom."))),(0,l.kt)("h3",{id:"basic-allele-frequency-example"},"Basic Allele Frequency Example"),(0,l.kt)("h4",{id:"create-the-custom-annotation-tsv"},"Create the Custom Annotation TSV"),(0,l.kt)("p",null,"Imagine that you want to create a basic allele frequency custom annotation for small variants. 
If we visualized the tab-delimited file\n(TSV), it would look something like this:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 1"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 2"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 3"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 4"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 5"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#title=MyDataSource"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#assembly=GRCh38"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#matchVariantsBy=allele"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#CHROM"),(0,l.kt)("td",{parentName:"tr",align:"left"},"POS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"REF"),(0,l.kt)("td",{parentName:"tr",align:"left"},"ALT"),(0,l.kt)("td",{parentName:"tr",align:"left"},"allAf")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#categories"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"AlleleFrequency")),(0,l.kt)("tr",{parentName:"tbody"}
,(0,l.kt)("td",{parentName:"tr",align:"left"},"#descriptions"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"ALL")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#type"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"number")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"23603511"),(0,l.kt)("td",{parentName:"tr",align:"left"},"TGA"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T"),(0,l.kt)("td",{parentName:"tr",align:"left"},"0.000006579")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"68801894"),(0,l.kt)("td",{parentName:"tr",align:"left"},"G"),(0,l.kt)("td",{parentName:"tr",align:"left"},"A"),(0,l.kt)("td",{parentName:"tr",align:"left"},"0.000006569")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr19"),(0,l.kt)("td",{parentName:"tr",align:"left"},"11107436"),(0,l.kt)("td",{parentName:"tr",align:"left"},"G"),(0,l.kt)("td",{parentName:"tr",align:"left"},"A"),(0,l.kt)("td",{parentName:"tr",align:"left"},"0.00003291")))),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/MyDataSource.tsv"},"the full TSV file"),"."),(0,l.kt)("p",null,"Let's go over the header and discuss the contents:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"title")," indicates the name of the JSON key"),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"assembly")," indicates that this data is only 
valid for ",(0,l.kt)("inlineCode",{parentName:"li"},"GRCh38"),"."),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"matchVariantsBy")," indicates how annotations should be matched and reported. In this case annotations will be matched and reported by allele."),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"categories")," provides hints to downstream tools on how they might want to treat the data. In this case, we indicate that it's an allele frequency."),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"descriptions")," are used in special circumstances to provide more context. Even though column 5 is called ",(0,l.kt)("inlineCode",{parentName:"li"},"allAf"),", it might not be clear to a\ndownstream tool that this means a global allele frequency using all sub-populations. In this case, ",(0,l.kt)("inlineCode",{parentName:"li"},"ALL")," indicates the intended population."),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"type")," indicates to downstream tools the data type. 
Since allele frequencies are numbers, we'll write ",(0,l.kt)("inlineCode",{parentName:"li"},"number")," in this column.")),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Reference Base Checking")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Illumina Connected Annotations validates all the reference bases in a custom annotation. If a variant or genomic region is specified that has the wrong reference base, an error will be produced."))),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Sorting")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"The variants within each chromosome must be sorted by genomic 
position."))),(0,l.kt)("h4",{id:"convert-to-illumina-connected-annotations-format"},"Convert to Illumina Connected Annotations Format"),(0,l.kt)("p",null,"First, we need to convert the TSV file to the native Illumina Connected Annotations file format. Let's put the converted file in a new directory called CA:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-bash"},"$ mkdir CA\n$ dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \\\n -r Data/References/Homo_sapiens.GRCh38.Nirvana.dat -i MyDataSource.tsv -o CA\n---------------------------------------------------------------------------\nSAUtils (c) 2020 Illumina, Inc.\nStromberg, Roy, Lajugie, Jiang, Li, and Kang 3.12.0\n---------------------------------------------------------------------------\n\nChromosome 16 completed in 00:00:00.1\nChromosome 19 completed in 00:00:00.0\n\nTime: 00:00:00.2\n")),(0,l.kt)("h4",{id:"annotate-with-illumina-connected-annotations"},"Annotate with Illumina Connected Annotations"),(0,l.kt)("p",null,"Let's annotate the following VCF (notice that it contains one of the variants that we have in our custom annotation):"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\n16 68801894 . G A . . 
.\n")),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA.vcf"},"the full VCF file"),"."),(0,l.kt)("p",null,"Since Illumina Connected Annotations can handle multiple directories with external annotations, all we need to do is specify our new CA directory in addition to\nthe normal Illumina Connected Annotations command-line."),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-bash",metastring:"{3}","{3}":!0},"$ dotnet Annotator.dll -c Data/Cache/GRCh38/Both \\\n -r Data/References/Homo_sapiens.GRCh38.Nirvana.dat \\\n --sd Data/SupplementaryAnnotation/GRCh38 --sd CA -i TestCA.vcf -o TestCA\n---------------------------------------------------------------------------\nIlluminaConnectedAnnotations (c) 2020 Illumina, Inc.\nStromberg, Roy, Lajugie, Jiang, Li, and Kang 3.12.0\n---------------------------------------------------------------------------\n\nInitialization Time Positions/s\n---------------------------------------------------------------------------\nCache 00:00:01.8\nSA Position Scan 00:00:00.0 19\n\nReference Preload Annotation Variants/s\n---------------------------------------------------------------------------\nchr16 00:00:00.2 00:00:01.3 1\n\nSummary Time Percent\n---------------------------------------------------------------------------\nInitialization 00:00:01.9 25.5 %\nPreload 00:00:00.2 3.3 %\nAnnotation 00:00:01.3 18.2 %\n\nTime: 00:00:06.3\n")),(0,l.kt)("h4",{id:"investigate-the-results"},"Investigate the Results"),(0,l.kt)("p",null,"We would expect the following data to show up in our JSON output file:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{12-16}","{12-16}":!0},' "variants": [\n {\n "vid": "16-68801894-G-A",\n "chromosome": "16",\n "begin": 68801894,\n "end": 68801894,\n "refAllele": "G",\n "altAllele": "A",\n "variantType": "SNV",\n "hgvsg": "NC_000016.10:g.68801894G>A",\n 
"phylopScore": 1,\n "MyDataSource": {\n "refAllele": "G",\n "altAllele": "A",\n "allAf": 7e-06\n },\n "clinvar": [\n')),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA.json.gz"},"the full JSON file"),"."),(0,l.kt)("p",null,"Illumina Connected Annotations preserves up to 6 decimal places for allele frequency data."),(0,l.kt)("h3",{id:"categories--descriptions-example"},"Categories & Descriptions Example"),(0,l.kt)("h4",{id:"create-the-custom-annotation-tsv-1"},"Create the Custom Annotation TSV"),(0,l.kt)("p",null,"Building on the previous example, we can add other types of annotations like predictions and general notes."),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 1"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 2"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 3"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 4"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 5"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 6"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 
7"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#title=MyDataSource"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#assembly=GRCh38"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#matchVariantsBy=allele"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#CHROM"),(0,l.kt)("td",{parentName:"tr",align:"left"},"POS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"REF"),(0,l.kt)("td",{parentName:"tr",align:"left"},"ALT"),(0,l.kt)("td",{parentName:"tr",align:"left"},"allAf"),(0,l.kt)("td",{parentName:"tr",align:"left"},"pathogenicity"),(0,l.kt)("td",{parentName:"tr",align:"left"},"notes")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#categories"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"AlleleFrequency"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Prediction"),(0,l.kt)("td",{parentName:"tr"
,align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#descriptions"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"ALL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#type"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"number"),(0,l.kt)("td",{parentName:"tr",align:"left"},"string"),(0,l.kt)("td",{parentName:"tr",align:"left"},"string")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"23603511"),(0,l.kt)("td",{parentName:"tr",align:"left"},"TGA"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T"),(0,l.kt)("td",{parentName:"tr",align:"left"},"0.000006579"),(0,l.kt)("td",{parentName:"tr",align:"left"},"P"),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"68801894"),(0,l.kt)("td",{parentName:"tr",align:"left"},"G"),(0,l.kt)("td",{parentName:"tr",align:"left"},"A"),(0,l.kt)("td",{parentName:"tr",align:"left"},"0.000006569"),(0,l.kt)("td",{parentName:"tr",align:"left"},"LP"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Seen in case 
123")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr19"),(0,l.kt)("td",{parentName:"tr",align:"left"},"11107436"),(0,l.kt)("td",{parentName:"tr",align:"left"},"G"),(0,l.kt)("td",{parentName:"tr",align:"left"},"A"),(0,l.kt)("td",{parentName:"tr",align:"left"},"0.00003291"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")))),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/MyDataSource2.tsv"},"the full TSV file"),"."),(0,l.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"Placeholders")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"You can use a period to denote an empty value (much in the same way as periods are used in VCF files to signify missing values). 
While\nIllumina Connected Annotations also accepts empty columns in the TSV file, we use them in these examples to promote readability."))),(0,l.kt)("p",null,"Let's go over what's new in this example:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("strong",{parentName:"li"},"Column 6")," adds a field called ",(0,l.kt)("inlineCode",{parentName:"li"},"pathogenicity")," which uses the ",(0,l.kt)("inlineCode",{parentName:"li"},"Prediction")," category. When using this category, Illumina Connected Annotations will\nvalidate to make\nsure that the field contains either the abbreviations (B, LB, VUS, LP, and P) or the long-form equivalents (e.g. benign or pathogenic)."),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("strong",{parentName:"li"},"Column 7")," adds a field called ",(0,l.kt)("inlineCode",{parentName:"li"},"notes")," and it doesn't have a category or description. We're just going to use it to add some internal\nnotes.")),(0,l.kt)("h4",{id:"annotate-with-illumina-connected-annotations-1"},"Annotate with Illumina Connected Annotations"),(0,l.kt)("p",null,"Let's use a new VCF file. It includes all the same positions as our custom annotation file, but only the middle variant also matches the\nalternate allele (allele-specific match):"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\n16 23603511 . TG T . . .\n16 68801894 . G A . . .\n19 11107436 . G C . . 
.\n")),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA2.vcf"},"the full VCF file"),"."),(0,l.kt)("h4",{id:"investigate-the-results-1"},"Investigate the Results"),(0,l.kt)("p",null,"Because we specified ",(0,l.kt)("inlineCode",{parentName:"p"},"#matchVariantsBy=allele")," in our custom annotation file, only the middle variant will get an annotation:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{12-18}","{12-18}":!0},' "variants": [\n {\n "vid": "16-68801894-G-A",\n "chromosome": "16",\n "begin": 68801894,\n "end": 68801894,\n "refAllele": "G",\n "altAllele": "A",\n "variantType": "SNV",\n "hgvsg": "NC_000016.10:g.68801894G>A",\n "phylopScore": 1,\n "MyDataSource": {\n "refAllele": "G",\n "altAllele": "A",\n "allAf": 7e-06,\n "pathogenicity": "LP",\n "notes": "Seen in case 123"\n },\n "clinvar": [\n')),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA2.json.gz"},"the full JSON file"),"."),(0,l.kt)("h4",{id:"using-positional-matches"},"Using Positional Matches"),(0,l.kt)("p",null,"What would happen if we changed to ",(0,l.kt)("inlineCode",{parentName:"p"},"#matchVariantsBy=position"),"? Two things will happen. 
First, our positional variants will now match:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{12-17}","{12-17}":!0},' "variants": [\n {\n "vid": "16-23603511-TG-T",\n "chromosome": "16",\n "begin": 23603512,\n "end": 23603512,\n "refAllele": "G",\n "altAllele": "-",\n "variantType": "deletion",\n "hgvsg": "NC_000016.10:g.23603512delG",\n "MyDataSource": [\n {\n "refAllele": "GA",\n "altAllele": "-",\n "allAf": 7e-06,\n "pathogenicity": "P"\n }\n ],\n "clinvar": [\n')),(0,l.kt)("p",null,"In addition, you will now see an extra flag for our allele-specific variant:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{12-20}","{12-20}":!0},' "variants": [\n {\n "vid": "16-68801894-G-A",\n "chromosome": "16",\n "begin": 68801894,\n "end": 68801894,\n "refAllele": "G",\n "altAllele": "A",\n "variantType": "SNV",\n "hgvsg": "NC_000016.10:g.68801894G>A",\n "phylopScore": 1,\n "MyDataSource": [\n {\n "refAllele": "G",\n "altAllele": "A",\n "allAf": 7e-06,\n "pathogenicity": "LP",\n "notes": "Seen in case 123",\n "isAlleleSpecific": true\n }\n ],\n "clinvar": [\n')),(0,l.kt)("h3",{id:"genomic-region-example"},"Genomic Region Example"),(0,l.kt)("h4",{id:"create-the-custom-annotation-tsv-2"},"Create the Custom Annotation TSV"),(0,l.kt)("p",null,"In the previous example, we added a note for the middle variant, but sometimes it's handy to annotate a genomic region. 
Consider the following example:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 1"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 2"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 3"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 4"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 5"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#title=MyDataSource"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#assembly=GRCh38"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#matchVariantsBy=allele"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#CHROM"),(0,l.kt)("td",{parentName:"tr",align:"left"},"POS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"REF"),(0,l.kt)("td",{parentName:"tr",align:"left"},"END"),(0,l.kt)("td",{parentName:"tr",align:"left"},"notes")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#categories"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#descriptions"),(0
,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#type"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"string")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"20000000"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T"),(0,l.kt)("td",{parentName:"tr",align:"left"},"70000000"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Lots of false positives in this region")))),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/MyDataSource3.tsv"},"the full TSV file"),"."),(0,l.kt)("p",null,"Let's go over what's new in this example:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("strong",{parentName:"li"},"Column 5")," now has a field called ",(0,l.kt)("inlineCode",{parentName:"li"},"notes"),". In essence, it looks exactly like column 7 from our previous example."),(0,l.kt)("li",{parentName:"ul"},"The main difference is that now one of our custom annotation entries is actually a genomic region. Any variant that overlaps with that region will get a custom annotation.")),(0,l.kt)("p",null,"In the previous example we learned about positional matching vs allele-specific matching. 
For genomic regions, ",(0,l.kt)("inlineCode",{parentName:"p"},"#matchVariantsBy=allele")," and ",(0,l.kt)("inlineCode",{parentName:"p"},"#matchVariantsBy=position")," produce\nthe same result."),(0,l.kt)("h4",{id:"annotate-with-illumina-connected-annotations-2"},"Annotate with Illumina Connected Annotations"),(0,l.kt)("p",null,"Let's use the same VCF file as our previous example."),(0,l.kt)("h4",{id:"investigate-the-results-2"},"Investigate the Results"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{9-17}","{9-17}":!0},' {\n "chromosome": "16",\n "position": 23603511,\n "refAllele": "TG",\n "altAlleles": [\n "T"\n ],\n "cytogeneticBand": "16p12.2",\n "MyDataSource": [\n {\n "start": 20000000,\n "end": 70000000,\n "notes": "Lots of false positives in this region",\n "reciprocalOverlap": 0,\n "annotationOverlap": 0\n }\n ],\n "variants": [\n')),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA3.json.gz"},"the full JSON file"),"."),(0,l.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"Reciprocal & Annotation 
Overlap")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"For all intervals, Illumina Connected Annotations internally calculates two overlaps: a ",(0,l.kt)("strong",{parentName:"p"},"variant overlap")," and an ",(0,l.kt)("strong",{parentName:"p"},"annotation overlap"),". Variant overlap is the percentage of the variant's length that is\noverlapped. Annotation overlap is the percentage of the annotation's length that is overlapped."),(0,l.kt)("p",{parentName:"div"},(0,l.kt)("strong",{parentName:"p"},"Reciprocal overlap")," is the minimum of those two overlaps. Given that the annotation is 50 Mbp and the deletion is 1 bp, both overlaps will be pretty close to 0."))),(0,l.kt)("p",null,"We will also see this annotation for the other variant on chr16:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{9-17}","{9-17}":!0},' {\n "chromosome": "16",\n "position": 68801894,\n "refAllele": "G",\n "altAlleles": [\n "A"\n ],\n "cytogeneticBand": "16q22.1",\n "MyDataSource": [\n {\n "start": 20000000,\n "end": 70000000,\n "notes": "Lots of false positives in this region",\n "reciprocalOverlap": 0,\n "annotationOverlap": 0\n }\n ],\n "variants": [\n')),(0,l.kt)("h3",{id:"genomic-regions-for-structural-variants-example"},"Genomic Regions for Structural Variants Example"),(0,l.kt)("h4",{id:"create-the-custom-annotation-tsv-3"},"Create the Custom Annotation TSV"),(0,l.kt)("p",null,"Often we use genomic regions to represent other known CNVs and SVs in the genome. In this use case, we usually don't want to match these regions to other small variants. To force Illumina Connected Annotations to match regions only to other SVs, use the ",(0,l.kt)("inlineCode",{parentName:"p"},"#matchVariantsBy=sv")," option in the header. 
Here is an example:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 1"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 2"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 3"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 4"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 5"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#title=MyDataSource"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#assembly=GRCh38"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#matchVariantsBy=sv"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#CHROM"),(0,l.kt)("td",{parentName:"tr",align:"left"},"POS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"REF"),(0,l.kt)("td",{parentName:"tr",align:"left"},"END"),(0,l.kt)("td",{parentName:"tr",align:"left"},"notes")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#categories"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#descriptions"),(0,l.kt)("td",{par
entName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#type"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"string")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"20000000"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T"),(0,l.kt)("td",{parentName:"tr",align:"left"},"70000000"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Lots of false positives in this region")))),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/MyDataSource6.tsv"},"the full TSV file"),"."),(0,l.kt)("p",null,"Let's go over what's new in this example:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"The main difference is the header field ",(0,l.kt)("inlineCode",{parentName:"li"},"#matchVariantsBy=sv")," which indicates that only structural variants that overlap these genomic regions will receive annotations.")),(0,l.kt)("h4",{id:"annotate-with-illumina-connected-annotations-3"},"Annotate with Illumina Connected Annotations"),(0,l.kt)("p",null,"Let's use a new VCF file. It contains the first variant from the previous file and a structural variant deletion, both of which overlap the given genomic region."),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\n16 23603511 . TG T . . .\n16 68801894 . G <DEL> . . 
END=73683789;SVTYPE=DEL\n")),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA6.vcf"},"the full VCF file"),"."),(0,l.kt)("h4",{id:"investigate-the-results-3"},"Investigate the Results"),(0,l.kt)("p",null,"Note that this time, ",(0,l.kt)("inlineCode",{parentName:"p"},"MyDataSource")," only showed up for the ",(0,l.kt)("inlineCode",{parentName:"p"},"<DEL>")," and not the deletion ",(0,l.kt)("inlineCode",{parentName:"p"},"16-23603511-TG-T"),"."),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{21-29}","{21-29}":!0},' {\n "chromosome": "16",\n "position": 23603511,\n "refAllele": "TG",\n "altAlleles": [\n "T"\n ],\n "cytogeneticBand": "16p12.2",\n "variants": [\n ...\n ...\n {\n "chromosome": "16",\n "position": 68801894,\n "svEnd": 73683789,\n "refAllele": "G",\n "altAlleles": [\n "<DEL>"\n ],\n "cytogeneticBand": "16q22.1-q22.3",\n "MyDataSource": [\n {\n "start": 20000000,\n "end": 70000000,\n "notes": "Lots of false positives in this region",\n "reciprocalOverlap": 0.02396,\n "annotationOverlap": 0.02396\n }\n ],\n "variants": [\n\n')),(0,l.kt)("h3",{id:"mixing-small-variants-and-genomic-regions"},"Mixing Small Variants and Genomic Regions"),(0,l.kt)("h4",{id:"create-the-custom-annotation-tsv-4"},"Create the Custom Annotation TSV"),(0,l.kt)("p",null,"Previously we looked at examples that either had small variants or genomic regions. 
Let's create a file that contains both:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 1"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 2"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 3"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 4"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 5"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 6"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#title=MyDataSource"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#assembly=GRCh38"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#matchVariantsBy=allele"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#CHROM"),(0,l.kt)("td",{parentName:"tr",align:"left"},"POS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"REF"),(0,l.kt)("td",{parentName:"tr",align:"left"},"ALT"),(0,l.kt)("td",{parentName:"tr",align:"left"},"END"),(0,l.kt)("td",{parentName:"tr",align:"left"},"notes")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#categories"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."
),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#descriptions"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#type"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"string")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"23603511"),(0,l.kt)("td",{parentName:"tr",align:"left"},"TGA"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr16"),(0,l.kt)("td",{parentName:"tr",align:"left"},"68801894"),(0,l.kt)("td",{parentName:"tr",align:"left"},"G"),(0,l.kt)("td",{parentName:"tr",align:"left"},"A"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr19"),(0,l.kt)("td",{parentName:"tr",align:"left"},"11107436"),(0,l.kt)("td",{parentName:"tr",align:"left"},"G"),(0,l.kt)("td",{parentName:"tr",align:"left"},"A"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:
"tr",align:"left"},"chr21"),(0,l.kt)("td",{parentName:"tr",align:"left"},"10510818"),(0,l.kt)("td",{parentName:"tr",align:"left"},"C"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"10699435"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Interval #1")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr21"),(0,l.kt)("td",{parentName:"tr",align:"left"},"10510818"),(0,l.kt)("td",{parentName:"tr",align:"left"},"C"),(0,l.kt)("td",{parentName:"tr",align:"left"},"<","DEL",">"),(0,l.kt)("td",{parentName:"tr",align:"left"},"10699435"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Interval #2")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"chr22"),(0,l.kt)("td",{parentName:"tr",align:"left"},"12370388"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T"),(0,l.kt)("td",{parentName:"tr",align:"left"},"T[chr22:12370729["),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"Known false-positive")))),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/MyDataSource4.tsv"},"the full TSV file"),"."),(0,l.kt)("p",null,"Let's go over what's new in this example:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("strong",{parentName:"li"},"Column 4")," now has the ",(0,l.kt)("inlineCode",{parentName:"li"},"ALT")," field. Except for the case listed below, this is only used by small variants or translocation breakends."),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("strong",{parentName:"li"},"Column 5")," now has the ",(0,l.kt)("inlineCode",{parentName:"li"},"END")," field. This is only used by genomic regions."),(0,l.kt)("li",{parentName:"ul"},"There are two custom annotations on chr21 and the start and end coordinates look the same, so what's different? 
Interval #2 has ",(0,l.kt)("strong",{parentName:"li"},"a symbolic allele in the ALT column"),". When this is used in a custom annotation, the start position is treated as the padding base (using VCF conventions). When Illumina Connected Annotations matches a variant to interval #2, it will ignore the padding base and consider the start position to be at position 10510819.")),(0,l.kt)("h4",{id:"annotate-with-illumina-connected-annotations-4"},"Annotate with Illumina Connected Annotations"),(0,l.kt)("p",null,"Let's use a new VCF file to study how matching works for intervals #1 and #2:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\n21 10510818 . C <DUP> . . END=10699435;SVTYPE=DUP\n22 12370388 . T T[chr22:12370729[ . . SVTYPE=BND\n")),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA3.vcf"},"the full VCF file"),"."),(0,l.kt)("p",null,'The first variant is similar to the custom annotation labelled "interval #2". 
Position 10510818 is the padding base, so it effectively starts at position 10510819.'),(0,l.kt)("h4",{id:"investigate-the-results-4"},"Investigate the Results"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{11-26}","{11-26}":!0},' "positions": [\n {\n "chromosome": "21",\n "position": 10510818,\n "svEnd": 10699435,\n "refAllele": "C",\n "altAlleles": [\n "<DUP>"\n ],\n "cytogeneticBand": "21p11.2",\n "MyDataSource": [\n {\n "start": 10510818,\n "end": 10699435,\n "notes": "Interval #1",\n "reciprocalOverlap": 0.99999,\n "annotationOverlap": 0.99999\n },\n {\n "start": 10510819,\n "end": 10699435,\n "notes": "Interval #2",\n "reciprocalOverlap": 1,\n "annotationOverlap": 1\n }\n ],\n')),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA4.json.gz"},"the full JSON file"),"."),(0,l.kt)("p",null,"As expected, the variant and interval #2 have matching endpoints, so there is 100% overlap. 
Interval #1 technically starts 1 bp earlier, so its overlap is 99.999%."),(0,l.kt)("p",null,"Further down the JSON file, we find the annotated translocation breakend:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{11-15}","{11-15}":!0},' "variants": [\n {\n "vid": "22-12370388-T-T[chr22:12370729[",\n "chromosome": "22",\n "begin": 12370388,\n "end": 12370388,\n "isStructuralVariant": true,\n "refAllele": "T",\n "altAllele": "T[chr22:12370729[",\n "variantType": "translocation_breakend",\n "MyDataSource": {\n "refAllele": "T",\n "altAllele": "T[chr22:12370729[",\n "notes": "Known false-positive"\n }\n }\n')),(0,l.kt)("h2",{id:"gene-file-format"},"Gene File Format"),(0,l.kt)("h3",{id:"basic-gene-example"},"Basic Gene Example"),(0,l.kt)("h4",{id:"create-the-custom-annotation-tsv-5"},"Create the Custom Annotation TSV"),(0,l.kt)("p",null,"Previously we looked at examples that either had small variants or genomic regions. However, sometimes we would like to add custom gene annotations. 
The gene custom annotation file format\nlooks slightly different:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 1"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 2"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 3"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Col 4"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#title=MyDataSource"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"})),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#geneSymbol"),(0,l.kt)("td",{parentName:"tr",align:"left"},"geneId"),(0,l.kt)("td",{parentName:"tr",align:"left"},"phenotype"),(0,l.kt)("td",{parentName:"tr",align:"left"},"notes")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#categories"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#descriptions"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"#type"),(0,l.kt)("td",{parentName:"tr",align:"left"},"."),(0,l.kt)("td",{parentName:"tr",align:"left"},"string"),(0,l.kt)("td",{parentName:"tr",align:"left"},"string")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"TP53"),(0,l.kt)("td",{parentName:"tr",align:"left"},"7157"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Colorectal cancer, hereditary nonpolyposis, type 
5"),(0,l.kt)("td",{parentName:"tr",align:"left"},".")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"KRAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"ENSG00000133703"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Mismatch repair cancer syndrome"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Seen in cohort 123")))),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/MyDataSource5.tsv"},"the full TSV file"),"."),(0,l.kt)("p",null,"Let's go over what's in this example:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("strong",{parentName:"li"},"Column 2")," has the ",(0,l.kt)("inlineCode",{parentName:"li"},"geneId")," field. This can be either an ",(0,l.kt)("strong",{parentName:"li"},"Entrez Gene ID")," or an ",(0,l.kt)("strong",{parentName:"li"},"Ensembl ID"),".")),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Gene Symbols")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Gene symbols are always in flux and are being updated on a daily basis at the NCBI and at HGNC. Due to this, Illumina Connected Annotations uses the ",(0,l.kt)("inlineCode",{parentName:"p"},"geneId")," to match genes rather than the gene symbol. 
However, to\nmake the custom annotation files easier to read, we've included the ",(0,l.kt)("inlineCode",{parentName:"p"},"geneSymbol")," column as well."))),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Unknown Gene IDs")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"When Illumina Connected Annotations parses the gene custom annotation file, it will note any gene IDs that are currently not recognized in Illumina Connected Annotations. In such a case, Illumina Connected Annotations will display an error showing all the\nunrecognized gene IDs."))),(0,l.kt)("h4",{id:"annotate-with-illumina-connected-annotations-5"},"Annotate with Illumina Connected Annotations"),(0,l.kt)("p",null,"Let's use a VCF file that contains variants in TP53 and KRAS:"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\n12 25227255 . A T . . .\n17 7675074 . C A . . 
.\n")),(0,l.kt)("p",null,"Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA4.vcf"},"the full VCF file"),"."),(0,l.kt)("h4",{id:"investigate-the-results-5"},"Investigate the Results"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json",metastring:"{24-27}","{24-27}":!0},' "genes": [\n {\n "name": "KRAS",\n "clingenGeneValidity": [\n {\n "diseaseId": "MONDO_0009026",\n "disease": "Costello syndrome",\n "classification": "disputed",\n "classificationDate": "2018-07-24"\n }\n ],\n "clingenDosageSensitivityMap": {\n "haploinsufficiency": "no evidence to suggest that dosage sensitivity is associated with clinical phenotype",\n "triplosensitivity": "no evidence to suggest that dosage sensitivity is associated with clinical phenotype"\n },\n "gnomAD": {\n "pLi": 0.000788,\n "pRec": 0.789,\n "pNull": 0.21,\n "synZ": 0.336,\n "misZ": 2.32,\n "loeuf": 1.24\n },\n "MyDataSource": {\n "phenotype": "Mismatch repair cancer syndrome",\n "notes": "Seen in cohort 123"\n }\n },\n')),(0,l.kt)("p",null,"This is the abbreviated output for KRAS. Here's ",(0,l.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/TestCA5.json.gz"},"the full JSON file")," if you want to see the complete KRAS entry."),(0,l.kt)("h2",{id:"customizing-the-header"},"Customizing the Header"),(0,l.kt)("h3",{id:"title"},"Title"),(0,l.kt)("p",null,"For the title, you can provide any string that hasn't already been used. 
The title should be unique."),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"caution")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Make sure that the title does not conflict with other keys in the JSON file."))),(0,l.kt)("p",null,"For small variants, you can't provide a title that conflicts with other keys in the variant object. Some examples of this would be\n",(0,l.kt)("inlineCode",{parentName:"p"},"vid"),", ",(0,l.kt)("inlineCode",{parentName:"p"},"chromosome"),", ",(0,l.kt)("inlineCode",{parentName:"p"},"transcripts"),", etc. The title should also not conflict with other data source keys like ",(0,l.kt)("inlineCode",{parentName:"p"},"clinvar")," or ",(0,l.kt)("inlineCode",{parentName:"p"},"gnomad"),"."),(0,l.kt)("p",null,"For structural variants, you can't provide a title that conflicts with other keys in the position object. Some examples of this would be\n",(0,l.kt)("inlineCode",{parentName:"p"},"chromosome"),", ",(0,l.kt)("inlineCode",{parentName:"p"},"svLength"),", ",(0,l.kt)("inlineCode",{parentName:"p"},"cytogeneticBand"),", etc. 
The title should also not conflict with other data source keys like ",(0,l.kt)("inlineCode",{parentName:"p"},"clingen")," or ",(0,l.kt)("inlineCode",{parentName:"p"},"dgv"),"."),(0,l.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"caution")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"Take care not to annotate with multiple custom annotations that share the same title."))),(0,l.kt)("h3",{id:"genome-assemblies"},"Genome Assemblies"),(0,l.kt)("p",null,"The following genome assemblies can be specified:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"GRCh37"),(0,l.kt)("li",{parentName:"ul"},"GRCh38")),(0,l.kt)("h3",{id:"matching-criteria"},"Matching Criteria"),(0,l.kt)("p",null,"The matching criteria tell Illumina Connected Annotations how to match a VCF variant to the custom annotation."),(0,l.kt)("p",null,"The following matching criteria can be specified:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"allele")," - use this when you only want allele-specific matches. This is commonly the case when using allele frequency data sources like ",(0,l.kt)("inlineCode",{parentName:"li"},"gnomAD")),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"position")," - use this when you want positional matches. 
This is commonly used with disease phenotype data sources like ",(0,l.kt)("inlineCode",{parentName:"li"},"ClinVar")),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"sv")," - use this when you want to match to all other overlapping SVs. This use case arose when we were adding custom annotations for baseline\ncopy number intervals along the genome.")),(0,l.kt)("h3",{id:"categories"},"Categories"),(0,l.kt)("p",null,"Categories are not used by Illumina Connected Annotations, but are often used by downstream tools. Categories provide hints for how those tools should filter or display\nthe annotation data."),(0,l.kt)("p",null,"When a category is specified, Illumina Connected Annotations will provide additional validation for those fields. The following table describes each category:"),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Category"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Description"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Validation"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"AlleleCount"),(0,l.kt)("td",{parentName:"tr",align:"left"},"allele counts for a specific population"),(0,l.kt)("td",{parentName:"tr",align:"left"},"See the supported populations below")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"AlleleNumber"),(0,l.kt)("td",{parentName:"tr",align:"left"},"allele numbers for a specific population"),(0,l.kt)("td",{parentName:"tr",align:"left"},"See the supported populations below")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"AlleleFrequency"),(0,l.kt)("td",{parentName:"tr",align:"left"},"allele frequencies for a specific population"),(0,l.kt)("td",{parentName:"tr",align:"left"},"See the supported populations 
below")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"Prediction"),(0,l.kt)("td",{parentName:"tr",align:"left"},"ACMG-style pathogenicity classifications"),(0,l.kt)("td",{parentName:"tr",align:"left"},"\u2022 ",(0,l.kt)("inlineCode",{parentName:"td"},"benign")," (B)",(0,l.kt)("br",null),"\u2022 ",(0,l.kt)("inlineCode",{parentName:"td"},"likely benign")," (LB)",(0,l.kt)("br",null),"\u2022 ",(0,l.kt)("inlineCode",{parentName:"td"},"VUS"),(0,l.kt)("br",null),"\u2022 ",(0,l.kt)("inlineCode",{parentName:"td"},"likely pathogenic")," (LP)",(0,l.kt)("br",null),"\u2022 ",(0,l.kt)("inlineCode",{parentName:"td"},"pathogenic")," (P)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"Filter"),(0,l.kt)("td",{parentName:"tr",align:"left"},"free text that signals downstream tools to add the column to the filter"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Max 20 characters")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"Description"),(0,l.kt)("td",{parentName:"tr",align:"left"},"free-text description"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Max 100 characters")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"Identifier"),(0,l.kt)("td",{parentName:"tr",align:"left"},"any ID"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Max 50 characters")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"HomozygousCount"),(0,l.kt)("td",{parentName:"tr",align:"left"},"count of homozygous individuals for a specific population"),(0,l.kt)("td",{parentName:"tr",align:"left"},"See the supported populations below")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"Score"),(0,l.kt)("td",{parentName:"tr",align:"left"},"any score value"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Any double-precision floating point 
number")))),(0,l.kt)("h3",{id:"descriptions"},"Descriptions"),(0,l.kt)("p",null,"Descriptions are used to add more context to the categories. For now, descriptions are mainly used to associate allele counts, numbers, and frequencies with their respective populations."),(0,l.kt)("h4",{id:"populations"},"Populations"),(0,l.kt)("p",null,"The following populations were specified in the HapMap project, 1000 Genomes Project, ExAC, and gnomAD."),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:"left"},"Population Code"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Super-population Code"),(0,l.kt)("th",{parentName:"tr",align:"left"},"Description"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"ACB"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"African Caribbeans in Barbados")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"African")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"ALL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"ALL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"All populations")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"AMR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AMR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Ad Mixed American")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"ASJ"),(0,l.kt)("td",{parentName:"tr",align:"left"}),(0,l.kt)("td",{parentName:"tr",align:"left"},"Ashkenazi Jewish")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"ASW"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Americans of 
African Ancestry in SW USA")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"BEB"),(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Bengali from Bangladesh")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"CDX"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Chinese Dai in Xishuangbanna, China")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"CEU"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Utah Residents (CEPH) with Northern and Western European Ancestry")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"CHB"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Han Chinese in Beijing, China")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"CHS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Southern Han Chinese")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"CLM"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AMR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Colombians from Medellin, Colombia")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"East Asian")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"ESN"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Esan in 
Nigeria")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"European")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"FIN"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Finnish in Finland")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"GBR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"British in England and Scotland")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"GIH"),(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Gujarati Indian from Houston, Texas")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"GWD"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Gambian in Western Divisions in the Gambia")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"IBS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Iberian population in Spain")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"ITU"),(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Indian Telugu from the UK")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"JPT"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Japanese in Tokyo, Japan")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"KHV"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Kinh in Ho Chi Minh City, 
Vietnam")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"LWK"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Luhya in Webuye, Kenya")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"MAG"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Mandinka in the Gambia")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"MKK"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Maasai in Kinyawa, Kenya")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"MSL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Mende in Sierra Leone")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"MXL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AMR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Mexican Ancestry from Los Angeles, USA")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"NFE"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"European (Non-Finnish)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"OTH"),(0,l.kt)("td",{parentName:"tr",align:"left"},"OTH"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Other")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"PEL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AMR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Peruvians from Lima, Peru")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"PJL"),(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Punjabi from Lahore, 
Pakistan")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"PUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AMR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Puerto Ricans from Puerto Rico")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"South Asian")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"STU"),(0,l.kt)("td",{parentName:"tr",align:"left"},"SAS"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Sri Lankan Tamil from the UK")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"TSI"),(0,l.kt)("td",{parentName:"tr",align:"left"},"EUR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Toscani in Italia")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:"left"},"YRI"),(0,l.kt)("td",{parentName:"tr",align:"left"},"AFR"),(0,l.kt)("td",{parentName:"tr",align:"left"},"Yoruba in Ibadan, Nigeria")))),(0,l.kt)("h3",{id:"data-types"},"Data Types"),(0,l.kt)("p",null,"Each custom annotation can be one of the following data types:"),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"bool")," - true or false"),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"number")," - any integer or floating-point number"),(0,l.kt)("li",{parentName:"ul"},(0,l.kt)("inlineCode",{parentName:"li"},"string")," - text")),(0,l.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,l.kt)("div",{parentName:"div",className:"admonition-heading"},(0,l.kt)("h5",{parentName:"div"},(0,l.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,l.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,l.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 
1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"tip")),(0,l.kt)("div",{parentName:"div",className:"admonition-content"},(0,l.kt)("p",{parentName:"div"},"For boolean variables, only keys with a ",(0,l.kt)("inlineCode",{parentName:"p"},"true")," value will be output to the JSON object."))),(0,l.kt)("h2",{id:"using-sautils"},"Using SAUtils"),(0,l.kt)("p",null,"Illumina Connected Annotations includes a tool called ",(0,l.kt)("inlineCode",{parentName:"p"},"SAUtils")," that converts various data sources into Illumina Connected Annotations's native binary format. The sub-commands ",(0,l.kt)("inlineCode",{parentName:"p"},"customvar")," and ",(0,l.kt)("inlineCode",{parentName:"p"},"customgene")," are used to specify a variant file or a gene file respectively."),(0,l.kt)("h3",{id:"convert-variant-file"},"Convert Variant File"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \\\n -r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \\\n -i MyDataSource.tsv \\\n -o SupplementaryAnnotation\n")),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"the ",(0,l.kt)("inlineCode",{parentName:"li"},"-r")," argument specifies the compressed reference path"),(0,l.kt)("li",{parentName:"ul"},"the ",(0,l.kt)("inlineCode",{parentName:"li"},"-i")," argument specifies the input TSV path"),(0,l.kt)("li",{parentName:"ul"},"the ",(0,l.kt)("inlineCode",{parentName:"li"},"-o")," argument specifies the output directory")),(0,l.kt)("h3",{id:"convert-gene-file"},"Convert Gene 
File"),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet bin/Release/netcoreapp2.1/SAUtils.dll customgene \\\n -r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \\\n -c Data/Cache \\\n -i MyDataSource.tsv \\\n -o SupplementaryAnnotation\n")),(0,l.kt)("ul",null,(0,l.kt)("li",{parentName:"ul"},"the ",(0,l.kt)("inlineCode",{parentName:"li"},"-c")," argument specifies the Illumina Connected Annotations cache path"),(0,l.kt)("li",{parentName:"ul"},"the ",(0,l.kt)("inlineCode",{parentName:"li"},"-i")," argument specifies the input TSV path"),(0,l.kt)("li",{parentName:"ul"},"the ",(0,l.kt)("inlineCode",{parentName:"li"},"-o")," argument specifies the output directory")))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/74830f3d.f9a8ae91.js b/assets/js/74830f3d.f9a8ae91.js new file mode 100644 index 00000000..b05b69e5 --- /dev/null +++ b/assets/js/74830f3d.f9a8ae91.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7520],{3905:(t,e,n)=>{n.d(e,{Zo:()=>m,kt:()=>f});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function o(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var s=a.createContext({}),p=function(t){var e=a.useContext(s),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},m=function(t){var e=p(t.components);return a.createElement(s.Provider,{value:e},t.children)},c="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return 
a.createElement(a.Fragment,{},e)}},u=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,s=t.parentName,m=i(t,["components","mdxType","originalType","parentName"]),c=p(n),u=r,f=c["".concat(s,".").concat(u)]||c[u]||d[u]||l;return n?a.createElement(f,o(o({ref:e},m),{},{components:n})):a.createElement(f,o({ref:e},m))}));function f(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,o=new Array(l);o[0]=u;var i={};for(var s in e)hasOwnProperty.call(e,s)&&(i[s]=e[s]);i.originalType=t,i[c]="string"==typeof t?t:r,o[1]=i;for(var p=2;p{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>c,frontMatter:()=>l,metadata:()=>i,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/mitomap-small-variants-json",id:"version-3.24/data-sources/mitomap-small-variants-json",title:"mitomap-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/mitomap-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/mitomap-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],p={toc:s},m="wrapper";function c(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"mitomap":[ \n { \n "refAllele":"G",\n "altAllele":"A",\n "diseases":[ \n "Bipolar disorder",\n "Melanoma"\n ],\n "hasHomoplasmy":false,\n "hasHeteroplasmy":true,\n "status":"Reported",\n "clinicalSignificance":"confirmed pathogenic",\n "scorePercentile":83.30,\n "numGenBankFullLengthSeqs":2,\n "pubMedIds":["2316527","6299878","6301949"],\n "isAlleleSpecific":true\n 
}\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"diseases"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"associated diseases")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hasHomoplasmy"),(0,r.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hasHeteroplasmy"),(0,r.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"status"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"record status")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"clinicalSignificance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"predicted pathogenicity")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"scorePercentile"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"MitoTIP 
score")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"numGenBankFullLengthSeqs"),(0,r.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,r.kt)("td",{parentName:"tr",align:"left"},"# of GenBank full-length sequences")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,r.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the MITOMAP alternate allele")))))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/75a3a2eb.00f2bb5f.js b/assets/js/75a3a2eb.00f2bb5f.js deleted file mode 100644 index a74eecea..00000000 --- a/assets/js/75a3a2eb.00f2bb5f.js +++ /dev/null @@ -1 +0,0 @@ -"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9767],{3905:(t,n,e)=>{e.d(n,{Zo:()=>d,kt:()=>N});var a=e(7294);function r(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function l(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(r[e]=t[e]);return r}(t,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(r[e]=t[e])}return r}var m=a.createContext({}),o=function(t){var n=a.useContext(m),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},d=function(t){var n=o(t.components);return a.createElement(m.Provider,{value:n},t.children)},u="mdxType",g={inlineCode:"code",wrapper:function(t){var 
n=t.children;return a.createElement(a.Fragment,{},n)}},k=a.forwardRef((function(t,n){var e=t.components,r=t.mdxType,l=t.originalType,m=t.parentName,d=p(t,["components","mdxType","originalType","parentName"]),u=o(e),k=r,N=u["".concat(m,".").concat(k)]||u[k]||g[k]||l;return e?a.createElement(N,i(i({ref:n},d),{},{components:e})):a.createElement(N,i({ref:n},d))}));function N(t,n){var e=arguments,r=n&&n.mdxType;if("string"==typeof t||r){var l=e.length,i=new Array(l);i[0]=k;var p={};for(var m in n)hasOwnProperty.call(n,m)&&(p[m]=n[m]);p.originalType=t,p[u]="string"==typeof t?t:r,i[1]=p;for(var o=2;o{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>p,toc:()=>m});var a=e(7462),r=(e(7294),e(3905));const l={title:"Transcript Consequence Impact"},i=void 0,p={unversionedId:"core-functionality/transcript-consequence-impacts",id:"core-functionality/transcript-consequence-impacts",title:"Transcript Consequence Impact",description:"Overview",source:"@site/docs/core-functionality/transcript-consequence-impacts.md",sourceDirName:"core-functionality",slug:"/core-functionality/transcript-consequence-impacts",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/transcript-consequence-impacts.md",tags:[],version:"current",frontMatter:{title:"Transcript Consequence Impact"},sidebar:"docs",previous:{title:"Junction Preserving Annotation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving"},next:{title:"Gene Fusion Detection",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions"}},m=[{value:"Overview",id:"overview",children:[],level:2},{value:"Sources",id:"sources",children:[],level:2},{value:"Consequence Impacts",id:"consequence-impacts",children:[{value:"Known 
Issues",id:"known-issues",children:[],level:3}],level:2},{value:"Example Transcript",id:"example-transcript",children:[],level:2}],o={toc:m},d="wrapper";function u(t){let{components:n,...e}=t;return(0,r.kt)(d,(0,a.Z)({},o,e,{components:n,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"Illumina Connected Annotations provides transcript consequence impacts from ",(0,r.kt)("a",{parentName:"p",href:"https://pcingola.github.io/SnpEff"},"SnpEff"),"."),(0,r.kt)("p",null,"Following definitions are used for the impact ratings as obtained from ",(0,r.kt)("a",{parentName:"p",href:"https://github.com/pcingola/SnpEff/blob/master/src/docs/se_inputoutput.md#impact-prediction"},"SnpEff"),"."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Definition"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"A non-disruptive variant that might change protein effectiveness.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"Assumed to be mostly harmless or unlikely to change protein behavior.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of 
impact.")))),(0,r.kt)("h2",{id:"sources"},"Sources"),(0,r.kt)("p",null,"Not all consequences are rated by SnpEff, therefore Illumina Connected Annotations combines the ratings from SnpEff with those from VEP."),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"SnpEff ",(0,r.kt)("a",{parentName:"li",href:"https://pcingola.github.io/SnpEff/se_inputoutput/"},"Documentation")," and ",(0,r.kt)("a",{parentName:"li",href:"https://github.com/pcingola/SnpEff/blob/001b947893b616e3af082e6c565e253eef59db98/src/main/java/org/snpeff/snpEffect/EffectType.java#L54"},"Codebase")),(0,r.kt)("li",{parentName:"ol"},"VEP ",(0,r.kt)("a",{parentName:"li",href:"https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html"},"Documentation"))),(0,r.kt)("h2",{id:"consequence-impacts"},"Consequence Impacts"),(0,r.kt)("p",null,"Following table gives the combined rating for all consequences recognized by Illumina Connected Annotations."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Consequence"),(0,r.kt)("th",{parentName:"tr",align:null},"SnpEff Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"VEP Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Illumina Connected Annotations Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Comment"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"bidirectional_gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"coding_sequence_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low, 
modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"Based on CDS")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_change"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_decrease"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_increase"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"downstream_gene_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"feature_elongation"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"feature_truncation"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"five_prime_duplicated_transcript"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"five_prime_UTR_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"frameshift_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"incomplete_terminal_codon_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"inframe_deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"inframe_insertion"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"intron_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"mature_miRNA_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"missense_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"NMD_transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"non_coding_transcript_exon_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"non_coding_transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"protein_altering_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_ablation"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_amplification"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_change"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_contraction"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_expansion"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_acceptor_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_donor_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_region_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate, low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"Based on SPLICE_SITE_REGION in 
SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"start_lost"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"start_retained_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_gained"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_lost"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_retained_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"synonymous_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"three_prime_duplicated_transcript"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"three_prime_UTR_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_ablation"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_amplification"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"unidirectional_gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"upstream_gene_variant"),(0,r.kt)("td",{parentNam
e:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")))),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Note: ")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("ol",{parentName:"div"},(0,r.kt)("li",{parentName:"ol"},"For transcripts with multiple consequences, the most severe impact rating is chosen."),(0,r.kt)("li",{parentName:"ol"},"In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides ",(0,r.kt)("inlineCode",{parentName:"li"},"modifier"),".")))),(0,r.kt)("h3",{id:"known-issues"},"Known Issues"),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known 
Issues")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"The consequence ",(0,r.kt)("inlineCode",{parentName:"p"},"splice_polypyrimidine_tract_variant"),", is rated as ",(0,r.kt)("inlineCode",{parentName:"p"},"low")," by VEP.\nHowever, this consequence is not annotated by Illumina Connected Annotations, therefore the impact will also not be provided."))),(0,r.kt)("h2",{id:"example-transcript"},"Example Transcript"),(0,r.kt)("p",null,"The key ",(0,r.kt)("inlineCode",{parentName:"p"},"impact")," for each transcript gives the impact rating for the ",(0,r.kt)("inlineCode",{parentName:"p"},"consequence"),"."),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json",metastring:"{20-24}","{20-24}":!0},'{\n "variants": [\n {\n "vid": "1-1623412-T-C",\n "chromosome": "1",\n "begin": 1623412,\n "end": 1623412,\n "refAllele": "T",\n "altAllele": "C",\n "variantType": "SNV",\n "hgvsg": "NC_000001.11:g.1623412T>C",\n "transcripts": [\n {\n "transcript": "ENST00000479659.5",\n "source": "Ensembl",\n "bioType": "lncRNA",\n "introns": "2/18",\n "geneId": "ENSG00000197530",\n "hgnc": "MIB2",\n "consequence": [\n "intron_variant",\n "non_coding_transcript_variant"\n ],\n "impact": "modifier",\n "hgvsc": "ENST00000479659.5:n.288-19T>C"\n },\n {\n "transcript": "ENST00000489635.5",\n "source": "VEP",\n "bioType": "mRNA",\n "codons": "aTg/aCg",\n "aminoAcids": "M/T",\n "cdnaPos": "269",\n "cdsPos": "134",\n "exons": "3/20",\n "proteinPos": "45",\n "geneId": "ENSG00000197530",\n "hgnc": "MIB2",\n "consequence": [\n "missense_variant"\n ],\n "impact": "moderate",\n "hgvsc": "ENST00000489635.5:c.134T>C",\n "hgvsp": "ENSP00000426007.1:p.(Met45Thr)",\n "proteinId": "ENSP00000426007.1"\n }\n ]\n }\n ]\n}\n')))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/75a3a2eb.258736c2.js b/assets/js/75a3a2eb.258736c2.js new file mode 100644 index 00000000..b6dd7649 --- /dev/null +++ 
b/assets/js/75a3a2eb.258736c2.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9767],{3905:(t,n,e)=>{e.d(n,{Zo:()=>d,kt:()=>N});var a=e(7294);function r(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function l(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(r[e]=t[e]);return r}(t,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(r[e]=t[e])}return r}var m=a.createContext({}),o=function(t){var n=a.useContext(m),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},d=function(t){var n=o(t.components);return a.createElement(m.Provider,{value:n},t.children)},u="mdxType",g={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},k=a.forwardRef((function(t,n){var e=t.components,r=t.mdxType,l=t.originalType,m=t.parentName,d=p(t,["components","mdxType","originalType","parentName"]),u=o(e),k=r,N=u["".concat(m,".").concat(k)]||u[k]||g[k]||l;return e?a.createElement(N,i(i({ref:n},d),{},{components:e})):a.createElement(N,i({ref:n},d))}));function N(t,n){var e=arguments,r=n&&n.mdxType;if("string"==typeof t||r){var l=e.length,i=new Array(l);i[0]=k;var p={};for(var m in n)hasOwnProperty.call(n,m)&&(p[m]=n[m]);p.originalType=t,p[u]="string"==typeof t?t:r,i[1]=p;for(var o=2;o{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>p,toc:()=>m});var a=e(7462),r=(e(7294),e(3905));const l={title:"Transcript Consequence Impact"},i=void 0,p={unversionedId:"core-functionality/transcript-consequence-impacts",id:"core-functionality/transcript-consequence-impacts",title:"Transcript Consequence 
Impact",description:"Overview",source:"@site/docs/core-functionality/transcript-consequence-impacts.md",sourceDirName:"core-functionality",slug:"/core-functionality/transcript-consequence-impacts",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/transcript-consequence-impacts.md",tags:[],version:"current",frontMatter:{title:"Transcript Consequence Impact"},sidebar:"docs",previous:{title:"Junction Preserving Annotation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving"},next:{title:"Variant IDs",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids"}},m=[{value:"Overview",id:"overview",children:[],level:2},{value:"Sources",id:"sources",children:[],level:2},{value:"Consequence Impacts",id:"consequence-impacts",children:[{value:"Known Issues",id:"known-issues",children:[],level:3}],level:2},{value:"Example Transcript",id:"example-transcript",children:[],level:2}],o={toc:m},d="wrapper";function u(t){let{components:n,...e}=t;return(0,r.kt)(d,(0,a.Z)({},o,e,{components:n,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"Illumina Connected Annotations provides transcript consequence impacts from ",(0,r.kt)("a",{parentName:"p",href:"https://pcingola.github.io/SnpEff"},"SnpEff"),"."),(0,r.kt)("p",null,"Following definitions are used for the impact ratings as obtained from 
",(0,r.kt)("a",{parentName:"p",href:"https://github.com/pcingola/SnpEff/blob/master/src/docs/se_inputoutput.md#impact-prediction"},"SnpEff"),"."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Definition"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"A non-disruptive variant that might change protein effectiveness.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"Assumed to be mostly harmless or unlikely to change protein behavior.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact.")))),(0,r.kt)("h2",{id:"sources"},"Sources"),(0,r.kt)("p",null,"Not all consequences are rated by SnpEff, therefore Illumina Connected Annotations combines the ratings from SnpEff with those from VEP."),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"SnpEff ",(0,r.kt)("a",{parentName:"li",href:"https://pcingola.github.io/SnpEff/se_inputoutput/"},"Documentation")," and ",(0,r.kt)("a",{parentName:"li",href:"https://github.com/pcingola/SnpEff/blob/001b947893b616e3af082e6c565e253eef59db98/src/main/java/org/snpeff/snpEffect/EffectType.java#L54"},"Codebase")),(0,r.kt)("li",{parentName:"ol"},"VEP 
",(0,r.kt)("a",{parentName:"li",href:"https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html"},"Documentation"))),(0,r.kt)("h2",{id:"consequence-impacts"},"Consequence Impacts"),(0,r.kt)("p",null,"Following table gives the combined rating for all consequences recognized by Illumina Connected Annotations."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Consequence"),(0,r.kt)("th",{parentName:"tr",align:null},"SnpEff Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"VEP Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Illumina Connected Annotations Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Comment"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"bidirectional_gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"coding_sequence_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low, modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"Based on 
CDS")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_change"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_decrease"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_increase"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"downstream_gene_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"feature_elongation"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"feature_truncation"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"five_prime_duplicated_transcript"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"five_prime_UTR_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"frameshift_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"incomplete_terminal_codon_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"inframe_deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"inframe_insertion"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"intron_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"mature_miRNA_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"missense_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"NMD_transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"non_coding_transcript_exon_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"non_coding_transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"protein_altering_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_ablation"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_amplification"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_change"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_contraction"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_expansion"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_acceptor_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_donor_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_region_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate, low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"Based on SPLICE_SITE_REGION in 
SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"start_lost"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"start_retained_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_gained"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_lost"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_retained_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"synonymous_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"three_prime_duplicated_transcript"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"three_prime_UTR_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_ablation"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_amplification"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"unidirectional_gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"upstream_gene_variant"),(0,r.kt)("td",{parentNam
e:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")))),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Note: ")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("ol",{parentName:"div"},(0,r.kt)("li",{parentName:"ol"},"For transcripts with multiple consequences, the most severe impact rating is chosen."),(0,r.kt)("li",{parentName:"ol"},"In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides ",(0,r.kt)("inlineCode",{parentName:"li"},"modifier"),".")))),(0,r.kt)("h3",{id:"known-issues"},"Known Issues"),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known 
Issues")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"The consequence ",(0,r.kt)("inlineCode",{parentName:"p"},"splice_polypyrimidine_tract_variant")," is rated as ",(0,r.kt)("inlineCode",{parentName:"p"},"low")," by VEP.\nHowever, this consequence is not annotated by Illumina Connected Annotations, so its impact is not provided either."))),(0,r.kt)("h2",{id:"example-transcript"},"Example Transcript"),(0,r.kt)("p",null,"The key ",(0,r.kt)("inlineCode",{parentName:"p"},"impact")," for each transcript gives the impact rating for the ",(0,r.kt)("inlineCode",{parentName:"p"},"consequence"),"."),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json",metastring:"{20-24}","{20-24}":!0},'{\n "variants": [\n {\n "vid": "1-1623412-T-C",\n "chromosome": "1",\n "begin": 1623412,\n "end": 1623412,\n "refAllele": "T",\n "altAllele": "C",\n "variantType": "SNV",\n "hgvsg": "NC_000001.11:g.1623412T>C",\n "transcripts": [\n {\n "transcript": "ENST00000479659.5",\n "source": "Ensembl",\n "bioType": "lncRNA",\n "introns": "2/18",\n "geneId": "ENSG00000197530",\n "hgnc": "MIB2",\n "consequence": [\n "intron_variant",\n "non_coding_transcript_variant"\n ],\n "impact": "modifier",\n "hgvsc": "ENST00000479659.5:n.288-19T>C"\n },\n {\n "transcript": "ENST00000489635.5",\n "source": "VEP",\n "bioType": "mRNA",\n "codons": "aTg/aCg",\n "aminoAcids": "M/T",\n "cdnaPos": "269",\n "cdsPos": "134",\n "exons": "3/20",\n "proteinPos": "45",\n "geneId": "ENSG00000197530",\n "hgnc": "MIB2",\n "consequence": [\n "missense_variant"\n ],\n "impact": "moderate",\n "hgvsc": "ENST00000489635.5:c.134T>C",\n "hgvsp": "ENSP00000426007.1:p.(Met45Thr)",\n "proteinId": "ENSP00000426007.1"\n }\n ]\n }\n ]\n}\n')))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/76a0dc22.1402df31.js b/assets/js/76a0dc22.1402df31.js new file mode 100644 index 00000000..8ba2060d --- /dev/null +++ 
b/assets/js/76a0dc22.1402df31.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9521,4335],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>v});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},c=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},p="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,o=e.originalType,s=e.parentName,c=l(e,["components","mdxType","originalType","parentName"]),p=d(n),u=r,v=p["".concat(s,".").concat(u)]||p[u]||m[u]||o;return n?a.createElement(v,i(i({ref:t},c),{},{components:n})):a.createElement(v,i({ref:t},c))}));function v(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var o=n.length,i=new Array(o);i[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[p]="string"==typeof e?e:r,i[1]=l;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>p,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/topmed-json",id:"version-3.24/data-sources/topmed-json",title:"topmed-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/topmed-json.md",sourceDirName:"data-sources",slug:"/data-sources/topmed-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/topmed-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},c="wrapper";function p(e){let{components:t,...n}=e;return(0,r.kt)(c,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"topmed":{ \n "allAc":20,\n "allAn":125568,\n "allAf":0.000159,\n "allHc":0,\n "failedFilter":true\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed allele count")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed allele number. 
Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed allele frequency (computed by Illumina Connected Annotations)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allHc"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed homozygous count")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,r.kt)("td",{parentName:"tr",align:null},"bool"),(0,r.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters")))))}p.isMDXComponent=!0},1561:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>m,frontMatter:()=>i,metadata:()=>s,toc:()=>d});var a=n(7462),r=(n(7294),n(3905)),o=n(5023);const i={title:"TOPMed"},l=void 0,s={unversionedId:"data-sources/topmed",id:"version-3.24/data-sources/topmed",title:"TOPMed",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/topmed.mdx",sourceDirName:"data-sources",slug:"/data-sources/topmed",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/topmed.mdx",tags:[],version:"3.24",frontMatter:{title:"TOPMed"},sidebar:"docs",previous:{title:"Splice AI",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai"},next:{title:"Illumina Connected Annotations JSON File Format",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-json-file-format"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"VCF extraction",id:"vcf-extraction",children:[],level:2},{value:"GRCh37 liftover",id:"grch37-liftover",children:[],level:2},{value:"Download 
URL",id:"download-url",children:[],level:2},{value:"JSON output",id:"json-output",children:[],level:2}],c={toc:d},p="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(p,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"The ",(0,r.kt)("a",{parentName:"p",href:"https://www.nhlbi.nih.gov/science/trans-omics-precision-medicine-topmed-program"},"Trans-Omics for Precision Medicine")," (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual\u2019s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. 
Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. ",(0,r.kt)("em",{parentName:"p"},"PLoS Genetics"),", ",(0,r.kt)("strong",{parentName:"p"},"15(12)"),", p.e1008500."))),(0,r.kt)("h2",{id:"vcf-extraction"},"VCF extraction"),(0,r.kt)("p",null,"We currently extract the following fields from the TOPMed VCF file:"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},'##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n')),(0,r.kt)("p",null,"Example:"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 10132 TOPMed_freeze_5?chr1:10,132 T C 255 SVM VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0 NA:FRQ 125568:0.000254842\n")),(0,r.kt)("h2",{id:"grch37-liftover"},"GRCh37 liftover"),(0,r.kt)("p",null,"The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs."),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://bravo.sph.umich.edu/freeze5/hg38/download"},"https://bravo.sph.umich.edu/freeze5/hg38/download")),(0,r.kt)("h2",{id:"json-output"},"JSON output"),(0,r.kt)(o.default,{mdxType:"JSON"}))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/78bb3c84.7fa47463.js b/assets/js/78bb3c84.7fa47463.js new file mode 100644 index 00000000..a38a45b8 --- /dev/null +++ b/assets/js/78bb3c84.7fa47463.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[2959,958,5261,7911],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>N});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var 
a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function l(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),m=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},c=function(e){var t=m(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",p={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,c=o(e,["components","mdxType","originalType","parentName"]),d=m(n),u=i,N=d["".concat(s,".").concat(u)]||d[u]||p[u]||r;return n?a.createElement(N,l(l({ref:t},c),{},{components:n})):a.createElement(N,l({ref:t},c))}));function N(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,l=new Array(r);l[0]=u;var o={};for(var s in t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[d]="string"==typeof e?e:i,l[1]=o;for(var m=2;m{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>d,frontMatter:()=>r,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={},l=void 0,o={unversionedId:"data-sources/cosmic-cancer-gene-census",id:"version-3.24/data-sources/cosmic-cancer-gene-census",title:"cosmic-cancer-gene-census",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-cancer-gene-census.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-cancer-gene-census",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-cancer-gene-census",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-cancer-gene-census.md",tags:[],version:"3.24",frontMatter:{}},s=[],m={toc:s},c="wrapper";function d(e){let{components:t,...n}=e;return(0,i.kt)(c,(0,a.Z)({},m,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},' {\n "name": "PRDM16",\n "ensemblGeneId": "ENSG00000142611",\n "ncbiGeneId": "63976",\n "hgncId": 14000,\n "cosmic": {\n "tier": 1,\n "roleInCancer": [\n "oncogene",\n "fusion"\n ]\n }\n}\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"roleInCancer"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Possible roles in cancer")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"tier"),(0,i.kt)("td",{parentName:"tr",align:"center"},"number"),(0,i.kt)("td",{parentName:"tr",align:"left"},"COSMIC tiers ","[1, 2]")))))}d.isMDXComponent=!0},9842:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>d,frontMatter:()=>r,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={},l=void 0,o={unversionedId:"data-sources/cosmic-gene-fusion-json",id:"version-3.24/data-sources/cosmic-gene-fusion-json",title:"cosmic-gene-fusion-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-gene-fusion-json.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-gene-fusion-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-gene-fusion-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-gene-fusion-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],m={toc:s},c="wrapper";function d(e){let{components:t,...n}=e;return(0,i.kt)(c,(0,a.Z)({},m,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},' "cosmicGeneFusions":[\n {\n "id":"COSF881",\n "numSamples":6,\n "geneSymbols":[\n "MYB",\n "NFIB"\n ],\n "hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",\n "histologies":[\n {\n "name":"adenoid cystic carcinoma",\n "numSamples":6\n }\n ],\n "sites":[\n {\n "name":"salivary gland (submandibular)",\n "numSamples":1\n },\n {\n "name":"salivary gland (parotid)",\n "numSamples":1\n },\n {\n "name":"salivary gland (nasal cavity)",\n "numSamples":1\n },\n {\n "name":"breast",\n "numSamples":3\n }\n ],\n "pubMedIds":[\n 19841262\n ]\n }\n ]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"id"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"COSMIC fusion 
ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"geneSymbols"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"5' gene & 3' gene")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgvsr"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"HGVS RNA translocation fusion notation")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"histologies"),(0,i.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"phenotypic descriptions")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"sites"),(0,i.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"tissue types")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"PubMed 
IDs")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Count")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"name"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"description")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"})))))}d.isMDXComponent=!0},8355:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>d,frontMatter:()=>r,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={},l=void 0,o={unversionedId:"data-sources/cosmic-json",id:"version-3.24/data-sources/cosmic-json",title:"cosmic-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-json.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],m={toc:s},c="wrapper";function d(e){let{components:t,...n}=e;return(0,i.kt)(c,(0,a.Z)({},m,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "id":"COSV58272668",\n "numSamples":8,\n "refAllele":"-",\n "altAllele":"CCT",\n "histologies":[\n {\n "name":"carcinoma (serous carcinoma)",\n "numSamples":2\n },\n {\n "name":"meningioma (fibroblastic)",\n "numSamples":1\n },\n {\n "name":"carcinoma",\n "numSamples":1\n 
},\n {\n "name":"carcinoma (squamous cell carcinoma)",\n "numSamples":1\n },\n {\n "name":"meningioma (transitional)",\n "numSamples":1\n },\n {\n "name":"carcinoma (adenocarcinoma)",\n "numSamples":1\n },\n {\n "name":"other (neoplasm)",\n "numSamples":1\n }\n ],\n "sites":[\n {\n "name":"ovary",\n "numSamples":2\n },\n {\n "name":"meninges",\n "numSamples":2\n },\n {\n "name":"thyroid",\n "numSamples":2\n },\n {\n "name":"cervix",\n "numSamples":1\n },\n {\n "name":"large intestine (colon)",\n "numSamples":1\n }\n ],\n "pubMedIds":[\n 25738363,\n 27548314\n ],\n "confirmedSomatic":true,\n "drugResistance":true, /* not in this particular COSMIC variant */\n "isAlleleSpecific":true\n}\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"id"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"COSMIC Genomic Mutation ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"histologies"),(0,i.kt)("td",{parentName:"tr",align:"center"},"count 
array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"phenotypic descriptions")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"sites"),(0,i.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"tissue types")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"PubMed IDs")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"confirmedSomatic"),(0,i.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,i.kt)("td",{parentName:"tr",align:"left"},"true when the variant is a confirmed somatic variant")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"drugResistance"),(0,i.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,i.kt)("td",{parentName:"tr",align:"left"},"true when the variant has been associated with drug resistance")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Count")),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"name"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"description")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"})))))}d.isMDXComponent=!0},1263:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>m,default:()=>N,frontMatter:()=>s,metadata:()=>c,toc:()=>d});var a=n(7462),i=(n(7294),n(3905)),r=n(8355),l=n(9842),o=n(13);const 
s={title:"COSMIC"},m=void 0,c={unversionedId:"data-sources/cosmic",id:"version-3.24/data-sources/cosmic",title:"COSMIC",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/cosmic.mdx",sourceDirName:"data-sources",slug:"/data-sources/cosmic",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic.mdx",tags:[],version:"3.24",frontMatter:{title:"COSMIC"},sidebar:"docs",previous:{title:"ClinVar Preview",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview"},next:{title:"DANN",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"Small Variants",id:"small-variants",children:[{value:"VCF extraction",id:"vcf-extraction",children:[{value:"Example",id:"example",children:[],level:4},{value:"Parsing",id:"parsing",children:[],level:4}],level:3},{value:"TSV extraction",id:"tsv-extraction",children:[{value:"Example",id:"example-1",children:[],level:4},{value:"Parsing",id:"parsing-1",children:[],level:4},{value:"Parsing",id:"parsing-2",children:[],level:4},{value:"Aggregating Histologies & Sites",id:"aggregating-histologies--sites",children:[],level:4}],level:3},{value:"Download URL",id:"download-url",children:[{value:"GRCh37",id:"grch37",children:[],level:4},{value:"GRCh38",id:"grch38",children:[],level:4}],level:3},{value:"JSON Output",id:"json-output",children:[],level:3}],level:2},{value:"Gene Fusions",id:"gene-fusions",children:[{value:"TSV extraction",id:"tsv-extraction-1",children:[{value:"Example",id:"example-2",children:[],level:4},{value:"Parsing",id:"parsing-3",children:[],level:4},{value:"Parsing",id:"parsing-4",children:[],level:4},{value:"Aggregating Histologies & Sites",id:"aggregating-histologies--sites-1",children:[],level:4}],level:3},{value:"Known 
Issues",id:"known-issues",children:[],level:3},{value:"Download URL",id:"download-url-1",children:[{value:"GRCh37",id:"grch37-1",children:[],level:4},{value:"GRCh38",id:"grch38-1",children:[],level:4}],level:3},{value:"JSON Output",id:"json-output-1",children:[],level:3}],level:2},{value:"Cancer Gene Census",id:"cancer-gene-census",children:[{value:"TSV Extraction",id:"tsv-extraction-2",children:[{value:"Example",id:"example-3",children:[],level:4},{value:"Parsing",id:"parsing-5",children:[{value:"Columns",id:"columns",children:[],level:5},{value:"Possible Roles in Cancer",id:"possible-roles-in-cancer",children:[],level:5}],level:4}],level:3},{value:"CSV Extraction",id:"csv-extraction",children:[{value:"Columns",id:"columns-1",children:[],level:5}],level:3},{value:"Known Issues",id:"known-issues-1",children:[],level:3},{value:"Download URL",id:"download-url-2",children:[],level:3},{value:"JSON output",id:"json-output-2",children:[],level:3}],level:2},{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[],level:2}],p={toc:d},u="wrapper";function N(e){let{components:t,...n}=e;return(0,i.kt)(u,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human\ncancers."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 
6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"John G Tate, Sally Bamford, Harry C Jubb, Zbyslaw Sondka, David M Beare, Nidhi Bindal, Harry Boutselakis, Charlotte G Cole, Celestino Creatore, Elisabeth Dawson,\nPeter Fish, Bhavana Harsha, Charlie Hathaway, Steve C Jupe, Chai Yin Kok, Kate Noble, Laura Ponting, Christopher C Ramshaw, Claire E Rye, Helen E Speedy, Ray\nStefancsik, Sam L Thompson, Shicai Wang, Sari Ward, Peter J Campbell, Simon A Forbes. (2019) ",(0,i.kt)("a",{parentName:"p",href:"https://academic.oup.com/nar/article/47/D1/D941/5146192"},"COSMIC: the Catalogue Of Somatic Mutations In\nCancer"),", ",(0,i.kt)("em",{parentName:"p"},"Nucleic Acids Research"),", Volume 47, Issue D1"))),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Professional data source")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"This is a Professional data source and is not available freely. Please contact ",(0,i.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com")," if you would like to obtain it."))),(0,i.kt)("h2",{id:"small-variants"},"Small Variants"),(0,i.kt)("p",null,"Our main COSMIC deliverable provides annotations for both coding and non-coding variants throughout the genome. 
As of COSMIC v96, this includes 28.7M variants\nspanning the human genome. Illumina Connected Annotations currently parses four files to extract the relevant content:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"CosmicCodingMuts.vcf.gz"),(0,i.kt)("li",{parentName:"ul"},"CosmicNonCodingVariants.vcf.gz"),(0,i.kt)("li",{parentName:"ul"},"CosmicMutantExport.tsv.gz"),(0,i.kt)("li",{parentName:"ul"},"CosmicNCV.tsv.gz")),(0,i.kt)("h3",{id:"vcf-extraction"},"VCF extraction"),(0,i.kt)("h4",{id:"example"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO\n1 65797 COSV58737189 T C . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN23957695;CDS=c.9+224T>C;AA=p.?;HGVSC=ENST00000641515.2:c.9+224T>C;HGVSG=1:g.65797T>C;CNT=1\n")),(0,i.kt)("h4",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"From the VCF files, we're mainly interested in the following columns:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"CHROM")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"POS")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"ID")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"REF")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"ALT"))),(0,i.kt)("h3",{id:"tsv-extraction"},"TSV extraction"),(0,i.kt)("h4",{id:"example-1"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"Gene name Accession Number Gene CDS length HGNC ID Sample name ID_sample ID_tumour Primary site Site subtype 1 Site subtype 2 Site subtype 3 Primary histology Histology subtype 1 Histology subtype 2 Histology subtype 3 Genome-wide screen GENOMIC_MUTATION_ID LEGACY_MUTATION_ID MUTATION_ID Mutation CDS Mutation AA Mutation Description Mutation zygosity LOH GRCh Mutation genome position Mutation strand Resistance Mutation Mutation somatic status 
Pubmed_PMID ID_STUDY Sample Type Tumour origin Age HGVSP HGVSC HGVSG\nMCF2L_ENST00000375604 ENST00000375604.6 3372 14576 RK091_C01 1918867 1806188 liver NS NS NS carcinoma NS NS NS y COSV65049364 COSN1601909 113108365 c.73+3096A>G p.? Unknown het 38 13:113005079-113005079 + - Variant of unknown origin 322 fresh/frozen - NOS primary ENST00000375604.6:c.73+3096A>G 13:g.113005079A>G\n")),(0,i.kt)("h4",{id:"parsing-1"},"Parsing"),(0,i.kt)("p",null,"From the TSV file, we're mainly interested in the following columns:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"GENOMIC_MUTATION_ID")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"ID_sample")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Primary site")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Site subtype 1")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Primary histology")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Histology subtype 1")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Pubmed_PMID")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Resistance Mutation")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Mutation somatic status"))),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 
6H6v2h2v-2z"}))),"info")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"For all the histologies and sites, we replace all the underlines with spaces. ",(0,i.kt)("inlineCode",{parentName:"p"},"salivary_gland")," would become ",(0,i.kt)("inlineCode",{parentName:"p"},"salivary gland"),"."))),(0,i.kt)("h4",{id:"parsing-2"},"Parsing"),(0,i.kt)("p",null,"To aggregate the data in Illumina Connected Annotations, we perform the following:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Parse the coding and non-coding TSV files to retrieve the histologies, sites, PubMed IDs, somatic status, and resistance mutation status. Histologies and sites\nare tracked with respect to sample IDs."),(0,i.kt)("li",{parentName:"ul"},"Parse the coding and non-coding VCF files to retrieve the genomic variant for each entry")),(0,i.kt)("h4",{id:"aggregating-histologies--sites"},"Aggregating Histologies & Sites"),(0,i.kt)("p",null,"For sites and histologies, we observe that the subtype provides additional description but is still dependent on the primary site value. For example, the primary\nsite might be ",(0,i.kt)("inlineCode",{parentName:"p"},"skin"),", but the subtype is ",(0,i.kt)("inlineCode",{parentName:"p"},"foot"),". Therefore, we will combine the values in the following manner: ",(0,i.kt)("inlineCode",{parentName:"p"},"skin (foot)"),". "),(0,i.kt)("p",null,"COSMIC uses ",(0,i.kt)("inlineCode",{parentName:"p"},"NS")," to show that a value is empty. 
If the subtype is ",(0,i.kt)("inlineCode",{parentName:"p"},"NS"),", we will use the primary histology instead."),(0,i.kt)("h3",{id:"download-url"},"Download URL"),(0,i.kt)("h4",{id:"grch37"},"GRCh37"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh37/cosmic/v96/VCF/CosmicCodingMuts.vcf.gz"},"CosmicCodingMuts.vcf.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh37/cosmic/v96/VCF/CosmicNonCodingVariants.vcf.gz"},"CosmicNonCodingVariants.vcf.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh37/cosmic/v96/CosmicMutantExport.tsv.gz"},"CosmicMutantExport.tsv.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh37/cosmic/v96/CosmicNCV.tsv.gz"},"CosmicNCV.tsv.gz"))),(0,i.kt)("h4",{id:"grch38"},"GRCh38"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v96/VCF/CosmicCodingMuts.vcf.gz"},"CosmicCodingMuts.vcf.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v96/VCF/CosmicNonCodingVariants.vcf.gz"},"CosmicNonCodingVariants.vcf.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v96/CosmicMutantExport.tsv.gz"},"CosmicMutantExport.tsv.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v96/CosmicNCV.tsv.gz"},"CosmicNCV.tsv.gz"))),(0,i.kt)("h3",{id:"json-output"},"JSON Output"),(0,i.kt)(r.default,{mdxType:"SmallVariantJSON"}),(0,i.kt)("h2",{id:"gene-fusions"},"Gene Fusions"),(0,i.kt)("p",null,"Gene fusions are 
manually curated from peer reviewed publications by expert COSMIC curators. A comprehensive literature curation is completed for each fusion\npair when it is released in the database. Currently COSMIC includes information on fusions involved in solid tumours and leukaemias."),(0,i.kt)("h3",{id:"tsv-extraction-1"},"TSV extraction"),(0,i.kt)("h4",{id:"example-2"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"SAMPLE_ID SAMPLE_NAME PRIMARY_SITE SITE_SUBTYPE_1 SITE_SUBTYPE_2 SITE_SUBTYPE_3 PRIMARY_HISTOLOGY HISTOLOGY_SUBTYPE_1 HISTOLOGY_SUBTYPE_2 HISTOLOGY_SUBTYPE_3 FUSION_ID TRANSLOCATION_NAME 5'_CHROMOSOME 5'_STRAND 5'_GENE_ID 5'_GENE_NAME 5'_LAST_OBSERVED_EXON 5'_GENOME_START_FROM 5'_GENOME_START_TO 5'_GENOME_STOP_FROM 5'_GENOME_STOP_TO 3'_CHROMOSOME 3'_STRAND 3'_GENE_ID 3'_GENE_NAME 3'_FIRST_OBSERVED_EXON 3'_GENOME_START_FROM 3'_GENOME_START_TO 3'_GENOME_STOP_FROM 3'_GENOME_STOP_TO FUSION_TYPE PUBMED_PMID\n749711 HCC1187 breast NS NS NS carcinoma ductal_carcinoma NS NS 665 ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452 8 - 197199 RGS22 22 99981937 99981937 100106116 100106116 1 + 212470 SYCP1_ENST00000369518 24 114944339 114944339 114995367 114995367 Inferred Breakpoint 20033038\n")),(0,i.kt)("h4",{id:"parsing-3"},"Parsing"),(0,i.kt)("p",null,"From the TSV file, we're mainly interested in the following 
columns:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"SAMPLE_ID")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"PRIMARY_SITE")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"PRIMARY_HISTOLOGY")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"HISTOLOGY_SUBTYPE_1")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"FUSION_ID")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"TRANSLOCATION_NAME")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"PUBMED_PMID"))),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"info")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"For all the histologies and sites, we replace all the underlines with spaces. 
",(0,i.kt)("inlineCode",{parentName:"p"},"salivary_gland")," would become ",(0,i.kt)("inlineCode",{parentName:"p"},"salivary gland"),"."))),(0,i.kt)("h4",{id:"parsing-4"},"Parsing"),(0,i.kt)("p",null,"To create the gene fusion entries in Illumina Connected Annotations, we perform the following on each row in the TSV file:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Group all entries by FUSION_ID"),(0,i.kt)("li",{parentName:"ul"},"Using all the entries related to this FUSION_ID:",(0,i.kt)("ul",{parentName:"li"},(0,i.kt)("li",{parentName:"ul"},"Collect all the PubMed IDs"),(0,i.kt)("li",{parentName:"ul"},"Tally the number of observed sample IDs"),(0,i.kt)("li",{parentName:"ul"},"Grab the HGVS r. notation (should not change throughout the FUSION_ID)"),(0,i.kt)("li",{parentName:"ul"},"Tally the number of samples observed for each histology"),(0,i.kt)("li",{parentName:"ul"},"Tally the number of samples observed for each site"))),(0,i.kt)("li",{parentName:"ul"},"Extract the transcript IDs from the HGVS notation and lookup the associated gene symbols")),(0,i.kt)("h4",{id:"aggregating-histologies--sites-1"},"Aggregating Histologies & Sites"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"#aggregating-histologies--sites"},"Aggregating Histologies & Sites")," was previously described in the small variants section."),(0,i.kt)("h3",{id:"known-issues"},"Known Issues"),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 
11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known Issues")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"There are some issues with the HGVS RNA notation:"),(0,i.kt)("ul",{parentName:"div"},(0,i.kt)("li",{parentName:"ul"},"For coding transcripts, HGVS numbering should use CDS coordinates. Right now COSMIC is using cDNA coordinates for all their fusions.")))),(0,i.kt)("h3",{id:"download-url-1"},"Download URL"),(0,i.kt)("h4",{id:"grch37-1"},"GRCh37"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh37/cosmic/v96/CosmicFusionExport.tsv.gz"},"CosmicFusionExport.tsv.gz"))),(0,i.kt)("h4",{id:"grch38-1"},"GRCh38"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v96/CosmicFusionExport.tsv.gz"},"CosmicFusionExport.tsv.gz"))),(0,i.kt)("h3",{id:"json-output-1"},"JSON Output"),(0,i.kt)(l.default,{mdxType:"GeneFusionJSON"}),(0,i.kt)("h2",{id:"cancer-gene-census"},"Cancer Gene Census"),(0,i.kt)("h3",{id:"tsv-extraction-2"},"TSV Extraction"),(0,i.kt)("h4",{id:"example-3"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"GENE_NAME CELL_TYPE PUBMED_PMID HALLMARK IMPACT DESCRIPTION CELL_LINE\nPRDM16 18496560 role in cancer oncogene oncogene\nPRDM16 16015645 role in cancer fusion fusion\n")),(0,i.kt)("h4",{id:"parsing-5"},"Parsing"),(0,i.kt)("p",null,'To extract information about TSGs and oncogenes, the data based on the "role in cancer" attribute is filtered.\nFor tumor suppressor genes, rows with the value "TSG" and for oncogenes, rows with the value "oncogene" are filtered.\nSome genes have both "TSG/oncogene" as their role, which indicates that they can act as both.'),(0,i.kt)("h5",{id:"columns"},"Columns"),(0,i.kt)("p",null,"Only following columns are needed to 
gather required roles in cancer:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"GENE_NAME")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"IMPACT")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"HALLMARK"))),(0,i.kt)("h5",{id:"possible-roles-in-cancer"},"Possible Roles in Cancer"),(0,i.kt)("p",null,"The file contained following number of instances for each role type"),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Role in cancer"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Total Instances"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"fusion"),(0,i.kt)("td",{parentName:"tr",align:"center"},"149")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"TSG"),(0,i.kt)("td",{parentName:"tr",align:"center"},"195")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"oncogene"),(0,i.kt)("td",{parentName:"tr",align:"center"},"181")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"Total"),(0,i.kt)("td",{parentName:"tr",align:"center"},"525")))),(0,i.kt)("h3",{id:"csv-extraction"},"CSV Extraction"),(0,i.kt)("p",null,"COSMIC Tiers are extracted from ",(0,i.kt)("inlineCode",{parentName:"p"},"cancer_gene_census.csv")," file:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},'Gene Symbol,Name,Entrez GeneId,Genome Location,Tier,Hallmark,Chr Band,Somatic,Germline,Tumour Types(Somatic),Tumour Types(Germline),Cancer Syndrome,Tissue Type,Molecular Genetics,Role in Cancer,Mutation Types,Translocation Partner,Other Germline Mut,Other Syndrome,COSMIC ID,cosmic gene name,Synonyms\n"AR","Androgen Receptor 
","367","X:67544036-67730619","1","Yes","Xq12","yes","yes","prostate","","","E","Dom","oncogene","Mis","","yes ","Androgen insensitivity, Hypospadias 1, X-linked, Spinal and bulbar muscular atrophy of Kennedy ","COSG292497","AR","367,AIS,AR,DHTR,ENSG00000169083.16,HUMARA,NR3C4,P10275,SBMA,SMAX1"\n"FH","fumarate hydratase","2271","1:241497603-241519761","1","","1q43","","yes","","leiomyomatosis, renal","hereditary leiomyomatosis and renal cell cancer","E, M","Rec","TSG","Mis, N, F","","","","COSG255037","FH","2271,ENSG00000091483.6,FH,P07954"\n"ALK","anaplastic lymphoma kinase (Ki-1)","238","2:29192774-29921566","1","Yes","2p23.2","yes","yes","ALCL, NSCLC, neuroblastoma, inflammatory myofibroblastic tumour, Spitzoid tumour","neuroblastoma","familial neuroblastoma","L, E, M","Dom","oncogene, fusion","T, Mis, A","NPM1, TPM3, TFG, TPM4, ATIC, CLTC, MSN, RNF213, CARS, EML4, KIF5B, C2orf22, DCTN1, HIP1, TPR, RANBP2, PPFIBP1, SEC31A, STRN, VCL, C2orf44, KLC1","","","COSG383409","ALK","238,ALK,CD246,ENSG00000171094.17,Q9UM73"\n"APC","adenomatous polyposis of the colon gene","324","5:112737888-112846239","1","Yes","5q22.2","yes","yes","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","adenomatous polyposis coli; Turcot syndrome","E, M, O","Rec","TSG","D, Mis, N, F, S","","","","COSG208824","APC","324,APC,DP2,DP2.5,DP3,ENSG00000134982.16,P25054,PPP1R46"\n')),(0,i.kt)("h5",{id:"columns-1"},"Columns"),(0,i.kt)("p",null,"Only following columns are needed to gather required roles in cancer:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Gene Symbol")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Tier"))),(0,i.kt)("p",null,"First the tiers are found from the CSV; based on gene symbols, the tiers' information is added while parsing through the TSV"),(0,i.kt)("h3",{id:"known-issues-1"},"Known 
Issues"),(0,i.kt)("p",null,"None"),(0,i.kt)("h3",{id:"download-url-2"},"Download URL"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v99/Cancer_Gene_Census_Hallmarks_Of_Cancer.tsv.gz"},"Cancer_Gene_Census_Hallmarks_Of_Cancer.tsv.gz")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v99/cancer_gene_census.csv"},"cancer_gene_census.csv"))),(0,i.kt)("h3",{id:"json-output-2"},"JSON output"),(0,i.kt)(o.default,{mdxType:"CancerGeneCensusJSON"}),(0,i.kt)("h2",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,i.kt)("p",null,"You can generate COSMIC supplementary annotation files if you have COSMIC account credentials. Please refer to SAUtils section for more details."))}N.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/79308b2f.fd9af8d3.js b/assets/js/79308b2f.fd9af8d3.js new file mode 100644 index 00000000..2120bc5c --- /dev/null +++ b/assets/js/79308b2f.fd9af8d3.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5261],{3905:(t,e,n)=>{n.d(e,{Zo:()=>p,kt:()=>f});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function i(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function o(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var m=a.createContext({}),c=function(t){var e=a.useContext(m),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},p=function(t){var 
e=c(t.components);return a.createElement(m.Provider,{value:e},t.children)},s="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},u=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,i=t.originalType,m=t.parentName,p=l(t,["components","mdxType","originalType","parentName"]),s=c(n),u=r,f=s["".concat(m,".").concat(u)]||s[u]||d[u]||i;return n?a.createElement(f,o(o({ref:e},p),{},{components:n})):a.createElement(f,o({ref:e},p))}));function f(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var i=n.length,o=new Array(i);o[0]=u;var l={};for(var m in e)hasOwnProperty.call(e,m)&&(l[m]=e[m]);l.originalType=t,l[s]="string"==typeof t?t:r,o[1]=l;for(var c=2;c{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>s,frontMatter:()=>i,metadata:()=>l,toc:()=>m});var a=n(7462),r=(n(7294),n(3905));const i={},o=void 0,l={unversionedId:"data-sources/cosmic-json",id:"version-3.24/data-sources/cosmic-json",title:"cosmic-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-json.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],c={toc:m},p="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(p,(0,a.Z)({},c,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'{\n "id":"COSV58272668",\n "numSamples":8,\n "refAllele":"-",\n "altAllele":"CCT",\n "histologies":[\n {\n "name":"carcinoma (serous carcinoma)",\n "numSamples":2\n },\n {\n "name":"meningioma (fibroblastic)",\n "numSamples":1\n },\n {\n "name":"carcinoma",\n "numSamples":1\n },\n {\n "name":"carcinoma (squamous cell carcinoma)",\n "numSamples":1\n 
},\n {\n "name":"meningioma (transitional)",\n "numSamples":1\n },\n {\n "name":"carcinoma (adenocarcinoma)",\n "numSamples":1\n },\n {\n "name":"other (neoplasm)",\n "numSamples":1\n }\n ],\n "sites":[\n {\n "name":"ovary",\n "numSamples":2\n },\n {\n "name":"meninges",\n "numSamples":2\n },\n {\n "name":"thyroid",\n "numSamples":2\n },\n {\n "name":"cervix",\n "numSamples":1\n },\n {\n "name":"large intestine (colon)",\n "numSamples":1\n }\n ],\n "pubMedIds":[\n 25738363,\n 27548314\n ],\n "confirmedSomatic":true,\n "drugResistance":true, /* not in this particular COSMIC variant */\n "isAlleleSpecific":true\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"id"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"COSMIC Genomic Mutation ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"histologies"),(0,r.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotypic 
descriptions")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sites"),(0,r.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"tissue types")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"PubMed IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"confirmedSomatic"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the variant is a confirmed somatic variant")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"drugResistance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the variant has been associated with drug resistance")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"Count")),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"name"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"description")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"})))))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/83fc027c.ab43b5a9.js b/assets/js/83fc027c.ab43b5a9.js new file mode 100644 index 00000000..51eb2581 --- /dev/null +++ b/assets/js/83fc027c.ab43b5a9.js @@ -0,0 +1 @@ +"use 
strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[4241],{3905:(t,e,n)=>{n.d(e,{Zo:()=>c,kt:()=>g});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function o(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var p=a.createContext({}),u=function(t){var e=a.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},c=function(t){var e=u(t.components);return a.createElement(p.Provider,{value:e},t.children)},s="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},m=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,c=i(t,["components","mdxType","originalType","parentName"]),s=u(n),m=r,g=s["".concat(p,".").concat(m)]||s[m]||d[m]||l;return n?a.createElement(g,o(o({ref:e},c),{},{components:n})):a.createElement(g,o({ref:e},c))}));function g(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,o=new Array(l);o[0]=m;var i={};for(var p in e)hasOwnProperty.call(e,p)&&(i[p]=e[p]);i.originalType=t,i[s]="string"==typeof t?t:r,o[1]=i;for(var u=2;u{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>s,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/clingen-json",id:"version-3.24/data-sources/clingen-json",title:"clingen-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/clingen-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],u={toc:p},c="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(c,(0,a.Z)({},u,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clingen":[\n {\n "chromosome":"17",\n "begin":525,\n "end":14667519,\n "variantType":"copy_number_gain",\n "id":"nsv996083",\n "clinicalInterpretation":"pathogenic",\n "observedGains":1,\n "validated":true,\n "phenotypes":[\n "Intrauterine growth retardation"\n ],\n "phenotypeIds":[\n "HP:0001511",\n "MedGen:C1853481"\n ],\n "reciprocalOverlap":0.00131\n },\n {\n "chromosome":"17",\n "begin":45835,\n "end":7600330,\n "variantType":"copy_number_loss",\n "id":"nsv869419",\n "clinicalInterpretation":"pathogenic",\n "observedLosses":1,\n "validated":true,\n "phenotypes":[\n "Developmental delay AND/OR other significant developmental or morphological phenotypes"\n ],\n "reciprocalOverlap":0.00254\n }\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clingen"),(0,r.kt)("td",{parentName:"tr",align:null},"object 
array"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Any of the\xa0sequence alterations defined here.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"id"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"Identifier from the data source. Alternatively a VID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"clinicalInterpretation"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null},"see possible values below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"observedGains"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - (2",(0,r.kt)("sup",null,"31"),"\xa0- 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"observedLosses"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - (2",(0,r.kt)("sup",null,"31"),"\xa0- 1). 
Only used if copy_number_variation, copy_number_loss, or copy_number_gain.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"validated"),(0,r.kt)("td",{parentName:"tr",align:null},"boolean"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"phenotypes"),(0,r.kt)("td",{parentName:"tr",align:null},"string array"),(0,r.kt)("td",{parentName:"tr",align:null},"Description of the phenotype.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"phenotypeIds"),(0,r.kt)("td",{parentName:"tr",align:null},"string array"),(0,r.kt)("td",{parentName:"tr",align:null},"Description of the phenotype IDs.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"clinicalInterpretation")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"benign"),(0,r.kt)("li",{parentName:"ul"},"curated benign"),(0,r.kt)("li",{parentName:"ul"},"curated pathogenic"),(0,r.kt)("li",{parentName:"ul"},"likely benign"),(0,r.kt)("li",{parentName:"ul"},"likely pathogenic"),(0,r.kt)("li",{parentName:"ul"},"path gain"),(0,r.kt)("li",{parentName:"ul"},"path loss"),(0,r.kt)("li",{parentName:"ul"},"pathogenic"),(0,r.kt)("li",{parentName:"ul"},"uncertain")))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/86fcde84.0069d21b.js b/assets/js/86fcde84.0069d21b.js new file mode 100644 index 00000000..6f0d0ec0 --- /dev/null +++ b/assets/js/86fcde84.0069d21b.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5955],{3905:(e,n,t)=>{t.d(n,{Zo:()=>u,kt:()=>f});var 
r=t(7294);function o(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function a(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);n&&(r=r.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,r)}return t}function c(e){for(var n=1;n=0||(o[t]=e[t]);return o}(e,n);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(o[t]=e[t])}return o}var s=r.createContext({}),l=function(e){var n=r.useContext(s),t=n;return e&&(t="function"==typeof e?e(n):c(c({},n),e)),t},u=function(e){var n=l(e.components);return r.createElement(s.Provider,{value:n},e.children)},p="mdxType",m={inlineCode:"code",wrapper:function(e){var n=e.children;return r.createElement(r.Fragment,{},n)}},d=r.forwardRef((function(e,n){var t=e.components,o=e.mdxType,a=e.originalType,s=e.parentName,u=i(e,["components","mdxType","originalType","parentName"]),p=l(t),d=o,f=p["".concat(s,".").concat(d)]||p[d]||m[d]||a;return t?r.createElement(f,c(c({ref:n},u),{},{components:t})):r.createElement(f,c({ref:n},u))}));function f(e,n){var t=arguments,o=n&&n.mdxType;if("string"==typeof e||o){var a=t.length,c=new Array(a);c[0]=d;var i={};for(var s in n)hasOwnProperty.call(n,s)&&(i[s]=n[s]);i.originalType=e,i[p]="string"==typeof e?e:o,c[1]=i;for(var l=2;l{t.r(n),t.d(n,{contentTitle:()=>c,default:()=>p,frontMatter:()=>a,metadata:()=>i,toc:()=>s});var r=t(7462),o=(t(7294),t(3905));const a={},c=void 
0,i={unversionedId:"data-sources/gnomad4.0-lof-json",id:"version-3.24/data-sources/gnomad4.0-lof-json",title:"gnomad4.0-lof-json",description:"",source:"@site/versioned_docs/version-3.24/data-sources/gnomad4.0-lof-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad4.0-lof-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad4.0-lof-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad4.0-lof-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],l={toc:s},u="wrapper";function p(e){let{components:n,...t}=e;return(0,o.kt)(u,(0,r.Z)({},l,t,{components:n,mdxType:"MDXLayout"}),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-json"},'"gnomAD": {\n "pLi": 0.00000122,\n "pRec": 0.32,\n "pNull": 0.68,\n "synZ": 0.0117,\n "misZ": 0.162,\n "loeuf": 1.94,\n "transcriptId": "ENST00000360525"\n}\n')))}p.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/880ef044.14db7f9c.js b/assets/js/880ef044.14db7f9c.js new file mode 100644 index 00000000..89649ce4 --- /dev/null +++ b/assets/js/880ef044.14db7f9c.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[2988],{3905:(t,e,n)=>{n.d(e,{Zo:()=>d,kt:()=>k});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function i(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var p=a.createContext({}),m=function(t){var e=a.useContext(p),n=e;return 
t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},d=function(t){var e=m(t.components);return a.createElement(p.Provider,{value:e},t.children)},u="mdxType",s={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},c=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,d=o(t,["components","mdxType","originalType","parentName"]),u=m(n),c=r,k=u["".concat(p,".").concat(c)]||u[c]||s[c]||l;return n?a.createElement(k,i(i({ref:e},d),{},{components:n})):a.createElement(k,i({ref:e},d))}));function k(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,i=new Array(l);i[0]=c;var o={};for(var p in e)hasOwnProperty.call(e,p)&&(o[p]=e[p]);o.originalType=t,o[u]="string"==typeof t?t:r,i[1]=o;for(var m=2;m{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={title:"ISCN Notation"},i=void 0,o={unversionedId:"core-functionality/iscn-notation",id:"version-3.24/core-functionality/iscn-notation",title:"ISCN Notation",description:"Introduction",source:"@site/versioned_docs/version-3.24/core-functionality/iscn-notation.md",sourceDirName:"core-functionality",slug:"/core-functionality/iscn-notation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/iscn-notation",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/core-functionality/iscn-notation.md",tags:[],version:"3.24",frontMatter:{title:"ISCN Notation"},sidebar:"docs",previous:{title:"Gene Fusion Detection",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/gene-fusions"},next:{title:"Junction Preserving Annotation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/junction-preserving"}},p=[{value:"Introduction",id:"introduction",children:[{value:"Key Components of ISCN 
Notation:",id:"key-components-of-iscn-notation",children:[],level:3}],level:2},{value:"Overview",id:"overview",children:[{value:"Supported Variant Types",id:"supported-variant-types",children:[],level:3},{value:"Example",id:"example",children:[],level:3}],level:2},{value:"References",id:"references",children:[],level:2}],m={toc:p},d="wrapper";function u(t){let{components:e,...n}=t;return(0,r.kt)(d,(0,a.Z)({},m,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"introduction"},"Introduction"),(0,r.kt)("p",null,"The International System for Human Cytogenetic Nomenclature (ISCN) is a standardized system used\nto describe chromosomal abnormalities. It is a standardized system developed to describe the banding pattern of human\nchromosomes as well as any structural variations.\nISCN is used by geneticists and researchers to ensure clarity and uniformity when reporting chromosomal abnormalities."),(0,r.kt)("h3",{id:"key-components-of-iscn-notation"},"Key Components of ISCN Notation:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("strong",{parentName:"li"},"Chromosome Number"),": Identifies the chromosome."),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("strong",{parentName:"li"},"Arm"),': Chromosome arms are labeled "p" (short arm) and "q" (long arm).'),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("strong",{parentName:"li"},"Banding Pattern"),": Each arm is divided into regions, bands, and sub-bands that are numbered starting from the centromere (central part of the chromosome).")),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"The provided ISCN notation algorithm processes chromosomal variants and generates ISCN notation by following these steps:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("p",{parentName:"li"},(0,r.kt)("strong",{parentName:"p"},"Identify Variant Type"),":\nThe algorithm recognizes several types of chromosomal variants such as duplications, deletions, copy number gains, and copy number 
losses.")),(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("p",{parentName:"li"},(0,r.kt)("strong",{parentName:"p"},"Locate Cytogenetic Bands"),":\nUsing the start and end positions of the variant, the algorithm identifies the corresponding cytogenetic bands on the chromosome.")),(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("p",{parentName:"li"},(0,r.kt)("strong",{parentName:"p"},"Generate Notation"),":\nConstructs the ISCN notation string using the variant type, chromosome number, and identified cytogenetic bands."))),(0,r.kt)("h3",{id:"supported-variant-types"},"Supported Variant Types"),(0,r.kt)("p",null,"The algorithm supports the following variant types:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"Deletion (del)"),(0,r.kt)("li",{parentName:"ul"},"Duplication (dup)"),(0,r.kt)("li",{parentName:"ul"},"Copy Number Gain (dup)"),(0,r.kt)("li",{parentName:"ul"},"Copy Number Loss (del)")),(0,r.kt)("h3",{id:"example"},"Example"),(0,r.kt)("p",null,"For a deletion on chromosome 8 from position 19200001 to 135400001, the algorithm would:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"Recognize the variant type as a deletion."),(0,r.kt)("li",{parentName:"ol"},"Identify the start band as ",(0,r.kt)("inlineCode",{parentName:"li"},"p21.3")," and the end band as ",(0,r.kt)("inlineCode",{parentName:"li"},"q24.23"),"."),(0,r.kt)("li",{parentName:"ol"},"Generate the ISCN notation: ",(0,r.kt)("inlineCode",{parentName:"li"},"del(8)(p21.3q24.23)"),".")),(0,r.kt)("p",null,"More examples:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Chromosome"),(0,r.kt)("th",{parentName:"tr",align:null},"Start Position"),(0,r.kt)("th",{parentName:"tr",align:null},"End Position"),(0,r.kt)("th",{parentName:"tr",align:null},"Variant Type"),(0,r.kt)("th",{parentName:"tr",align:null},"ISCN 
Notation"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"1"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(p21.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"1"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(p21.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(p21.3q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(p21.3q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"127300001"),(0,r.kt)("td",{parentName:"tr",align:null},"131500000"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.22)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"127300001"),(0,r.kt)("td",{parentName:"tr",align:null},"131500000"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number 
gain"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.22)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"128746677"),(0,r.kt)("td",{parentName:"tr",align:null},"128749160"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.21)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"128746677"),(0,r.kt)("td",{parentName:"tr",align:null},"128749160"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number gain"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.21)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"138900001"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.23q24.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"146364022"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"145138635"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.23q24.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"138900001"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number 
loss"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(q24.23q24.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"146364022"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"X"),(0,r.kt)("td",{parentName:"tr",align:null},"86200001"),(0,r.kt)("td",{parentName:"tr",align:null},"103700000"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number loss"),(0,r.kt)("td",{parentName:"tr",align:null},"del(X)(q21.31q22.2)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"X"),(0,r.kt)("td",{parentName:"tr",align:null},"86200001"),(0,r.kt)("td",{parentName:"tr",align:null},"103700000"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(X)(q21.31q22.2)")))),(0,r.kt)("h2",{id:"references"},"References"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("a",{parentName:"li",href:"https://karger.com/books/book/358/ISCN-2020An-International-System-for-Human"},"An International System for Human Cytogenomic Nomenclature (2020)")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("a",{parentName:"li",href:"https://hgvs-nomenclature.org/stable/recommendations/DNA/complex"},"HGVS website describing ISCN"))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/88990ce9.b85dd7dd.js b/assets/js/88990ce9.b85dd7dd.js new file mode 100644 index 00000000..0b547fbb --- /dev/null +++ b/assets/js/88990ce9.b85dd7dd.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1382,3831],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>v});var a=n(7294);function r(e,t,n){return t in 
e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function i(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},c=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},p="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,i=e.originalType,s=e.parentName,c=l(e,["components","mdxType","originalType","parentName"]),p=d(n),u=r,v=p["".concat(s,".").concat(u)]||p[u]||m[u]||i;return n?a.createElement(v,o(o({ref:t},c),{},{components:n})):a.createElement(v,o({ref:t},c))}));function v(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var i=n.length,o=new Array(i);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[p]="string"==typeof e?e:r,o[1]=l;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>p,frontMatter:()=>i,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const i={},o=void 0,l={unversionedId:"data-sources/revel-json",id:"version-3.24/data-sources/revel-json",title:"revel-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/revel-json.md",sourceDirName:"data-sources",slug:"/data-sources/revel-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/revel-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},c="wrapper";function p(e){let{components:t,...n}=e;return(0,r.kt)(c,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"revel":{ \n "score":0.027\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"score"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1.0")))))}p.isMDXComponent=!0},4212:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>m,frontMatter:()=>o,metadata:()=>s,toc:()=>d});var a=n(7462),r=(n(7294),n(3905)),i=n(4723);const o={title:"REVEL"},l=void 0,s={unversionedId:"data-sources/revel",id:"version-3.24/data-sources/revel",title:"REVEL",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/revel.mdx",sourceDirName:"data-sources",slug:"/data-sources/revel",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/revel.mdx",tags:[],version:"3.24",frontMatter:{title:"REVEL"},sidebar:"docs",previous:{title:"Primate 
AI-3D",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai"},next:{title:"Splice AI",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"CSV File",id:"csv-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],c={toc:d},p="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(p,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. 
",(0,r.kt)("em",{parentName:"p"},"The American Journal of Human Genetics")," ",(0,r.kt)("strong",{parentName:"p"},"99"),", 877-885 (2016). ",(0,r.kt)("a",{parentName:"p",href:"https://doi.org/10.1016/j.ajhg.2016.08.016"},"https://doi.org/10.1016/j.ajhg.2016.08.016")))),(0,r.kt)("h2",{id:"csv-file"},"CSV File"),(0,r.kt)("h3",{id:"example"},"Example"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL\n1,35142,35142,G,A,T,M,0.027\n1,35142,35142,G,C,T,R,0.035\n1,35142,35142,G,T,T,K,0.043\n1,35143,35143,T,A,T,S,0.018\n1,35143,35143,T,C,T,A,0.034\n")),(0,r.kt)("h3",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"From the CSV file, we're mainly interested in the following columns:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"chr")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"hg19_pos")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"grch38_pos")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"ref")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"alt")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"REVEL"))),(0,r.kt)("h2",{id:"known-issues"},"Known Issues"),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 
11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Sorting")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Since the input file contains positions for both GRCh37 and GRCh38, we split it into two ",(0,r.kt)("strong",{parentName:"p"},"TSV")," files (for the sake of better readability) with identical format. The positions for GRCh37 were sorted but not for GRCh38. So we re-sort the variants by position in the GRCh38 file."))),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Conflicting Scores")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"When there are multiple scores available for the same variant (i.e. 
the same position with the same alternative allele), we pick the highest score."))),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://sites.google.com/site/revelgenomics/downloads"},"https://sites.google.com/site/revelgenomics/downloads")),(0,r.kt)("h2",{id:"json-output"},"JSON Output"),(0,r.kt)(i.default,{mdxType:"JSON"}))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/8eb3126b.8d6691e6.js b/assets/js/8eb3126b.8d6691e6.js new file mode 100644 index 00000000..6bc7b24c --- /dev/null +++ b/assets/js/8eb3126b.8d6691e6.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[4869],{3905:(t,e,n)=>{n.d(e,{Zo:()=>d,kt:()=>k});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function i(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var p=a.createContext({}),m=function(t){var e=a.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},d=function(t){var e=m(t.components);return a.createElement(p.Provider,{value:e},t.children)},u="mdxType",s={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},c=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,d=o(t,["components","mdxType","originalType","parentName"]),u=m(n),c=r,k=u["".concat(p,".").concat(c)]||u[c]||s[c]||l;return n?a.createElement(k,i(i({ref:e},d),{},{components:n})):a.createElement(k,i({ref:e},d))}));function 
k(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,i=new Array(l);i[0]=c;var o={};for(var p in e)hasOwnProperty.call(e,p)&&(o[p]=e[p]);o.originalType=t,o[u]="string"==typeof t?t:r,i[1]=o;for(var m=2;m{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={title:"ISCN Notation"},i=void 0,o={unversionedId:"core-functionality/iscn-notation",id:"core-functionality/iscn-notation",title:"ISCN Notation",description:"Introduction",source:"@site/docs/core-functionality/iscn-notation.md",sourceDirName:"core-functionality",slug:"/core-functionality/iscn-notation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/iscn-notation",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/iscn-notation.md",tags:[],version:"current",frontMatter:{title:"ISCN Notation"},sidebar:"docs",previous:{title:"Gene Fusion Detection",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions"},next:{title:"Junction Preserving Annotation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving"}},p=[{value:"Introduction",id:"introduction",children:[{value:"Key Components of ISCN Notation:",id:"key-components-of-iscn-notation",children:[],level:3}],level:2},{value:"Overview",id:"overview",children:[{value:"Supported Variant Types",id:"supported-variant-types",children:[],level:3},{value:"Example",id:"example",children:[],level:3}],level:2},{value:"References",id:"references",children:[],level:2}],m={toc:p},d="wrapper";function u(t){let{components:e,...n}=t;return(0,r.kt)(d,(0,a.Z)({},m,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"introduction"},"Introduction"),(0,r.kt)("p",null,"The International System for Human Cytogenetic Nomenclature (ISCN) is a standardized system used\nto describe chromosomal abnormalities. 
It also defines how to describe the banding pattern of human\nchromosomes as well as any structural variations.\nISCN is used by geneticists and researchers to ensure clarity and uniformity when reporting chromosomal abnormalities."),(0,r.kt)("h3",{id:"key-components-of-iscn-notation"},"Key Components of ISCN Notation:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("strong",{parentName:"li"},"Chromosome Number"),": Identifies the chromosome."),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("strong",{parentName:"li"},"Arm"),': Chromosome arms are labeled "p" (short arm) and "q" (long arm).'),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("strong",{parentName:"li"},"Banding Pattern"),": Each arm is divided into regions, bands, and sub-bands that are numbered starting from the centromere (central part of the chromosome).")),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"The provided ISCN notation algorithm processes chromosomal variants and generates ISCN notation by following these steps:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("p",{parentName:"li"},(0,r.kt)("strong",{parentName:"p"},"Identify Variant Type"),":\nThe algorithm recognizes several types of chromosomal variants such as duplications, deletions, copy number gains, and copy number losses.")),(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("p",{parentName:"li"},(0,r.kt)("strong",{parentName:"p"},"Locate Cytogenetic Bands"),":\nUsing the start and end positions of the variant, the algorithm identifies the corresponding cytogenetic bands on the chromosome.")),(0,r.kt)("li",{parentName:"ol"},(0,r.kt)("p",{parentName:"li"},(0,r.kt)("strong",{parentName:"p"},"Generate Notation"),":\nThe algorithm constructs the ISCN notation string using the variant type, chromosome number, and identified cytogenetic bands."))),(0,r.kt)("h3",{id:"supported-variant-types"},"Supported Variant Types"),(0,r.kt)("p",null,"The algorithm supports the following variant 
types:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"Deletion (del)"),(0,r.kt)("li",{parentName:"ul"},"Duplication (dup)"),(0,r.kt)("li",{parentName:"ul"},"Copy Number Gain (dup)"),(0,r.kt)("li",{parentName:"ul"},"Copy Number Loss (del)")),(0,r.kt)("h3",{id:"example"},"Example"),(0,r.kt)("p",null,"For a deletion on chromosome 8 from position 19200001 to 135400001, the algorithm would:"),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"Recognize the variant type as a deletion."),(0,r.kt)("li",{parentName:"ol"},"Identify the start band as ",(0,r.kt)("inlineCode",{parentName:"li"},"p21.3")," and the end band as ",(0,r.kt)("inlineCode",{parentName:"li"},"q24.23"),"."),(0,r.kt)("li",{parentName:"ol"},"Generate the ISCN notation: ",(0,r.kt)("inlineCode",{parentName:"li"},"del(8)(p21.3q24.23)"),".")),(0,r.kt)("p",null,"More examples:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Chromosome"),(0,r.kt)("th",{parentName:"tr",align:null},"Start Position"),(0,r.kt)("th",{parentName:"tr",align:null},"End Position"),(0,r.kt)("th",{parentName:"tr",align:null},"Variant Type"),(0,r.kt)("th",{parentName:"tr",align:null},"ISCN 
Notation"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"1"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(p21.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"1"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(p21.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(p21.3q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"19200001"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(p21.3q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"127300001"),(0,r.kt)("td",{parentName:"tr",align:null},"131500000"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.22)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"127300001"),(0,r.kt)("td",{parentName:"tr",align:null},"131500000"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number 
gain"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.22)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"128746677"),(0,r.kt)("td",{parentName:"tr",align:null},"128749160"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.21)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"128746677"),(0,r.kt)("td",{parentName:"tr",align:null},"128749160"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number gain"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.21q24.21)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"138900001"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.23q24.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"146364022"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"145138635"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.23q24.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"138900001"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number 
loss"),(0,r.kt)("td",{parentName:"tr",align:null},"del(8)(q24.23q24.3)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"8"),(0,r.kt)("td",{parentName:"tr",align:null},"135400001"),(0,r.kt)("td",{parentName:"tr",align:null},"146364022"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication"),(0,r.kt)("td",{parentName:"tr",align:null},"dup(8)(q24.23)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"X"),(0,r.kt)("td",{parentName:"tr",align:null},"86200001"),(0,r.kt)("td",{parentName:"tr",align:null},"103700000"),(0,r.kt)("td",{parentName:"tr",align:null},"copy number loss"),(0,r.kt)("td",{parentName:"tr",align:null},"del(X)(q21.31q22.2)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"X"),(0,r.kt)("td",{parentName:"tr",align:null},"86200001"),(0,r.kt)("td",{parentName:"tr",align:null},"103700000"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"del(X)(q21.31q22.2)")))),(0,r.kt)("h2",{id:"references"},"References"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("a",{parentName:"li",href:"https://karger.com/books/book/358/ISCN-2020An-International-System-for-Human"},"An International System for Human Cytogenomic Nomenclature (2020)")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("a",{parentName:"li",href:"https://hgvs-nomenclature.org/stable/recommendations/DNA/complex"},"HGVS website describing ISCN"))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/91971be7.24ae75ad.js b/assets/js/91971be7.24ae75ad.js new file mode 100644 index 00000000..9b689071 --- /dev/null +++ b/assets/js/91971be7.24ae75ad.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[131],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>f});var r=n(7294);function a(e,t,n){return t in 
e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function i(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var s=r.createContext({}),l=function(e){var t=r.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},p=function(e){var t=l(e.components);return r.createElement(s.Provider,{value:t},e.children)},u="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},m=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,s=e.parentName,p=c(e,["components","mdxType","originalType","parentName"]),u=l(n),m=a,f=u["".concat(s,".").concat(m)]||u[m]||d[m]||o;return n?r.createElement(f,i(i({ref:t},p),{},{components:n})):r.createElement(f,i({ref:t},p))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,i=new Array(o);i[0]=m;var c={};for(var s in t)hasOwnProperty.call(t,s)&&(c[s]=t[s]);c.originalType=e,c[u]="string"==typeof e?e:a,i[1]=c;for(var l=2;l{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>u,frontMatter:()=>o,metadata:()=>c,toc:()=>s});var r=n(7462),a=(n(7294),n(3905));const o={},i=void 0,c={unversionedId:"data-sources/amino-acid-conservation-json",id:"version-3.24/data-sources/amino-acid-conservation-json",title:"amino-acid-conservation-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/amino-acid-conservation-json.md",sourceDirName:"data-sources",slug:"/data-sources/amino-acid-conservation-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/amino-acid-conservation-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],l={toc:s},p="wrapper";function u(e){let{components:t,...n}=e;return(0,a.kt)(p,(0,r.Z)({},l,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"aminoAcidConservation": {\n "scores": [0.34]\n} \n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"aminoAcidConservation"),(0,a.kt)("td",{parentName:"tr",align:"center"},"object"),(0,a.kt)("td",{parentName:"tr",align:"left"})),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"scores"),(0,a.kt)("td",{parentName:"tr",align:"center"},"object array of doubles"),(0,a.kt)("td",{parentName:"tr",align:"left"},"percent conserved with respect to human amino acid residue. 
Range: 0.01 - 1.00")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/935f2afb.e82442de.js b/assets/js/935f2afb.e82442de.js new file mode 100644 index 00000000..697ba122 --- /dev/null +++ b/assets/js/935f2afb.e82442de.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[53],{1109:e=>{e.exports=JSON.parse('{"pluginId":"default","version":"current","label":"3.25 (unreleased)","banner":null,"badge":true,"className":"docs-version-current","isLast":true,"docsSidebars":{"docs":[{"type":"category","label":"Introduction","items":[{"type":"link","label":"Introduction","href":"/IlluminaConnectedAnnotationsDocumentation/","docId":"introduction/introduction"},{"type":"link","label":"Licensed Content","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent","docId":"introduction/licensedContent"},{"type":"link","label":"Dependencies","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/dependencies","docId":"introduction/dependencies"},{"type":"link","label":"Getting Started","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/getting-started","docId":"introduction/getting-started"},{"type":"link","label":"Parsing Illumina Connected Annotations JSON","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-json","docId":"introduction/parsing-json"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Data Sources","items":[{"type":"link","label":"1000 Genomes","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes","docId":"data-sources/1000Genomes"},{"type":"link","label":"Amino Acid Conservation","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation","docId":"data-sources/amino-acid-conservation"},{"type":"link","label":"Cancer 
Hotspots","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-hotspots","docId":"data-sources/cancer-hotspots"},{"type":"link","label":"ClinGen","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen","docId":"data-sources/clingen"},{"type":"link","label":"ClinVar","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar","docId":"data-sources/clinvar"},{"type":"link","label":"ClinVar Preview","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview","docId":"data-sources/clinvar-preview"},{"type":"link","label":"COSMIC","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic","docId":"data-sources/cosmic"},{"type":"link","label":"DANN","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann","docId":"data-sources/dann"},{"type":"link","label":"dbSNP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp","docId":"data-sources/dbsnp"},{"type":"link","label":"DECIPHER","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher","docId":"data-sources/decipher"},{"type":"link","label":"FusionCatcher","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher","docId":"data-sources/fusioncatcher"},{"type":"link","label":"GERP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp","docId":"data-sources/gerp"},{"type":"link","label":"GME Variome","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme","docId":"data-sources/gme"},{"type":"link","label":"gnomAD","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad","docId":"data-sources/gnomad"},{"type":"link","label":"Mitochondrial 
Heteroplasmy","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmy","docId":"data-sources/mito-heteroplasmy"},{"type":"link","label":"MITOMAP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap","docId":"data-sources/mitomap"},{"type":"link","label":"OMIM","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim","docId":"data-sources/omim"},{"type":"link","label":"PhyloP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop","docId":"data-sources/phylop"},{"type":"link","label":"Primate AI-3D","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai","docId":"data-sources/primate-ai"},{"type":"link","label":"REVEL","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel","docId":"data-sources/revel"},{"type":"link","label":"Splice AI","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai","docId":"data-sources/splice-ai"},{"type":"link","label":"TOPMed","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed","docId":"data-sources/topmed"}],"collapsible":true,"collapsed":true},{"type":"category","label":"File Formats","items":[{"type":"link","label":"Illumina Connected Annotations JSON File Format","href":"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-format","docId":"file-formats/illumina-annotator-json-file-format"},{"type":"link","label":"Illumina Connected Annotations VCF File Format","href":"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-format","docId":"file-formats/illumina-annotator-vcf-file-format"},{"type":"link","label":"Custom Annotations","href":"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations","docId":"file-formats/custom-annotations"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Core Functionality","items":[{"type":"link","label":"Canonical 
Transcripts","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts","docId":"core-functionality/canonical-transcripts"},{"type":"link","label":"Gene Fusion Detection","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions","docId":"core-functionality/gene-fusions"},{"type":"link","label":"ISCN Notation","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/iscn-notation","docId":"core-functionality/iscn-notation"},{"type":"link","label":"Junction Preserving Annotation","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving","docId":"core-functionality/junction-preserving"},{"type":"link","label":"Transcript Consequence Impact","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts","docId":"core-functionality/transcript-consequence-impacts"},{"type":"link","label":"Variant IDs","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids","docId":"core-functionality/variant-ids"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Utilities","items":[{"type":"link","label":"Jasix","href":"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix","docId":"utilities/jasix"},{"type":"link","label":"SAUtils","href":"/IlluminaConnectedAnnotationsDocumentation/utilities/sautils","docId":"utilities/sautils"}],"collapsible":true,"collapsed":true},{"type":"category","label":"FAQs","items":[{"type":"link","label":"Annotation Engine vs Data update","href":"/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-update","docId":"frequently-asked-questions/Annotator-vs-data-update"}],"collapsible":true,"collapsed":true}]},"docs":{"core-functionality/canonical-transcripts":{"id":"core-functionality/canonical-transcripts","title":"Canonical 
Transcripts","description":"Overview","sidebar":"docs"},"core-functionality/gene-fusions":{"id":"core-functionality/gene-fusions","title":"Gene Fusion Detection","description":"Overview","sidebar":"docs"},"core-functionality/iscn-notation":{"id":"core-functionality/iscn-notation","title":"ISCN Notation","description":"Introduction","sidebar":"docs"},"core-functionality/junction-preserving":{"id":"core-functionality/junction-preserving","title":"Junction Preserving Annotation","description":"Background","sidebar":"docs"},"core-functionality/transcript-consequence-impacts":{"id":"core-functionality/transcript-consequence-impacts","title":"Transcript Consequence Impact","description":"Overview","sidebar":"docs"},"core-functionality/variant-ids":{"id":"core-functionality/variant-ids","title":"Variant IDs","description":"Overview","sidebar":"docs"},"data-sources/1000Genomes":{"id":"data-sources/1000Genomes","title":"1000 Genomes","description":"Overview","sidebar":"docs"},"data-sources/1000Genomes-snv-json":{"id":"data-sources/1000Genomes-snv-json","title":"1000Genomes-snv-json","description":"| Field | Type | Notes |"},"data-sources/1000Genomes-sv-json":{"id":"data-sources/1000Genomes-sv-json","title":"1000Genomes-sv-json","description":"| Field | Type | Notes |"},"data-sources/amino-acid-conservation":{"id":"data-sources/amino-acid-conservation","title":"Amino Acid Conservation","description":"Overview","sidebar":"docs"},"data-sources/amino-acid-conservation-json":{"id":"data-sources/amino-acid-conservation-json","title":"amino-acid-conservation-json","description":"| Field | Type | Notes |"},"data-sources/cancer-hotspots":{"id":"data-sources/cancer-hotspots","title":"Cancer Hotspots","description":"Overview","sidebar":"docs"},"data-sources/clingen":{"id":"data-sources/clingen","title":"ClinGen","description":"Overview","sidebar":"docs"},"data-sources/clingen-dosage-json":{"id":"data-sources/clingen-dosage-json","title":"clingen-dosage-json","description":"| Field | 
Type | Notes |"},"data-sources/clingen-gene-validity-json":{"id":"data-sources/clingen-gene-validity-json","title":"clingen-gene-validity-json","description":"| Field | Type | Notes |"},"data-sources/clingen-json":{"id":"data-sources/clingen-json","title":"clingen-json","description":"| Field | Type | Notes |"},"data-sources/clinvar":{"id":"data-sources/clinvar","title":"ClinVar","description":"Overview","sidebar":"docs"},"data-sources/clinvar-json":{"id":"data-sources/clinvar-json","title":"clinvar-json","description":"small variants:"},"data-sources/clinvar-preview":{"id":"data-sources/clinvar-preview","title":"ClinVar Preview","description":"Overview","sidebar":"docs"},"data-sources/clinvar-preview-json":{"id":"data-sources/clinvar-preview-json","title":"clinvar-preview-json","description":"small variants:"},"data-sources/cosmic":{"id":"data-sources/cosmic","title":"COSMIC","description":"Overview","sidebar":"docs"},"data-sources/cosmic-cancer-gene-census":{"id":"data-sources/cosmic-cancer-gene-census","title":"cosmic-cancer-gene-census","description":"| Field | Type | Notes |"},"data-sources/cosmic-gene-fusion-json":{"id":"data-sources/cosmic-gene-fusion-json","title":"cosmic-gene-fusion-json","description":"| Field | Type | Notes |"},"data-sources/cosmic-json":{"id":"data-sources/cosmic-json","title":"cosmic-json","description":"| Field | Type | Notes |"},"data-sources/dann":{"id":"data-sources/dann","title":"DANN","description":"Overview","sidebar":"docs"},"data-sources/dann-json":{"id":"data-sources/dann-json","title":"dann-json","description":"| Field | Type | Notes |"},"data-sources/dbsnp":{"id":"data-sources/dbsnp","title":"dbSNP","description":"Overview","sidebar":"docs"},"data-sources/dbsnp-json":{"id":"data-sources/dbsnp-json","title":"dbsnp-json","description":"| Field | Type | Notes 
|"},"data-sources/decipher":{"id":"data-sources/decipher","title":"DECIPHER","description":"Overview","sidebar":"docs"},"data-sources/decipher-json":{"id":"data-sources/decipher-json","title":"decipher-json","description":"| Field | Type | Notes |"},"data-sources/fusioncatcher":{"id":"data-sources/fusioncatcher","title":"FusionCatcher","description":"Overview","sidebar":"docs"},"data-sources/fusioncatcher-json":{"id":"data-sources/fusioncatcher-json","title":"fusioncatcher-json","description":"| Field | Type | Notes |"},"data-sources/gerp":{"id":"data-sources/gerp","title":"GERP","description":"Overview","sidebar":"docs"},"data-sources/gerp-json":{"id":"data-sources/gerp-json","title":"gerp-json","description":"| Field | Type | Notes |"},"data-sources/gme":{"id":"data-sources/gme","title":"GME Variome","description":"Overview","sidebar":"docs"},"data-sources/gme-json":{"id":"data-sources/gme-json","title":"gme-json","description":"| Field | Type | Notes |"},"data-sources/gnomad":{"id":"data-sources/gnomad","title":"gnomAD","description":"Overview","sidebar":"docs"},"data-sources/gnomad-lof-json":{"id":"data-sources/gnomad-lof-json","title":"gnomad-lof-json","description":"| Field | Type | Notes |"},"data-sources/gnomad-small-variants-json":{"id":"data-sources/gnomad-small-variants-json","title":"gnomad-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad-structural-variants-data_description":{"id":"data-sources/gnomad-structural-variants-data_description","title":"gnomad-structural-variants-data_description","description":"Bed Example"},"data-sources/gnomad-structural-variants-json":{"id":"data-sources/gnomad-structural-variants-json","title":"gnomad-structural-variants-json","description":"| Field | Type | Notes 
|"},"data-sources/gnomad4.0-lof-json":{"id":"data-sources/gnomad4.0-lof-json","title":"gnomad4.0-lof-json","description":""},"data-sources/gnomad4.0-small-variants-json":{"id":"data-sources/gnomad4.0-small-variants-json","title":"gnomad4.0-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad40-structural-variants-json":{"id":"data-sources/gnomad40-structural-variants-json","title":"gnomad40-structural-variants-json","description":""},"data-sources/mito-heteroplasmy":{"id":"data-sources/mito-heteroplasmy","title":"Mitochondrial Heteroplasmy","description":"Overview","sidebar":"docs"},"data-sources/mitomap":{"id":"data-sources/mitomap","title":"MITOMAP","description":"Overview","sidebar":"docs"},"data-sources/mitomap-small-variants-json":{"id":"data-sources/mitomap-small-variants-json","title":"mitomap-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/mitomap-structural-variants-json":{"id":"data-sources/mitomap-structural-variants-json","title":"mitomap-structural-variants-json","description":"| Field | Type | Notes |"},"data-sources/omim":{"id":"data-sources/omim","title":"OMIM","description":"Overview","sidebar":"docs"},"data-sources/omim-json":{"id":"data-sources/omim-json","title":"omim-json","description":"| Field | Type | Notes |"},"data-sources/phylop":{"id":"data-sources/phylop","title":"PhyloP","description":"Overview","sidebar":"docs"},"data-sources/phylop-json":{"id":"data-sources/phylop-json","title":"phylop-json","description":"| Field | Type | Notes |"},"data-sources/phylopprimate-json":{"id":"data-sources/phylopprimate-json","title":"phylopprimate-json","description":"| Field | Type | Notes |"},"data-sources/primate-ai":{"id":"data-sources/primate-ai","title":"Primate AI-3D","description":"Overview","sidebar":"docs"},"data-sources/primate-ai-json":{"id":"data-sources/primate-ai-json","title":"primate-ai-json","description":"| Field | Type | Notes 
|"},"data-sources/revel":{"id":"data-sources/revel","title":"REVEL","description":"Overview","sidebar":"docs"},"data-sources/revel-json":{"id":"data-sources/revel-json","title":"revel-json","description":"| Field | Type | Notes |"},"data-sources/splice-ai":{"id":"data-sources/splice-ai","title":"Splice AI","description":"Overview","sidebar":"docs"},"data-sources/splice-ai-json":{"id":"data-sources/splice-ai-json","title":"splice-ai-json","description":"| Field | Type | Notes |"},"data-sources/topmed":{"id":"data-sources/topmed","title":"TOPMed","description":"Overview","sidebar":"docs"},"data-sources/topmed-json":{"id":"data-sources/topmed-json","title":"topmed-json","description":"| Field | Type | Notes |"},"file-formats/custom-annotations":{"id":"file-formats/custom-annotations","title":"Custom Annotations","description":"Overview","sidebar":"docs"},"file-formats/illumina-annotator-json-file-format":{"id":"file-formats/illumina-annotator-json-file-format","title":"Illumina Connected Annotations JSON File Format","description":"Overview","sidebar":"docs"},"file-formats/illumina-annotator-vcf-file-format":{"id":"file-formats/illumina-annotator-vcf-file-format","title":"Illumina Connected Annotations VCF File Format","description":"Overview","sidebar":"docs"},"frequently-asked-questions/Annotator-vs-data-update":{"id":"frequently-asked-questions/Annotator-vs-data-update","title":"Annotation Engine vs Data update","description":"Background","sidebar":"docs"},"introduction/dependencies":{"id":"introduction/dependencies","title":"Dependencies","description":"All of the following dependencies have been included in this repository.","sidebar":"docs"},"introduction/getting-started":{"id":"introduction/getting-started","title":"Getting Started","description":"Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). 
Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.","sidebar":"docs"},"introduction/introduction":{"id":"introduction/introduction","title":"Introduction","description":"translational research-grade variant annotation","sidebar":"docs"},"introduction/licensedContent":{"id":"introduction/licensedContent","title":"Licensed Content","description":"Illumina Conncted Annotations supports following content which is available through a license from Illumina.","sidebar":"docs"},"introduction/parsing-json":{"id":"introduction/parsing-json","title":"Parsing Illumina Connected Annotations JSON","description":"Parsing JSON","sidebar":"docs"},"utilities/jasix":{"id":"utilities/jasix","title":"Jasix","description":"Overview","sidebar":"docs"},"utilities/sautils":{"id":"utilities/sautils","title":"SAUtils","description":"Overview","sidebar":"docs"}}}')}}]); \ No newline at end of file diff --git a/assets/js/935f2afb.e94232a1.js b/assets/js/935f2afb.e94232a1.js deleted file mode 100644 index 44ae9c2b..00000000 --- a/assets/js/935f2afb.e94232a1.js +++ /dev/null @@ -1 +0,0 @@ -"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[53],{1109:e=>{e.exports=JSON.parse('{"pluginId":"default","version":"current","label":"3.24 (unreleased)","banner":null,"badge":true,"className":"docs-version-current","isLast":true,"docsSidebars":{"docs":[{"type":"category","label":"Introduction","items":[{"type":"link","label":"Introduction","href":"/IlluminaConnectedAnnotationsDocumentation/","docId":"introduction/introduction"},{"type":"link","label":"Licensed Content","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent","docId":"introduction/licensedContent"},{"type":"link","label":"Dependencies","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/dependencies","docId":"introduction/dependencies"},{"type":"link","label":"Getting 
Started","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/getting-started","docId":"introduction/getting-started"},{"type":"link","label":"Parsing Illumina Connected Annotations JSON","href":"/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-json","docId":"introduction/parsing-json"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Data Sources","items":[{"type":"link","label":"1000 Genomes","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes","docId":"data-sources/1000Genomes"},{"type":"link","label":"Amino Acid Conservation","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation","docId":"data-sources/amino-acid-conservation"},{"type":"link","label":"Cancer Hotspots","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-hotspots","docId":"data-sources/cancer-hotspots"},{"type":"link","label":"ClinGen","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen","docId":"data-sources/clingen"},{"type":"link","label":"ClinVar","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar","docId":"data-sources/clinvar"},{"type":"link","label":"ClinVar 
Preview","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview","docId":"data-sources/clinvar-preview"},{"type":"link","label":"COSMIC","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic","docId":"data-sources/cosmic"},{"type":"link","label":"DANN","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann","docId":"data-sources/dann"},{"type":"link","label":"dbSNP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp","docId":"data-sources/dbsnp"},{"type":"link","label":"DECIPHER","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher","docId":"data-sources/decipher"},{"type":"link","label":"FusionCatcher","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher","docId":"data-sources/fusioncatcher"},{"type":"link","label":"GERP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp","docId":"data-sources/gerp"},{"type":"link","label":"GME Variome","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme","docId":"data-sources/gme"},{"type":"link","label":"gnomAD","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad","docId":"data-sources/gnomad"},{"type":"link","label":"Mitochondrial Heteroplasmy","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmy","docId":"data-sources/mito-heteroplasmy"},{"type":"link","label":"MITOMAP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap","docId":"data-sources/mitomap"},{"type":"link","label":"OMIM","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim","docId":"data-sources/omim"},{"type":"link","label":"PhyloP","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop","docId":"data-sources/phylop"},{"type":"link","label":"Primate 
AI-3D","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai","docId":"data-sources/primate-ai"},{"type":"link","label":"REVEL","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel","docId":"data-sources/revel"},{"type":"link","label":"Splice AI","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai","docId":"data-sources/splice-ai"},{"type":"link","label":"TOPMed","href":"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed","docId":"data-sources/topmed"}],"collapsible":true,"collapsed":true},{"type":"category","label":"File Formats","items":[{"type":"link","label":"Illumina Connected Annotations JSON File Format","href":"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-format","docId":"file-formats/illumina-annotator-json-file-format"},{"type":"link","label":"Illumina Connected Annotations VCF File Format","href":"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-format","docId":"file-formats/illumina-annotator-vcf-file-format"},{"type":"link","label":"Custom Annotations","href":"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations","docId":"file-formats/custom-annotations"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Core Functionality","items":[{"type":"link","label":"Canonical Transcripts","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts","docId":"core-functionality/canonical-transcripts"},{"type":"link","label":"Junction Preserving Annotation","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving","docId":"core-functionality/junction-preserving"},{"type":"link","label":"Transcript Consequence Impact","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts","docId":"core-functionality/transcript-consequence-impacts"},{"type":"link","label":"Gene Fusion 
Detection","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions","docId":"core-functionality/gene-fusions"},{"type":"link","label":"Variant IDs","href":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids","docId":"core-functionality/variant-ids"}],"collapsible":true,"collapsed":true},{"type":"category","label":"Utilities","items":[{"type":"link","label":"Jasix","href":"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix","docId":"utilities/jasix"},{"type":"link","label":"SAUtils","href":"/IlluminaConnectedAnnotationsDocumentation/utilities/sautils","docId":"utilities/sautils"}],"collapsible":true,"collapsed":true},{"type":"category","label":"FAQs","items":[{"type":"link","label":"Annotation Engine vs Data update","href":"/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-update","docId":"frequently-asked-questions/Annotator-vs-data-update"}],"collapsible":true,"collapsed":true}]},"docs":{"core-functionality/canonical-transcripts":{"id":"core-functionality/canonical-transcripts","title":"Canonical Transcripts","description":"Overview","sidebar":"docs"},"core-functionality/gene-fusions":{"id":"core-functionality/gene-fusions","title":"Gene Fusion Detection","description":"Overview","sidebar":"docs"},"core-functionality/junction-preserving":{"id":"core-functionality/junction-preserving","title":"Junction Preserving Annotation","description":"Background","sidebar":"docs"},"core-functionality/transcript-consequence-impacts":{"id":"core-functionality/transcript-consequence-impacts","title":"Transcript Consequence Impact","description":"Overview","sidebar":"docs"},"core-functionality/variant-ids":{"id":"core-functionality/variant-ids","title":"Variant IDs","description":"Overview","sidebar":"docs"},"data-sources/1000Genomes":{"id":"data-sources/1000Genomes","title":"1000 
Genomes","description":"Overview","sidebar":"docs"},"data-sources/1000Genomes-snv-json":{"id":"data-sources/1000Genomes-snv-json","title":"1000Genomes-snv-json","description":"| Field | Type | Notes |"},"data-sources/1000Genomes-sv-json":{"id":"data-sources/1000Genomes-sv-json","title":"1000Genomes-sv-json","description":"| Field | Type | Notes |"},"data-sources/amino-acid-conservation":{"id":"data-sources/amino-acid-conservation","title":"Amino Acid Conservation","description":"Overview","sidebar":"docs"},"data-sources/amino-acid-conservation-json":{"id":"data-sources/amino-acid-conservation-json","title":"amino-acid-conservation-json","description":"| Field | Type | Notes |"},"data-sources/cancer-hotspots":{"id":"data-sources/cancer-hotspots","title":"Cancer Hotspots","description":"Overview","sidebar":"docs"},"data-sources/clingen":{"id":"data-sources/clingen","title":"ClinGen","description":"Overview","sidebar":"docs"},"data-sources/clingen-dosage-json":{"id":"data-sources/clingen-dosage-json","title":"clingen-dosage-json","description":"| Field | Type | Notes |"},"data-sources/clingen-gene-validity-json":{"id":"data-sources/clingen-gene-validity-json","title":"clingen-gene-validity-json","description":"| Field | Type | Notes |"},"data-sources/clingen-json":{"id":"data-sources/clingen-json","title":"clingen-json","description":"| Field | Type | Notes |"},"data-sources/clinvar":{"id":"data-sources/clinvar","title":"ClinVar","description":"Overview","sidebar":"docs"},"data-sources/clinvar-json":{"id":"data-sources/clinvar-json","title":"clinvar-json","description":"small variants:"},"data-sources/clinvar-preview":{"id":"data-sources/clinvar-preview","title":"ClinVar Preview","description":"Overview","sidebar":"docs"},"data-sources/clinvar-preview-json":{"id":"data-sources/clinvar-preview-json","title":"clinvar-preview-json","description":"small 
variants:"},"data-sources/cosmic":{"id":"data-sources/cosmic","title":"COSMIC","description":"Overview","sidebar":"docs"},"data-sources/cosmic-cancer-gene-census":{"id":"data-sources/cosmic-cancer-gene-census","title":"cosmic-cancer-gene-census","description":"| Field | Type | Notes |"},"data-sources/cosmic-gene-fusion-json":{"id":"data-sources/cosmic-gene-fusion-json","title":"cosmic-gene-fusion-json","description":"| Field | Type | Notes |"},"data-sources/cosmic-json":{"id":"data-sources/cosmic-json","title":"cosmic-json","description":"| Field | Type | Notes |"},"data-sources/dann":{"id":"data-sources/dann","title":"DANN","description":"Overview","sidebar":"docs"},"data-sources/dann-json":{"id":"data-sources/dann-json","title":"dann-json","description":"| Field | Type | Notes |"},"data-sources/dbsnp":{"id":"data-sources/dbsnp","title":"dbSNP","description":"Overview","sidebar":"docs"},"data-sources/dbsnp-json":{"id":"data-sources/dbsnp-json","title":"dbsnp-json","description":"| Field | Type | Notes |"},"data-sources/decipher":{"id":"data-sources/decipher","title":"DECIPHER","description":"Overview","sidebar":"docs"},"data-sources/decipher-json":{"id":"data-sources/decipher-json","title":"decipher-json","description":"| Field | Type | Notes |"},"data-sources/fusioncatcher":{"id":"data-sources/fusioncatcher","title":"FusionCatcher","description":"Overview","sidebar":"docs"},"data-sources/fusioncatcher-json":{"id":"data-sources/fusioncatcher-json","title":"fusioncatcher-json","description":"| Field | Type | Notes |"},"data-sources/gerp":{"id":"data-sources/gerp","title":"GERP","description":"Overview","sidebar":"docs"},"data-sources/gerp-json":{"id":"data-sources/gerp-json","title":"gerp-json","description":"| Field | Type | Notes |"},"data-sources/gme":{"id":"data-sources/gme","title":"GME Variome","description":"Overview","sidebar":"docs"},"data-sources/gme-json":{"id":"data-sources/gme-json","title":"gme-json","description":"| Field | Type | Notes 
|"},"data-sources/gnomad":{"id":"data-sources/gnomad","title":"gnomAD","description":"Overview","sidebar":"docs"},"data-sources/gnomad-lof-json":{"id":"data-sources/gnomad-lof-json","title":"gnomad-lof-json","description":"| Field | Type | Notes |"},"data-sources/gnomad-small-variants-json":{"id":"data-sources/gnomad-small-variants-json","title":"gnomad-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad-structural-variants-data_description":{"id":"data-sources/gnomad-structural-variants-data_description","title":"gnomad-structural-variants-data_description","description":"Bed Example"},"data-sources/gnomad-structural-variants-json":{"id":"data-sources/gnomad-structural-variants-json","title":"gnomad-structural-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad4.0-lof-json":{"id":"data-sources/gnomad4.0-lof-json","title":"gnomad4.0-lof-json","description":""},"data-sources/gnomad4.0-small-variants-json":{"id":"data-sources/gnomad4.0-small-variants-json","title":"gnomad4.0-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/gnomad40-structural-variants-json":{"id":"data-sources/gnomad40-structural-variants-json","title":"gnomad40-structural-variants-json","description":""},"data-sources/mito-heteroplasmy":{"id":"data-sources/mito-heteroplasmy","title":"Mitochondrial Heteroplasmy","description":"Overview","sidebar":"docs"},"data-sources/mitomap":{"id":"data-sources/mitomap","title":"MITOMAP","description":"Overview","sidebar":"docs"},"data-sources/mitomap-small-variants-json":{"id":"data-sources/mitomap-small-variants-json","title":"mitomap-small-variants-json","description":"| Field | Type | Notes |"},"data-sources/mitomap-structural-variants-json":{"id":"data-sources/mitomap-structural-variants-json","title":"mitomap-structural-variants-json","description":"| Field | Type | Notes 
|"},"data-sources/omim":{"id":"data-sources/omim","title":"OMIM","description":"Overview","sidebar":"docs"},"data-sources/omim-json":{"id":"data-sources/omim-json","title":"omim-json","description":"| Field | Type | Notes |"},"data-sources/phylop":{"id":"data-sources/phylop","title":"PhyloP","description":"Overview","sidebar":"docs"},"data-sources/phylop-json":{"id":"data-sources/phylop-json","title":"phylop-json","description":"| Field | Type | Notes |"},"data-sources/phylopprimate-json":{"id":"data-sources/phylopprimate-json","title":"phylopprimate-json","description":"| Field | Type | Notes |"},"data-sources/primate-ai":{"id":"data-sources/primate-ai","title":"Primate AI-3D","description":"Overview","sidebar":"docs"},"data-sources/primate-ai-json":{"id":"data-sources/primate-ai-json","title":"primate-ai-json","description":"| Field | Type | Notes |"},"data-sources/revel":{"id":"data-sources/revel","title":"REVEL","description":"Overview","sidebar":"docs"},"data-sources/revel-json":{"id":"data-sources/revel-json","title":"revel-json","description":"| Field | Type | Notes |"},"data-sources/splice-ai":{"id":"data-sources/splice-ai","title":"Splice AI","description":"Overview","sidebar":"docs"},"data-sources/splice-ai-json":{"id":"data-sources/splice-ai-json","title":"splice-ai-json","description":"| Field | Type | Notes |"},"data-sources/topmed":{"id":"data-sources/topmed","title":"TOPMed","description":"Overview","sidebar":"docs"},"data-sources/topmed-json":{"id":"data-sources/topmed-json","title":"topmed-json","description":"| Field | Type | Notes |"},"file-formats/custom-annotations":{"id":"file-formats/custom-annotations","title":"Custom Annotations","description":"Overview","sidebar":"docs"},"file-formats/illumina-annotator-json-file-format":{"id":"file-formats/illumina-annotator-json-file-format","title":"Illumina Connected Annotations JSON File 
Format","description":"Overview","sidebar":"docs"},"file-formats/illumina-annotator-vcf-file-format":{"id":"file-formats/illumina-annotator-vcf-file-format","title":"Illumina Connected Annotations VCF File Format","description":"Overview","sidebar":"docs"},"frequently-asked-questions/Annotator-vs-data-update":{"id":"frequently-asked-questions/Annotator-vs-data-update","title":"Annotation Engine vs Data update","description":"Background","sidebar":"docs"},"introduction/dependencies":{"id":"introduction/dependencies","title":"Dependencies","description":"All of the following dependencies have been included in this repository.","sidebar":"docs"},"introduction/getting-started":{"id":"introduction/getting-started","title":"Getting Started","description":"Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.","sidebar":"docs"},"introduction/introduction":{"id":"introduction/introduction","title":"Introduction","description":"Clinical-grade variant annotation","sidebar":"docs"},"introduction/licensedContent":{"id":"introduction/licensedContent","title":"Licensed Content","description":"Illumina Conncted Annotations supports following content which is available through a license from Illumina.","sidebar":"docs"},"introduction/parsing-json":{"id":"introduction/parsing-json","title":"Parsing Illumina Connected Annotations JSON","description":"Parsing JSON","sidebar":"docs"},"utilities/jasix":{"id":"utilities/jasix","title":"Jasix","description":"Overview","sidebar":"docs"},"utilities/sautils":{"id":"utilities/sautils","title":"SAUtils","description":"Overview","sidebar":"docs"}}}')}}]); \ No newline at end of file diff --git a/assets/js/94d6913f.a774a0ab.js b/assets/js/94d6913f.a774a0ab.js new file mode 100644 index 00000000..94dd6d99 --- /dev/null +++ 
b/assets/js/94d6913f.a774a0ab.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5882,9636,7946,4241],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>g});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function r(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):r(r({},t),e)),n},p=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},c="mdxType",u={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,l=e.originalType,s=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),c=d(n),m=i,g=c["".concat(s,".").concat(m)]||c[m]||u[m]||l;return n?a.createElement(g,r(r({ref:t},p),{},{components:n})):a.createElement(g,r({ref:t},p))}));function g(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var l=n.length,r=new Array(l);r[0]=m;var o={};for(var s in t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[c]="string"==typeof e?e:i,r[1]=o;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>r,default:()=>c,frontMatter:()=>l,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const l={},r=void 0,o={unversionedId:"data-sources/clingen-dosage-json",id:"version-3.24/data-sources/clingen-dosage-json",title:"clingen-dosage-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/clingen-dosage-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-dosage-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-dosage-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-dosage-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},p="wrapper";function c(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"clingenDosageSensitivityMap": [{\n "chromosome": "15",\n "begin": 30900686,\n "end": 32153204,\n "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "reciprocalOverlap": 0.00147,\n "annotationOverlap": 0.33994\n},\n{\n "chromosome": "15",\n "begin": 31727418,\n "end": 32153204,\n "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",\n "triplosensitivity": "dosage sensitivity unlikely",\n "reciprocalOverlap": 0.00147,\n "annotationOverlap": 1\n}]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"Field"),(0,i.kt)("th",{parentName:"tr",align:null},"Type"),(0,i.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"clingenDosageSensitivityMap"),(0,i.kt)("td",{parentName:"tr",align:null},"object 
array"),(0,i.kt)("td",{parentName:"tr",align:null})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"begin"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"end"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"haploinsufficiency"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"see possible values below")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"triplosensitivity"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"(same as haploinsufficiency)\xa0")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,i.kt)("td",{parentName:"tr",align:null},"floating point"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,i.kt)("td",{parentName:"tr",align:null},"floating point"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. 
Specified up to 5 decimal places (Not reported for Insertions).")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"haploinsufficiency and triplosensitivity")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"no evidence to suggest that dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"little evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"emerging evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype"),(0,i.kt)("li",{parentName:"ul"},"gene associated with autosomal recessive phenotype"),(0,i.kt)("li",{parentName:"ul"},"dosage sensitivity unlikely")))}c.isMDXComponent=!0},6361:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>r,default:()=>c,frontMatter:()=>l,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const l={},r=void 0,o={unversionedId:"data-sources/clingen-gene-validity-json",id:"version-3.24/data-sources/clingen-gene-validity-json",title:"clingen-gene-validity-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/clingen-gene-validity-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-gene-validity-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-gene-validity-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-gene-validity-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},p="wrapper";function c(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"clingenGeneValidity":[\n {\n "diseaseId":"MONDO_0007893",\n "disease":"Noonan syndrome with multiple lentigines",\n 
"classification":"no reported evidence",\n "classificationDate":"2018-06-07"\n },\n {\n "diseaseId":"MONDO_0015280",\n "disease":"cardiofaciocutaneous syndrome",\n "classification":"no reported evidence",\n "classificationDate":"2018-06-07"\n }\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"Field"),(0,i.kt)("th",{parentName:"tr",align:null},"Type"),(0,i.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"clingenGeneValidity"),(0,i.kt)("td",{parentName:"tr",align:null},"object"),(0,i.kt)("td",{parentName:"tr",align:null})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"diseaseId"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"Monarch Disease Ontology ID (MONDO)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"disease"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"disease label")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"classification"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"see below for possible values")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"classificationDate"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"yyyy-MM-dd")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"classification")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"no reported 
evidence"),(0,i.kt)("li",{parentName:"ul"},"disputed"),(0,i.kt)("li",{parentName:"ul"},"limited"),(0,i.kt)("li",{parentName:"ul"},"moderate"),(0,i.kt)("li",{parentName:"ul"},"definitive"),(0,i.kt)("li",{parentName:"ul"},"strong"),(0,i.kt)("li",{parentName:"ul"},"refuted"),(0,i.kt)("li",{parentName:"ul"},"no known disease relationship")))}c.isMDXComponent=!0},6478:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>r,default:()=>c,frontMatter:()=>l,metadata:()=>o,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const l={},r=void 0,o={unversionedId:"data-sources/clingen-json",id:"version-3.24/data-sources/clingen-json",title:"clingen-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/clingen-json.md",sourceDirName:"data-sources",slug:"/data-sources/clingen-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},p="wrapper";function c(e){let{components:t,...n}=e;return(0,i.kt)(p,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"clingen":[\n {\n "chromosome":"17",\n "begin":525,\n "end":14667519,\n "variantType":"copy_number_gain",\n "id":"nsv996083",\n "clinicalInterpretation":"pathogenic",\n "observedGains":1,\n "validated":true,\n "phenotypes":[\n "Intrauterine growth retardation"\n ],\n "phenotypeIds":[\n "HP:0001511",\n "MedGen:C1853481"\n ],\n "reciprocalOverlap":0.00131\n },\n {\n "chromosome":"17",\n "begin":45835,\n "end":7600330,\n "variantType":"copy_number_loss",\n "id":"nsv869419",\n "clinicalInterpretation":"pathogenic",\n "observedLosses":1,\n "validated":true,\n "phenotypes":[\n "Developmental delay AND/OR other significant developmental or morphological phenotypes"\n ],\n "reciprocalOverlap":0.00254\n 
}\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"Field"),(0,i.kt)("th",{parentName:"tr",align:null},"Type"),(0,i.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"clingen"),(0,i.kt)("td",{parentName:"tr",align:null},"object array"),(0,i.kt)("td",{parentName:"tr",align:null})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"begin"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"end"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"variantType"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"Any of the\xa0sequence alterations defined here.")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"id"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"Identifier from the data source. 
Alternatively a VID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"clinicalInterpretation"),(0,i.kt)("td",{parentName:"tr",align:null},"string"),(0,i.kt)("td",{parentName:"tr",align:null},"see possible values below")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"observedGains"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - (2",(0,i.kt)("sup",null,"31"),"\xa0- 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"observedLosses"),(0,i.kt)("td",{parentName:"tr",align:null},"integer"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - (2",(0,i.kt)("sup",null,"31"),"\xa0- 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"validated"),(0,i.kt)("td",{parentName:"tr",align:null},"boolean"),(0,i.kt)("td",{parentName:"tr",align:null})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"phenotypes"),(0,i.kt)("td",{parentName:"tr",align:null},"string array"),(0,i.kt)("td",{parentName:"tr",align:null},"Description of the phenotype.")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"phenotypeIds"),(0,i.kt)("td",{parentName:"tr",align:null},"string array"),(0,i.kt)("td",{parentName:"tr",align:null},"Description of the phenotype IDs.")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,i.kt)("td",{parentName:"tr",align:null},"floating point"),(0,i.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. 
Specified up to 5 decimal places (Not reported for Insertions).")))),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"clinicalInterpretation")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"benign"),(0,i.kt)("li",{parentName:"ul"},"curated benign"),(0,i.kt)("li",{parentName:"ul"},"curated pathogenic"),(0,i.kt)("li",{parentName:"ul"},"likely benign"),(0,i.kt)("li",{parentName:"ul"},"likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"path gain"),(0,i.kt)("li",{parentName:"ul"},"path loss"),(0,i.kt)("li",{parentName:"ul"},"pathogenic"),(0,i.kt)("li",{parentName:"ul"},"uncertain")))}c.isMDXComponent=!0},5307:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>d,default:()=>g,frontMatter:()=>s,metadata:()=>p,toc:()=>c});var a=n(7462),i=(n(7294),n(3905)),l=n(6478),r=n(4869),o=n(6361);const s={title:"ClinGen"},d=void 0,p={unversionedId:"data-sources/clingen",id:"version-3.24/data-sources/clingen",title:"ClinGen",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/clingen.mdx",sourceDirName:"data-sources",slug:"/data-sources/clingen",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clingen.mdx",tags:[],version:"3.24",frontMatter:{title:"ClinGen"},sidebar:"docs",previous:{title:"Cancer Hotspots",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cancer-hotspots"},next:{title:"ClinVar",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar"}},c=[{value:"Overview",id:"overview",children:[],level:2},{value:"ISCA Regions",id:"isca-regions",children:[{value:"TSV Extraction",id:"tsv-extraction",children:[{value:"Status levels",id:"status-levels",children:[],level:4},{value:"Parsing",id:"parsing",children:[],level:4}],level:3}],level:2},{value:"Conflict Resolution",id:"conflict-resolution",children:[{value:"Clinical significance 
priority",id:"clinical-significance-priority",children:[],level:3},{value:"Validation Priority",id:"validation-priority",children:[],level:3},{value:"Download URL",id:"download-url",children:[],level:3},{value:"JSON Output",id:"json-output",children:[],level:3}],level:2},{value:"Dosage Sensitivity Map",id:"dosage-sensitivity-map",children:[{value:"TSV Source files",id:"tsv-source-files",children:[],level:3},{value:"Dosage Rating System",id:"dosage-rating-system",children:[],level:3},{value:"Download URL",id:"download-url-1",children:[],level:3},{value:"JSON Output",id:"json-output-1",children:[],level:3},{value:"Building the supplementary files",id:"building-the-supplementary-files",children:[],level:3}],level:2},{value:"Gene-Disease Validity",id:"gene-disease-validity",children:[{value:"Source TSV",id:"source-tsv",children:[],level:3},{value:"Download URL",id:"download-url-2",children:[],level:3},{value:"Conflict Resolution",id:"conflict-resolution-1",children:[{value:"Multiple Classifications",id:"multiple-classifications",children:[],level:4},{value:"Multiple Dates",id:"multiple-dates",children:[],level:4}],level:3},{value:"JSON Output",id:"json-output-2",children:[],level:3},{value:"Building the supplementary files",id:"building-the-supplementary-files-1",children:[],level:3}],level:2}],u={toc:c},m="wrapper";function g(e){let{components:t,...n}=e;return(0,i.kt)(m,(0,a.Z)({},u,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research."),(0,i.kt)("div",{className:"admonition admonition-info alert 
alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. 
",(0,i.kt)("strong",{parentName:"p"},"ClinGen The Clinical Genome Resource.")," ",(0,i.kt)("em",{parentName:"p"},"N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.")))),(0,i.kt)("h2",{id:"isca-regions"},"ISCA Regions"),(0,i.kt)("h3",{id:"tsv-extraction"},"TSV Extraction"),(0,i.kt)("p",null,"ClinGen contains only copy number variation variants, since the coordinates in ClinGen original file follow the same rule as BED format, the coordinates had to be adjusted to ","[BEGIN+1, END]","."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"#bin chrom chromStart chromEnd name score strand thickStart thickEnd attrCount attrTags attrVals\nnsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes\nnsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810\nnsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482\nnsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes\nnsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes\nnsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay 
HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482\n")),(0,i.kt)("h4",{id:"status-levels"},"Status levels"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"benign"),(0,i.kt)("li",{parentName:"ul"},"curated benign"),(0,i.kt)("li",{parentName:"ul"},"curated pathogenic"),(0,i.kt)("li",{parentName:"ul"},"likely benign"),(0,i.kt)("li",{parentName:"ul"},"likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"path gain"),(0,i.kt)("li",{parentName:"ul"},"path loss"),(0,i.kt)("li",{parentName:"ul"},"pathogenic"),(0,i.kt)("li",{parentName:"ul"},"uncertain")),(0,i.kt)("h4",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"We parse the ClinGen TSV file and extract the following:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"chrom"),(0,i.kt)("li",{parentName:"ul"},"chromStart (note that this is a 0-based coordinate)"),(0,i.kt)("li",{parentName:"ul"},"chromEnd"),(0,i.kt)("li",{parentName:"ul"},"attrTags"),(0,i.kt)("li",{parentName:"ul"},"attrVals")),(0,i.kt)("p",null,(0,i.kt)("inlineCode",{parentName:"p"},"attrTags")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"attrVals")," are comma-separated lists. ",(0,i.kt)("inlineCode",{parentName:"p"},"attrTags")," contains the field keys and ",(0,i.kt)("inlineCode",{parentName:"p"},"attrVals")," contains the field values. 
We parse the following keys from the two fields:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"parent (this will be used as the ID in our JSON output)"),(0,i.kt)("li",{parentName:"ul"},"clinical_int"),(0,i.kt)("li",{parentName:"ul"},"validated"),(0,i.kt)("li",{parentName:"ul"},"phenotype (this should be a string array)"),(0,i.kt)("li",{parentName:"ul"},"phenotype_id (this should be a string array)")),(0,i.kt)("p",null,"Observed losses and observed gains are calculated from entries that share a common parent ID."),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Variants with a common parent ID and the same coordinates are grouped",(0,i.kt)("ul",{parentName:"li"},(0,i.kt)("li",{parentName:"ul"},"Observed losses and observed gains are calculated for each group"),(0,i.kt)("li",{parentName:"ul"},"Clinical significance and validation status are collapsed using the priority strategy described below"))),(0,i.kt)("li",{parentName:"ul"},"Variants with the same parent ID can have different coordinates (mapped to hg38)",(0,i.kt)("ul",{parentName:"li"},(0,i.kt)("li",{parentName:"ul"},"nsv491508: chr14:105583663-106881350 and chr14:105605043-106766076 (only one example)"),(0,i.kt)("li",{parentName:"ul"},"Both variants are kept")))),(0,i.kt)("h2",{id:"conflict-resolution"},"Conflict Resolution"),(0,i.kt)("h3",{id:"clinical-significance-priority"},"Clinical significance priority"),(0,i.kt)("p",null,"When there is a mixture of variants belonging to the same parent ID, we choose the most pathogenic clinical significance from the available values. For example, 
if 3 samples were deemed pathogenic and 2 samples were likely pathogenic, we would list the variant as pathogenic."),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Priority")," (high to low)"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Likely pathogenic"),(0,i.kt)("li",{parentName:"ul"},"Benign"),(0,i.kt)("li",{parentName:"ul"},"Likely benign"),(0,i.kt)("li",{parentName:"ul"},"Uncertain significance")),(0,i.kt)("h3",{id:"validation-priority"},"Validation Priority"),(0,i.kt)("p",null,"When there is a mixture of variants belonging to the same parent ID, we set the validation status to true if any of the variants were validated."),(0,i.kt)("h3",{id:"download-url"},"Download URL"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite"},"https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite")),(0,i.kt)("h3",{id:"json-output"},"JSON Output"),(0,i.kt)(l.default,{mdxType:"CLINGENJSON"}),(0,i.kt)("h2",{id:"dosage-sensitivity-map"},"Dosage Sensitivity Map"),(0,i.kt)("p",null,"The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. 
Illumina Connected Annotations reports these annotations for overlapping SVs."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. ",(0,i.kt)("strong",{parentName:"p"},"Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar.")," ",(0,i.kt)("em",{parentName:"p"},"Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. 
PMID: 30095202; PMCID: PMC7374944.")))),(0,i.kt)("h3",{id:"tsv-source-files"},"TSV Source files"),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Regions")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"#ClinGen Region Curation Results\n#07 May,2019\n#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36\n#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen\n#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key\n#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID\nISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19\nISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10\nISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31\nISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801\n")),(0,i.kt)("p",null,(0,i.kt)("strong",{parentName:"p"},"Genes")),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"#ClinGen Gene Curation Results\n#24 May,2019\n#Genomic Locations are reported on GRCh37 
(hg19): GCF_000001405.13\n#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen\n#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol\n#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID\nA4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400\nAAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600\n")),(0,i.kt)("h3",{id:"dosage-rating-system"},"Dosage Rating System"),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:null},"Rating"),(0,i.kt)("th",{parentName:"tr",align:null},"Possible Clinical Interpretation"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"0"),(0,i.kt)("td",{parentName:"tr",align:null},"No evidence to suggest that dosage sensitivity is associated with clinical phenotype")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"1"),(0,i.kt)("td",{parentName:"tr",align:null},"Little evidence suggesting dosage sensitivity is associated with clinical phenotype")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"2"),(0,i.kt)("td",{parentName:"tr",align:null},"Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"3"),(0,i.kt)("td",{parentName:"tr",align:null},"Sufficient evidence suggesting 
dosage sensitivity is associated with clinical phenotype")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"30"),(0,i.kt)("td",{parentName:"tr",align:null},"Gene associated with autosomal recessive phenotype")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:null},"40"),(0,i.kt)("td",{parentName:"tr",align:null},"Dosage sensitivity unlikely")))),(0,i.kt)("p",null,"Reference: ",(0,i.kt)("a",{parentName:"p",href:"https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml"},"https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml")),(0,i.kt)("h3",{id:"download-url-1"},"Download URL"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"ftp://ftp.clinicalgenome.org/"},"ftp://ftp.clinicalgenome.org/")),(0,i.kt)("h3",{id:"json-output-1"},"JSON Output"),(0,i.kt)(r.default,{mdxType:"ClinGenDosageJson"}),(0,i.kt)("h3",{id:"building-the-supplementary-files"},"Building the supplementary files"),(0,i.kt)("p",null,"The gene dosage sensitivity ",(0,i.kt)("inlineCode",{parentName:"p"},".nga")," for Illumina Connected Annotations can be built using the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,i.kt)("inlineCode",{parentName:"p"},"DosageSensitivity")," subcommand. 
The required data file is ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinGen_gene_curation_list_{ASSEMBLY}.tsv")," (url provided above) and its associated ",(0,i.kt)("inlineCode",{parentName:"p"},".version")," file."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"NAME=ClinGen Dosage Sensitivity Map\nVERSION=20211201\nDATE=2021-12-01\nDESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)\n")),(0,i.kt)("p",null,"Here is a sample run:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll DosageSensitivity\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll dosagesensitivity [options]\nCreates a gene annotation database from dbVar data\n\nOPTIONS:\n --tsv, -t input tsv file\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n\ndotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\n\nTime: 00:00:00.1\n")),(0,i.kt)("p",null,"For building the ",(0,i.kt)("inlineCode",{parentName:"p"},".nsi")," files, we use the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,i.kt)("inlineCode",{parentName:"p"},"DosageMapRegions")," subcommand. 
The required data file is ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinGen_region_curation_list_{ASSEMBLY}.tsv")," (url provided above) and its associated ",(0,i.kt)("inlineCode",{parentName:"p"},".version")," file."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"NAME=ClinGen Dosage Sensitivity Map\nVERSION=20211201\nDATE=2021-12-01\nDESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)\n")),(0,i.kt)("p",null,"Here is a sample run:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll DosageMapRegions\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll dosagemapregions [options]\nCreates an interval annotation database from dbVar data\n\nOPTIONS:\n --tsv, -t input tsv file\n --ref, -r input reference filename\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n\ndotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\nWriting 505 intervals to database...\n\nTime: 00:00:00.1\n")),(0,i.kt)("p",null,"You can also use ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommands ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate")," to generate ClinGen files. 
To use ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),", see the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," section."),(0,i.kt)("h2",{id:"gene-disease-validity"},"Gene-Disease Validity"),(0,i.kt)("p",null,"The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON output."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Strande NT, Riggs ER, Buchanan AH, et al. ",(0,i.kt)("strong",{parentName:"p"},"Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource.")," ",(0,i.kt)("em",{parentName:"p"},"Am J Hum Genet. 2017;100(6):895-906. 
doi:10.1016/j.ajhg.2017.04.015")))),(0,i.kt)("h3",{id:"source-tsv"},"Source TSV"),(0,i.kt)("p",null,"The source data comes in a CSV file that we convert to a TSV."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"CLINGEN GENE VALIDITY CURATIONS\nFILE CREATED: 2019-05-28\nWEBPAGE: https://search.clinicalgenome.org/kb/gene-validity\n+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++\nGENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE\n+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++\nA2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z\nA2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z\nA2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z\n")),(0,i.kt)("h3",{id:"download-url-2"},"Download URL"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity"},"https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity")),(0,i.kt)("h3",{id:"conflict-resolution-1"},"Conflict Resolution"),(0,i.kt)("h4",{id:"multiple-classifications"},"Multiple Classifications"),(0,i.kt)("p",null,"Here is an example of multiple classifications."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"$ grep MONDO_0010192 
ClinGen-Gene-Disease-Summary-2019-12-02.csv | grep EDNRB\nEDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z\nEDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z\n")),(0,i.kt)("p",null,"In such cases, we select the more severe classification."),(0,i.kt)("h4",{id:"multiple-dates"},"Multiple Dates"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv | grep MUTYH\nMUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00\nMUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00\n")),(0,i.kt)("p",null,"If the classifications are the same, we select the entry with the latest classification date."),(0,i.kt)("h3",{id:"json-output-2"},"JSON Output"),(0,i.kt)(o.default,{mdxType:"ClinGenGeneValidity"}),(0,i.kt)("h3",{id:"building-the-supplementary-files-1"},"Building the supplementary files"),(0,i.kt)("p",null,"The gene-disease validity ",(0,i.kt)("inlineCode",{parentName:"p"},".nga")," file for Illumina Connected Annotations can be built using the ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's ",(0,i.kt)("inlineCode",{parentName:"p"},"DiseaseValidity")," subcommand. 
The only required data file is ",(0,i.kt)("inlineCode",{parentName:"p"},"Clingen-Gene-Disease-Summary-2021-12-01.tsv")," (url provided above) and its associated ",(0,i.kt)("inlineCode",{parentName:"p"},".version")," file."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"NAME=ClinGen disease validity curations\nVERSION=20211201\nDATE=2021-12-01\nDESCRIPTION=Disease validity curations from ClinGen (dbVar)\n")),(0,i.kt)("p",null,"Here is a sample run:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"}," dotnet SAUtils.dll DiseaseValidity\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll diseasevalidity [options]\nCreates a gene annotation database from ClinGen gene validity data\n\nOPTIONS:\n --csv, -i ClinGen gene validity file path\n --cache, -c \n input cache directory\n --ref, -r input reference filename\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n\ndotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \\\\\n--uga Cache --out SupplementaryDatabase\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\nStromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953\n---------------------------------------------------------------------------\n\nNumber of geneIds missing from the cache:0 (0%)\n\nTime: 00:00:00.2\n")),(0,i.kt)("p",null,"You can also use ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," command's subcommands ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate")," to generate ClinGen files. 
To use ",(0,i.kt)("inlineCode",{parentName:"p"},"AutoDownloadGenerate"),", read more in ",(0,i.kt)("inlineCode",{parentName:"p"},"SAUtils")," section."))}g.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/987b70e8.239750d6.js b/assets/js/987b70e8.239750d6.js new file mode 100644 index 00000000..d79a4f12 --- /dev/null +++ b/assets/js/987b70e8.239750d6.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6096,9896],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>h});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},u=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},c="mdxType",p={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,o=e.originalType,s=e.parentName,u=l(e,["components","mdxType","originalType","parentName"]),c=d(n),m=r,h=c["".concat(s,".").concat(m)]||c[m]||p[m]||o;return n?a.createElement(h,i(i({ref:t},u),{},{components:n})):a.createElement(h,i({ref:t},u))}));function h(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var o=n.length,i=new Array(o);i[0]=m;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[c]="string"==typeof e?e:r,i[1]=l;for(var 
d=2;d{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>c,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/gerp-json",id:"version-3.24/data-sources/gerp-json",title:"gerp-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gerp-json.md",sourceDirName:"data-sources",slug:"/data-sources/gerp-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gerp-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},u="wrapper";function c(e){let{components:t,...n}=e;return(0,r.kt)(u,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"gerpScore": 1.27\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gerpScore"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: -\u221e to +\u221e")))))}c.isMDXComponent=!0},3679:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>p,frontMatter:()=>i,metadata:()=>s,toc:()=>d});var a=n(7462),r=(n(7294),n(3905)),o=n(1399);const i={title:"GERP"},l=void 
0,s={unversionedId:"data-sources/gerp",id:"version-3.24/data-sources/gerp",title:"GERP",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/gerp.mdx",sourceDirName:"data-sources",slug:"/data-sources/gerp",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gerp.mdx",tags:[],version:"3.24",frontMatter:{title:"GERP"},sidebar:"docs",previous:{title:"FusionCatcher",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher"},next:{title:"GME Variome",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"Source Files",id:"source-files",children:[{value:"Example GRCh37",id:"example-grch37",children:[],level:3},{value:"Example GRCh38",id:"example-grch38",children:[],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URL",id:"download-url",children:[{value:"GRCh37",id:"grch37",children:[],level:3},{value:"GRCh38",id:"grch38",children:[],level:3}],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],u={toc:d},c="wrapper";function p(e){let{components:t,...n}=e;return(0,r.kt)(c,(0,a.Z)({},u,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"GERP identifies constrained elements in multiple alignments by quantifying substitution deficits.\nThese deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint (Rejected Substitutions).\nIllumina Connected Annotations uses GERP++ which is based on a significantly faster and more statistically robust maximum likelihood estimation procedure to compute expected rates of 
evolution."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},'Davydov, Eugene V., et al. "Identifying a high fraction of the human genome to be under selective constraint using GERP++." ',(0,r.kt)("em",{parentName:"p"},"PLoS computational biology")," ",(0,r.kt)("strong",{parentName:"p"},"6.12")," e1001025 (2010). ",(0,r.kt)("a",{parentName:"p",href:"https://doi.org/10.1371/journal.pcbi.1001025"},"https://doi.org/10.1371/journal.pcbi.1001025")))),(0,r.kt)("h2",{id:"source-files"},"Source Files"),(0,r.kt)("h3",{id:"example-grch37"},"Example GRCh37"),(0,r.kt)("p",null,"GRCh37 file is a TSV format"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-tsv"},"chr position GERP\n1 12177 0.83\n1 12178 -0.206\n1 12179 -0.492\n1 12180 -1.66\n1 12181 0.83\n1 12182 0.83\n1 12183 -0.417\n1 12184 0.83\n")),(0,r.kt)("h3",{id:"example-grch38"},"Example GRCh38"),(0,r.kt)("p",null,"GRCh38 file is a lift-over BED format"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-tsv"},"chr pos_start pos_end GERP\n1 12646 12647 0.298\n1 12647 12648 2.63\n1 12648 12649 1.87\n1 12649 12650 0.252\n1 12650 12651 -2.06\n1 12651 12652 2.61\n1 12652 12653 3.97\n")),(0,r.kt)("h3",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"From the CSV file, we are interested in 
columns:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"chr")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"position")),(0,r.kt)("li",{parentName:"ul"},(0,r.kt)("inlineCode",{parentName:"li"},"GERP"))),(0,r.kt)("h2",{id:"known-issues"},"Known Issues"),(0,r.kt)("p",null,"None"),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("h3",{id:"grch37"},"GRCh37"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html"},"http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html")),(0,r.kt)("h3",{id:"grch38"},"GRCh38"),(0,r.kt)("p",null,"The data is not available for GRCh38 on GERP++ website, and was obtained from ",(0,r.kt)("a",{parentName:"p",href:"https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/"},"https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/")),(0,r.kt)("h2",{id:"json-output"},"JSON Output"),(0,r.kt)(o.default,{mdxType:"JSON"}))}p.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/a10271fe.ffa3312f.js b/assets/js/a10271fe.ffa3312f.js new file mode 100644 index 00000000..e52299ee --- /dev/null +++ b/assets/js/a10271fe.ffa3312f.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[4861],{3905:(e,n,t)=>{t.d(n,{Zo:()=>p,kt:()=>h});var i=t(7294);function a(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function o(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);n&&(i=i.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,i)}return t}function l(e){for(var n=1;n=0||(a[t]=e[t]);return a}(e,n);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(i=0;i=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(a[t]=e[t])}return a}var 
s=i.createContext({}),c=function(e){var n=i.useContext(s),t=n;return e&&(t="function"==typeof e?e(n):l(l({},n),e)),t},p=function(e){var n=c(e.components);return i.createElement(s.Provider,{value:n},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var n=e.children;return i.createElement(i.Fragment,{},n)}},d=i.forwardRef((function(e,n){var t=e.components,a=e.mdxType,o=e.originalType,s=e.parentName,p=r(e,["components","mdxType","originalType","parentName"]),u=c(t),d=a,h=u["".concat(s,".").concat(d)]||u[d]||m[d]||o;return t?i.createElement(h,l(l({ref:n},p),{},{components:t})):i.createElement(h,l({ref:n},p))}));function h(e,n){var t=arguments,a=n&&n.mdxType;if("string"==typeof e||a){var o=t.length,l=new Array(o);l[0]=d;var r={};for(var s in n)hasOwnProperty.call(n,s)&&(r[s]=n[s]);r.originalType=e,r[u]="string"==typeof e?e:a,l[1]=r;for(var c=2;c{t.r(n),t.d(n,{contentTitle:()=>l,default:()=>u,frontMatter:()=>o,metadata:()=>r,toc:()=>s});var i=t(7462),a=(t(7294),t(3905));const o={title:"Jasix"},l=void 0,r={unversionedId:"utilities/jasix",id:"version-3.24/utilities/jasix",title:"Jasix",description:"Overview",source:"@site/versioned_docs/version-3.24/utilities/jasix.mdx",sourceDirName:"utilities",slug:"/utilities/jasix",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/jasix",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/utilities/jasix.mdx",tags:[],version:"3.24",frontMatter:{title:"Jasix"},sidebar:"docs",previous:{title:"Variant IDs",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/variant-ids"},next:{title:"SAUtils",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/sautils"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Creating the Jasix index",id:"creating-the-jasix-index",children:[{value:"Example",id:"example",children:[],level:3}],level:2},{value:"Querying the 
index",id:"querying-the-index",children:[],level:2},{value:"Extracting a section",id:"extracting-a-section",children:[],level:2}],c={toc:s},p="wrapper";function u(e){let{components:n,...t}=e;return(0,a.kt)(p,(0,i.Z)({},c,t,{components:n,mdxType:"MDXLayout"}),(0,a.kt)("h2",{id:"overview"},"Overview"),(0,a.kt)("p",null,"The Jasix index is aimed at providing TABIX like indexing capabilities for the Illumina Connected Annotations JSON output."),(0,a.kt)("h2",{id:"creating-the-jasix-index"},"Creating the Jasix index"),(0,a.kt)("p",null,"The Jasix index (that comes in a .jsi) file is generated on-the-fly with Illumina Connected Annotations output. It can also be generated independently by running the Jasix command line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and index generating functionalities of Jasix."),(0,a.kt)("h3",{id:"example"},"Example"),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet Jasix.dll -h\nUSAGE: dotnet Jasix.dll -i in.json.gz [options]\nIndexes a Illumina Connected Annotations annotated JSON file\n\nOPTIONS:\n --header, -t print also the header lines\n --only-header, -H print only the header lines\n --chromosomes, -l list chromosome names\n --index, -c create index\n --in, -i input\n --out, -o compressed output file name (default:console)\n --query, -q query range\n --section, -s complete section (positions or genes) to output\n --help, -h displays the help menu\n --version, -v displays the version\n")),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet Jasix.dll --index -i input.json.gz\n---------------------------------------------------------------------------\nJasix (c) 2023 Illumina, Inc.\n 3.22.0\n---------------------------------------------------------------------------\n\nRef Sequence chrM 
indexed in 00:00:00.2\nRef Sequence chr1 indexed in 00:00:05.8\nRef Sequence chr2 indexed in 00:00:06.0\n.\n.\n.\nPeak memory usage: 28.5 MB\nTime: 00:01:14.8\n")),(0,a.kt)("h2",{id:"querying-the-index"},"Querying the index"),(0,a.kt)("p",null,"The Jasix query format is chr:start-end. If not provided, it assumes end=start. If only chr is provided, all entries for that chromosome will be provided."),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet Jasix.dll -i input.json.gz chrM:5000-7000\n")),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'{\n "positions":[\n {\n "chromosome":"chrM",\n "refAllele":"C",\n "position":5581,\n "quality":3070.00,\n "filters":[\n "LowGQXHomSNP"\n ],\n "altAlleles":[\n "T"\n ],\n "samples":[\n {\n "variantFreq":1,\n "totalDepth":1625,\n "genotypeQuality":1,\n "alleleDepths":[\n 0,\n 1625\n ],\n "genotype":"1/1"\n }\n ],\n "variants":[\n {\n "altAllele":"T",\n "refAllele":"C",\n "begin":5581,\n "chromosome":"chrM",\n "end":5581,\n "variantType":"SNV",\n "vid":"MT:5581:T"\n }\n ]\n },\n {\n "chromosome":"chrM",\n "refAllele":"A",\n "position":6267,\n "quality":1637.00,\n "filters":[\n "LowGQXHetSNP"\n ],\n "altAlleles":[\n "G"\n ],\n "samples":[\n {\n "variantFreq":0.6873,\n "totalDepth":323,\n "genotypeQuality":1,\n "alleleDepths":[\n 101,\n 222\n ],\n "genotype":"0/1"\n }\n ],\n "variants":[\n {\n "altAllele":"G",\n "refAllele":"A",\n "begin":6267,\n "chromosome":"chrM",\n "end":6267,\n "variantType":"SNV",\n "vid":"MT:6267:G"\n }\n ]\n }\n ]\n}\n\n')),(0,a.kt)("p",null,'The default output stream is Console. However, if an output filename is provided, Jasix outputs the results to that file in a bgzip compressed format. The output is always a valid JSON entry. If requested (via -t option) the header of the indexed file will be provided. 
Multiple queries can be submitted in the same command and the output will contain them within the same "positions" block in order of the submitted queries (Warning: if the queries are out of order, or overlapping, the output will be out or order and intersecting).'),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000 -q chrM:8500-9500 -t\n")),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'{\n "header":{\n "annotator":"Illumina Annotation Engine 1.6.2.0",\n "creationTime":"2017-08-30 11:42:57",\n "genomeAssembly":"GRCh37",\n "schemaVersion":6,\n "dataVersion":"84.24.39",\n "dataSources":[\n {\n "name":"VEP",\n "version":"84",\n "description":"Ensembl",\n "releaseDate":"2017-01-16"\n }\n ],\n "samples":[\n "Mother"\n ]\n },\n "positions":[\n {\n "chromosome":"chrM",\n "refAllele":"C",\n "position":5581,\n "quality":3070.00,\n "filters":[\n "LowGQXHomSNP"\n ],\n "altAlleles":[\n "T"\n ],\n "samples":[\n {\n "variantFreq":1,\n "totalDepth":1625,\n "genotypeQuality":1,\n "alleleDepths":[\n 0,\n 1625\n ],\n "genotype":"1/1"\n }\n ],\n "variants":[\n {\n "altAllele":"T",\n "refAllele":"C",\n "begin":5581,\n "chromosome":"chrM",\n "end":5581,\n "variantType":"SNV",\n "vid":"MT:5581:T"\n }\n ]\n },\n {\n "chromosome":"chrM",\n "refAllele":"A",\n "position":6267,\n "quality":1637.00,\n "filters":[\n "LowGQXHetSNP"\n ],\n "altAlleles":[\n "G"\n ],\n "samples":[\n {\n "variantFreq":0.6873,\n "totalDepth":323,\n "genotypeQuality":1,\n "alleleDepths":[\n 101,\n 222\n ],\n "genotype":"0/1"\n }\n ],\n "variants":[\n {\n "altAllele":"G",\n "refAllele":"A",\n "begin":6267,\n "chromosome":"chrM",\n "end":6267,\n "variantType":"SNV",\n "vid":"MT:6267:G"\n }\n ]\n },\n {\n "chromosome":"chrM",\n "refAllele":"G",\n "position":8702,\n "quality":3070.00,\n "filters":[\n "LowGQXHomSNP"\n ],\n "altAlleles":[\n "A"\n ],\n "samples":[\n {\n "variantFreq":0.9987,\n 
"totalDepth":1534,\n "genotypeQuality":1,\n "alleleDepths":[\n 2,\n 1532\n ],\n "genotype":"1/1"\n }\n ],\n "variants":[\n {\n "altAllele":"A",\n "refAllele":"G",\n "begin":8702,\n "chromosome":"chrM",\n "end":8702,\n "variantType":"SNV",\n "vid":"MT:8702:A"\n }\n ]\n },\n {\n "chromosome":"chrM",\n "refAllele":"G",\n "position":9378,\n "quality":3070.00,\n "filters":[\n "LowGQXHomSNP"\n ],\n "altAlleles":[\n "A"\n ],\n "samples":[\n {\n "variantFreq":1,\n "totalDepth":1018,\n "genotypeQuality":1,\n "alleleDepths":[\n 0,\n 1018\n ],\n "genotype":"1/1"\n }\n ],\n "variants":[\n {\n "altAllele":"A",\n "refAllele":"G",\n "begin":9378,\n "chromosome":"chrM",\n "end":9378,\n "variantType":"SNV",\n "vid":"MT:9378:A"\n }\n ]\n }\n ]\n}\n')),(0,a.kt)("h2",{id:"extracting-a-section"},"Extracting a section"),(0,a.kt)("p",null,"The Illumina Connected Annotations JSON file has three sections: header, positions and genes. Header can be printed using the -H option. If you are interested in only the positions or genes section, you can use the -s or --section option."),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet Jasix.dll -i input.json.gz -s genes\n")),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'[\n{\n "name": "ABCB10",\n "omim": [\n {\n "mimNumber": 605454,\n "geneName": "ATP-binding cassette, subfamily B, member 10"\n }\n ]\n},\n{\n "name": "ABCD3",\n "omim": [\n {\n "mimNumber": 170995,\n "geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",\n "description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",\n "phenotypes": [\n {\n "mimNumber": 616278,\n "phenotype": "?Bile acid synthesis defect, congenital, 5",\n "mapping": "molecular basis of 
the disorder is known",\n "inheritances": [\n "Autosomal recessive"\n ],\n "comments": [\n "unconfirmed or possibly spurious mapping"\n ]\n }\n ]\n }\n ]\n}\n]\n')))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/a5e136a1.4b8c7497.js b/assets/js/a5e136a1.4b8c7497.js new file mode 100644 index 00000000..051c9a05 --- /dev/null +++ b/assets/js/a5e136a1.4b8c7497.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[8111],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>v});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var c=a.createContext({}),s=function(e){var t=a.useContext(c),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},p=function(e){var t=s(e.components);return a.createElement(c.Provider,{value:t},e.children)},m="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,l=e.originalType,c=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),m=s(n),u=r,v=m["".concat(c,".").concat(u)]||m[u]||d[u]||l;return n?a.createElement(v,i(i({ref:t},p),{},{components:n})):a.createElement(v,i({ref:t},p))}));function v(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var l=n.length,i=new Array(l);i[0]=u;var o={};for(var c in t)hasOwnProperty.call(t,c)&&(o[c]=t[c]);o.originalType=e,o[m]="string"==typeof e?e:r,i[1]=o;for(var 
s=2;s{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>m,frontMatter:()=>l,metadata:()=>o,toc:()=>c});var a=n(7462),r=(n(7294),n(3905));const l={title:"Variant IDs"},i=void 0,o={unversionedId:"core-functionality/variant-ids",id:"core-functionality/variant-ids",title:"Variant IDs",description:"Overview",source:"@site/docs/core-functionality/variant-ids.md",sourceDirName:"core-functionality",slug:"/core-functionality/variant-ids",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/variant-ids.md",tags:[],version:"current",frontMatter:{title:"Variant IDs"},sidebar:"docs",previous:{title:"Transcript Consequence Impact",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts"},next:{title:"Jasix",permalink:"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix"}},c=[{value:"Overview",id:"overview",children:[],level:2},{value:"Small Variants",id:"small-variants",children:[{value:"VCF Examples",id:"vcf-examples",children:[],level:3},{value:"Format",id:"format",children:[],level:3},{value:"VID Examples",id:"vid-examples",children:[],level:3}],level:2},{value:"Translocation Breakends",id:"translocation-breakends",children:[{value:"VCF Example",id:"vcf-example",children:[],level:3},{value:"Format",id:"format-1",children:[],level:3},{value:"VID Example",id:"vid-example",children:[],level:3}],level:2},{value:"All Other Structural Variants",id:"all-other-structural-variants",children:[{value:"VCF Examples",id:"vcf-examples-1",children:[],level:3},{value:"Format",id:"format-2",children:[],level:3},{value:"VID Examples",id:"vid-examples-1",children:[],level:3}],level:2}],s={toc:c},p="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(p,(0,a.Z)({},s,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"Many downstream tools use a 
variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute."),(0,r.kt)("p",null,"The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Conventions")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("ul",{parentName:"div"},(0,r.kt)("li",{parentName:"ul"},"all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)"),(0,r.kt)("li",{parentName:"ul"},"for a reference variant (i.e. no alt allele), replace the period (.) with the reference base"),(0,r.kt)("li",{parentName:"ul"},"padding bases are used, neither the reference nor alternate allele can be empty"),(0,r.kt)("li",{parentName:"ul"},"some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base")))),(0,r.kt)("h2",{id:"small-variants"},"Small Variants"),(0,r.kt)("h3",{id:"vcf-examples"},"VCF Examples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 66507 . T A 184.45 PASS .\nchr1 66521 . T TATATA 144.53 PASS .\nchr1 66572 . 
GTA G,GTACTATATATTATA 45.45 PASS .\n")),(0,r.kt)("h3",{id:"format"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele")),(0,r.kt)("h3",{id:"vid-examples"},"VID Examples"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-66507-T-A"),(0,r.kt)("li",{parentName:"ul"},"1-66521-T-TATATA"),(0,r.kt)("li",{parentName:"ul"},"1-66572-GTA-G"),(0,r.kt)("li",{parentName:"ul"},"1-66572-G-GTACTATATATTA")),(0,r.kt)("h2",{id:"translocation-breakends"},"Translocation Breakends"),(0,r.kt)("h3",{id:"vcf-example"},"VCF Example"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 2617277 . A AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[ . PASS SVTYPE=BND\n")),(0,r.kt)("h3",{id:"format-1"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele")),(0,r.kt)("h3",{id:"vid-example"},"VID Example"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[")),(0,r.kt)("h2",{id:"all-other-structural-variants"},"All Other Structural Variants"),(0,r.kt)("h3",{id:"vcf-examples-1"},"VCF Examples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 1000 . G . PASS END=3001000;SVTYPE=ROH\nchr1 1350082 . G . PASS END=1351320;SVTYPE=DEL\nchr1 1477854 . C . PASS END=1477984;SVTYPE=DUP\nchr1 1477968 . T . PASS END=1477968;SVTYPE=INS\nchr1 1715898 . N . PASS SVTYPE=CNV;END=1750149\nchr1 2650426 . N . PASS SVTYPE=CNV;END=2653074\nchr2 321682 . T . PASS SVTYPE=INV;END=421681\nchr20 2633403 . G . 
PASS END=2633421\n")),(0,r.kt)("h3",{id:"format-2"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"end position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"SVTYPE")),(0,r.kt)("h3",{id:"vid-examples-1"},"VID Examples"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-1000-3001000-G-","<","ROH",">","-ROH"),(0,r.kt)("li",{parentName:"ul"},"1-1350082-1351320-G-","<","DEL",">","-DEL"),(0,r.kt)("li",{parentName:"ul"},"1-1477854-1477984-C-","<","DUP:TANDEM",">","-DUP"),(0,r.kt)("li",{parentName:"ul"},"1-1477968-1477968-T-","<","INS",">","-INS"),(0,r.kt)("li",{parentName:"ul"},"1-1715898-1750149-A-","<","DUP",">","-CNV ",(0,r.kt)("strong",{parentName:"li"},"(replace the N with A)")),(0,r.kt)("li",{parentName:"ul"},"1-2650426-2653074-N-","<","DEL",">","-CNV ",(0,r.kt)("strong",{parentName:"li"},"(keep the N)")),(0,r.kt)("li",{parentName:"ul"},"2-321682-421681-T-","<","INV",">","-INV"),(0,r.kt)("li",{parentName:"ul"},"20-2633403-2633421-G-","<","STR2",">","-STR")))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/a5e136a1.c7e5c6d7.js b/assets/js/a5e136a1.c7e5c6d7.js deleted file mode 100644 index e1f1a983..00000000 --- a/assets/js/a5e136a1.c7e5c6d7.js +++ /dev/null @@ -1 +0,0 @@ -"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[8111],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>v});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return 
Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var c=a.createContext({}),s=function(e){var t=a.useContext(c),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},p=function(e){var t=s(e.components);return a.createElement(c.Provider,{value:t},e.children)},m="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,l=e.originalType,c=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),m=s(n),u=r,v=m["".concat(c,".").concat(u)]||m[u]||d[u]||l;return n?a.createElement(v,i(i({ref:t},p),{},{components:n})):a.createElement(v,i({ref:t},p))}));function v(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var l=n.length,i=new Array(l);i[0]=u;var o={};for(var c in t)hasOwnProperty.call(t,c)&&(o[c]=t[c]);o.originalType=e,o[m]="string"==typeof e?e:r,i[1]=o;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>m,frontMatter:()=>l,metadata:()=>o,toc:()=>c});var a=n(7462),r=(n(7294),n(3905));const l={title:"Variant IDs"},i=void 0,o={unversionedId:"core-functionality/variant-ids",id:"core-functionality/variant-ids",title:"Variant IDs",description:"Overview",source:"@site/docs/core-functionality/variant-ids.md",sourceDirName:"core-functionality",slug:"/core-functionality/variant-ids",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/variant-ids.md",tags:[],version:"current",frontMatter:{title:"Variant IDs"},sidebar:"docs",previous:{title:"Gene Fusion 
Detection",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions"},next:{title:"Jasix",permalink:"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix"}},c=[{value:"Overview",id:"overview",children:[],level:2},{value:"Small Variants",id:"small-variants",children:[{value:"VCF Examples",id:"vcf-examples",children:[],level:3},{value:"Format",id:"format",children:[],level:3},{value:"VID Examples",id:"vid-examples",children:[],level:3}],level:2},{value:"Translocation Breakends",id:"translocation-breakends",children:[{value:"VCF Example",id:"vcf-example",children:[],level:3},{value:"Format",id:"format-1",children:[],level:3},{value:"VID Example",id:"vid-example",children:[],level:3}],level:2},{value:"All Other Structural Variants",id:"all-other-structural-variants",children:[{value:"VCF Examples",id:"vcf-examples-1",children:[],level:3},{value:"Format",id:"format-2",children:[],level:3},{value:"VID Examples",id:"vid-examples-1",children:[],level:3}],level:2}],s={toc:c},p="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(p,(0,a.Z)({},s,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute."),(0,r.kt)("p",null,"The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. 
Our VID scheme attempts to fill that gap."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Conventions")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("ul",{parentName:"div"},(0,r.kt)("li",{parentName:"ul"},"all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)"),(0,r.kt)("li",{parentName:"ul"},"for a reference variant (i.e. no alt allele), replace the period (.) with the reference base"),(0,r.kt)("li",{parentName:"ul"},"padding bases are used, neither the reference nor alternate allele can be empty"),(0,r.kt)("li",{parentName:"ul"},"some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base")))),(0,r.kt)("h2",{id:"small-variants"},"Small Variants"),(0,r.kt)("h3",{id:"vcf-examples"},"VCF Examples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 66507 . T A 184.45 PASS .\nchr1 66521 . T TATATA 144.53 PASS .\nchr1 66572 . 
GTA G,GTACTATATATTATA 45.45 PASS .\n")),(0,r.kt)("h3",{id:"format"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele")),(0,r.kt)("h3",{id:"vid-examples"},"VID Examples"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-66507-T-A"),(0,r.kt)("li",{parentName:"ul"},"1-66521-T-TATATA"),(0,r.kt)("li",{parentName:"ul"},"1-66572-GTA-G"),(0,r.kt)("li",{parentName:"ul"},"1-66572-G-GTACTATATATTA")),(0,r.kt)("h2",{id:"translocation-breakends"},"Translocation Breakends"),(0,r.kt)("h3",{id:"vcf-example"},"VCF Example"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 2617277 . A AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[ . PASS SVTYPE=BND\n")),(0,r.kt)("h3",{id:"format-1"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele")),(0,r.kt)("h3",{id:"vid-example"},"VID Example"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[")),(0,r.kt)("h2",{id:"all-other-structural-variants"},"All Other Structural Variants"),(0,r.kt)("h3",{id:"vcf-examples-1"},"VCF Examples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 1000 . G . PASS END=3001000;SVTYPE=ROH\nchr1 1350082 . G . PASS END=1351320;SVTYPE=DEL\nchr1 1477854 . C . PASS END=1477984;SVTYPE=DUP\nchr1 1477968 . T . PASS END=1477968;SVTYPE=INS\nchr1 1715898 . N . PASS SVTYPE=CNV;END=1750149\nchr1 2650426 . N . PASS SVTYPE=CNV;END=2653074\nchr2 321682 . T . PASS SVTYPE=INV;END=421681\nchr20 2633403 . G . 
PASS END=2633421\n")),(0,r.kt)("h3",{id:"format-2"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"end position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"SVTYPE")),(0,r.kt)("h3",{id:"vid-examples-1"},"VID Examples"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-1000-3001000-G-","<","ROH",">","-ROH"),(0,r.kt)("li",{parentName:"ul"},"1-1350082-1351320-G-","<","DEL",">","-DEL"),(0,r.kt)("li",{parentName:"ul"},"1-1477854-1477984-C-","<","DUP:TANDEM",">","-DUP"),(0,r.kt)("li",{parentName:"ul"},"1-1477968-1477968-T-","<","INS",">","-INS"),(0,r.kt)("li",{parentName:"ul"},"1-1715898-1750149-A-","<","DUP",">","-CNV ",(0,r.kt)("strong",{parentName:"li"},"(replace the N with A)")),(0,r.kt)("li",{parentName:"ul"},"1-2650426-2653074-N-","<","DEL",">","-CNV ",(0,r.kt)("strong",{parentName:"li"},"(keep the N)")),(0,r.kt)("li",{parentName:"ul"},"2-321682-421681-T-","<","INV",">","-INV"),(0,r.kt)("li",{parentName:"ul"},"20-2633403-2633421-G-","<","STR2",">","-STR")))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/a6af1fd8.ecc303b1.js b/assets/js/a6af1fd8.ecc303b1.js new file mode 100644 index 00000000..84e396ba --- /dev/null +++ b/assets/js/a6af1fd8.ecc303b1.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1831],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return 
Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function s(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var i=r.createContext({}),l=function(e){var t=r.useContext(i),n=t;return e&&(n="function"==typeof e?e(t):s(s({},t),e)),n},p=function(e){var t=l(e.components);return r.createElement(i.Provider,{value:t},e.children)},u="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},m=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,i=e.parentName,p=c(e,["components","mdxType","originalType","parentName"]),u=l(n),m=a,f=u["".concat(i,".").concat(m)]||u[m]||d[m]||o;return n?r.createElement(f,s(s({ref:t},p),{},{components:n})):r.createElement(f,s({ref:t},p))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,s=new Array(o);s[0]=m;var c={};for(var i in t)hasOwnProperty.call(t,i)&&(c[i]=t[i]);c.originalType=e,c[u]="string"==typeof e?e:a,s[1]=c;for(var l=2;l{n.r(t),n.d(t,{contentTitle:()=>s,default:()=>u,frontMatter:()=>o,metadata:()=>c,toc:()=>i});var r=n(7462),a=(n(7294),n(3905));const o={},s=void 0,c={unversionedId:"data-sources/dbsnp-json",id:"version-3.24/data-sources/dbsnp-json",title:"dbsnp-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/dbsnp-json.md",sourceDirName:"data-sources",slug:"/data-sources/dbsnp-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dbsnp-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],l={toc:i},p="wrapper";function 
u(e){let{components:t,...n}=e;return(0,a.kt)(p,(0,r.Z)({},l,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"dbsnp":[\n "rs1042821"\n]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"dbsnp"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"dbSNP rsIDs")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/a7b23c85.58554146.js b/assets/js/a7b23c85.58554146.js new file mode 100644 index 00000000..9a9e995e --- /dev/null +++ b/assets/js/a7b23c85.58554146.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1541],{3905:(t,e,n)=>{n.d(e,{Zo:()=>c,kt:()=>d});var r=n(7294);function a(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,r)}return n}function o(t){for(var e=1;e=0||(a[n]=t[n]);return a}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(a[n]=t[n])}return a}var i=r.createContext({}),m=function(t){var e=r.useContext(i),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},c=function(t){var e=m(t.components);return r.createElement(i.Provider,{value:e},t.children)},s="mdxType",f={inlineCode:"code",wrapper:function(t){var e=t.children;return 
r.createElement(r.Fragment,{},e)}},u=r.forwardRef((function(t,e){var n=t.components,a=t.mdxType,l=t.originalType,i=t.parentName,c=p(t,["components","mdxType","originalType","parentName"]),s=m(n),u=a,d=s["".concat(i,".").concat(u)]||s[u]||f[u]||l;return n?r.createElement(d,o(o({ref:e},c),{},{components:n})):r.createElement(d,o({ref:e},c))}));function d(t,e){var n=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var l=n.length,o=new Array(l);o[0]=u;var p={};for(var i in e)hasOwnProperty.call(e,i)&&(p[i]=e[i]);p.originalType=t,p[s]="string"==typeof t?t:a,o[1]=p;for(var m=2;m{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>s,frontMatter:()=>l,metadata:()=>p,toc:()=>i});var r=n(7462),a=(n(7294),n(3905));const l={},o=void 0,p={unversionedId:"data-sources/1000Genomes-snv-json",id:"version-3.24/data-sources/1000Genomes-snv-json",title:"1000Genomes-snv-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes-snv-json.md",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes-snv-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-snv-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes-snv-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],m={toc:i},c="wrapper";function s(t){let{components:e,...n}=t;return(0,a.kt)(c,(0,r.Z)({},m,n,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"oneKg":{\n "allAf":0.200879,\n "afrAf":0.210287,\n "amrAf":0.139769,\n "easAf":0.275794,\n "eurAf":0.181909,\n "sasAf":0.173824,\n "allAn":5008,\n "afrAn":1322,\n "amrAn":694,\n "easAn":1008,\n "eurAn":1006,\n "sasAn":978,\n "allAc":1006,\n "afrAc":278,\n "amrAc":97,\n "easAc":278,\n "eurAc":183,\n 
"sasAc":170\n}\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"allAf"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele frequency for all populations. Range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"allAc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele count for all populations. Integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"allAn"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele number for all populations. Non-zero integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"afrAf"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the African super population. Range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"afrAc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele count for the African super population. Integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"afrAn"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele number for the African super population. 
Non-zero integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"amrAf"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"amrAc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele count for the Ad Mixed American super population. Integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"amrAn"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele number for the Ad Mixed American super population. Non-zero integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"easAf"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"easAc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele count for the East Asian super population. Integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"easAn"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele number for the East Asian super population. Non-zero integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"eurAf"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the European super population. 
Range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"eurAc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele count for the European super population. Integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"eurAn"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele number for the European super population. Non-zero integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"sasAf"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the South Asian super population. Range: 0 - 1.0")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"sasAc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele count for the South Asian super population. Integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"sasAn"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"allele number for the South Asian super population. 
Non-zero integer.")))))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/af18058a.17789ebc.js b/assets/js/af18058a.17789ebc.js new file mode 100644 index 00000000..6a14e9a7 --- /dev/null +++ b/assets/js/af18058a.17789ebc.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1704,6558],{3905:(e,t,n)=>{n.d(t,{Zo:()=>d,kt:()=>N});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),p=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},d=function(e){var t=p(e.components);return a.createElement(s.Provider,{value:t},e.children)},m="mdxType",c={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,d=l(e,["components","mdxType","originalType","parentName"]),m=p(n),u=i,N=m["".concat(s,".").concat(u)]||m[u]||c[u]||r;return n?a.createElement(N,o(o({ref:t},d),{},{components:n})):a.createElement(N,o({ref:t},d))}));function N(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,o=new Array(r);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[m]="string"==typeof e?e:i,o[1]=l;for(var p=2;p{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>m,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const 
r={},o=void 0,l={unversionedId:"data-sources/primate-ai-json",id:"version-3.24/data-sources/primate-ai-json",title:"primate-ai-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/primate-ai-json.md",sourceDirName:"data-sources",slug:"/data-sources/primate-ai-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/primate-ai-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],p={toc:s},d="wrapper";function m(e){let{components:t,...n}=e;return(0,i.kt)(d,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"primateAI-3D": [\n {\n "aminoAcidPosition": 2,\n "refAminoAcid": "V",\n "altAminoAcid": "M",\n "score": 0.616944,\n "scorePercentile": 0.52,\n "classification": "pathogenic", \n "ensemblTranscriptId": "ENST00000335137.4",\n "refSeqTranscriptId": "NM_001005484.1"\n }\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"aminoAcidPosition"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Amino Acid Position (1-based)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refAminoAcid"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Reference Amino 
Acid")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"altAminoAcid"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Alternate Amino Acid")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"ensemblTranscriptId"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Transcript ID (Ensembl)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refSeqTranscriptId"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Transcript ID (RefSeq)")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"scorePercentile"),(0,i.kt)("td",{parentName:"tr",align:"center"},"float"),(0,i.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"score"),(0,i.kt)("td",{parentName:"tr",align:"center"},"float"),(0,i.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"classification"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"pathogenic or benign classification")))))}m.isMDXComponent=!0},9998:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>c,frontMatter:()=>o,metadata:()=>s,toc:()=>p});var a=n(7462),i=(n(7294),n(3905)),r=n(3962);const o={title:"Primate AI-3D"},l=void 0,s={unversionedId:"data-sources/primate-ai",id:"version-3.24/data-sources/primate-ai",title:"Primate 
AI-3D",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/primate-ai.mdx",sourceDirName:"data-sources",slug:"/data-sources/primate-ai",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/primate-ai.mdx",tags:[],version:"3.24",frontMatter:{title:"Primate AI-3D"},sidebar:"docs",previous:{title:"PhyloP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop"},next:{title:"REVEL",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel"}},p=[{value:"Overview",id:"overview",children:[],level:2},{value:"Parsing",id:"parsing",children:[{value:"TSV File",id:"tsv-file",children:[],level:3},{value:"Pre-processing",id:"pre-processing",children:[],level:3},{value:"SA Generation",id:"sa-generation",children:[],level:3},{value:"Known Issues",id:"known-issues",children:[],level:3},{value:"Download URL",id:"download-url",children:[],level:3}],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],d={toc:p},m="wrapper";function c(e){let{components:t,...n}=e;return(0,i.kt)(m,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations."),(0,i.kt)("p",null,"The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information.\nThe model's innovative use of primate sequencing and structural data offers promising insights into variant interpretation and disease gene identification.\nThe predictive score ranges between 0 and 1, with 0 being benign and 1 being most pathogenic."),(0,i.kt)("p",null,"For more details, refer to these publications:"),(0,i.kt)("div",{className:"admonition admonition-info alert 
alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("ol",{parentName:"div"},(0,i.kt)("li",{parentName:"ol"},"Hong Gao et al., The landscape of tolerated genetic variation in humans and primates. ",(0,i.kt)("em",{parentName:"li"},"Science")," ",(0,i.kt)("strong",{parentName:"li"},"380"),", eabn8153 (2023). ",(0,i.kt)("a",{parentName:"li",href:"https://doi.org/10.1126/science.abn8197"},"https://doi.org/10.1126/science.abn8197")),(0,i.kt)("li",{parentName:"ol"},"Sundaram, L., Gao, H., Padigepati, S.R. et al. Predicting the clinical impact of human mutation with deep neural networks. ",(0,i.kt)("em",{parentName:"li"},"Nat Genet")," ",(0,i.kt)("strong",{parentName:"li"},"50"),", 1161\u20131170 (2018). 
",(0,i.kt)("a",{parentName:"li",href:"https://doi.org/10.1038/s41588-018-0167-z"},"https://doi.org/10.1038/s41588-018-0167-z"))))),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Professional data source")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"This is a Professional data source and is not available freely. Please contact ",(0,i.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com")," if you would like to obtain it."))),(0,i.kt)("h2",{id:"parsing"},"Parsing"),(0,i.kt)("h3",{id:"tsv-file"},"TSV File"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"chr pos non_flipped_ref non_flipped_alt gene_name change_position_1based ref_aa alt_aa score_PAI3D percentile_PAI3D refseq prediction\nchr1 69094 G A ENST00000335137.4 2 V M 0.6169436463713646 0.5200308441794135 NM_001005484.1 pathogenic\nchr1 69094 G C ENST00000335137.4 2 V L 0.5557043975591658 0.4271457250214688 NM_001005484.1 benign\nchr1 69094 G T ENST00000335137.4 2 V L 0.5557043975591658 0.4271457391722522 NM_001005484.1 benign\nchr1 69095 T A ENST00000335137.4 2 V E 0.8063537482917307 0.8032228720356267 NM_001005484.1 pathogenic\nchr1 69095 T C ENST00000335137.4 2 V A 0.5795628190040587 0.4631329075815453 NM_001005484.1 benign\nchr1 69095 T G ENST00000335137.4 2 V G 
0.7922330142557621 0.7834049546930125 NM_001005484.1 pathogenic\n")),(0,i.kt)("p",null,"From the TSV file, all columns are parsed:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"chr")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"pos")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"non_flipped_ref")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"non_flipped_alt")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"gene_name")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"change_position_1based")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"ref_aa")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"alt_aa")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"score_PAI3D")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"percentile_PAI3D")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"refseq")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"prediction"))),(0,i.kt)("p",null,"The fields ",(0,i.kt)("inlineCode",{parentName:"p"},"gene_name")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"refseq")," define the Ensembl and RefSeq transcript IDs, respectively.\nThese transcript IDs are passed through as-is, and some of them may be unrecognized or deprecated by RefSeq/Ensembl."),(0,i.kt)("div",{className:"admonition admonition-note alert alert--secondary"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 
.52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"}))),"GRCh37")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Note that for GRCh37, a lifted-over file is provided.\nThe file is not sorted, so it must be sorted first.\nAlso note that certain RefSeq transcripts appear not to have been mapped during the lift-over process."))),(0,i.kt)("h3",{id:"pre-processing"},"Pre-processing"),(0,i.kt)("p",null,"Sorting"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-shell"},"gzcat PrimateAI-3D.hg19.txt.gz | sort -t $'\\t' -k1,1 -k2,2n | gzip > PrimateAI-3D.hg19_sorted.tsv.gz\n")),(0,i.kt)("h3",{id:"sa-generation"},"SA Generation"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-shell"},'dotnet SAUtils.dll \\\nPrimateAi \\\n--r "${References}/Homo_sapiens.GRCh38.Nirvana.dat" \\\n--i "${ExternalDataSources}/PrimateAI/3D/PrimateAI-3D.hg38.txt.gz" \\\n--o "${SaUtilsOutput}"\n')),(0,i.kt)("h3",{id:"known-issues"},"Known Issues"),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 
.01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known Issues")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Some transcript IDs defined in the data file are obsolete, retired, or updated.\nThey are not removed or modified by Illumina Connected Annotations, and are passed as-is from the PrimateAI-3D data source."),(0,i.kt)("h4",{parentName:"div",id:"example"},"Example:"),(0,i.kt)("p",{parentName:"div"},(0,i.kt)("strong",{parentName:"p"},"ENST00000643905.1")," transcript is retired according to ",(0,i.kt)("a",{parentName:"p",href:"https://useast.ensembl.org/Homo_sapiens/Transcript/Idhistory?db=core;t=ENST00000643905"},"Ensembl")),(0,i.kt)("p",{parentName:"div"},(0,i.kt)("strong",{parentName:"p"},"NM_182838.2")," transcript is removed because it is a pseudo-gene according to ",(0,i.kt)("a",{parentName:"p",href:"https://www.ncbi.nlm.nih.gov/nuccore/NM_182838.3"},"RefSeq")))),(0,i.kt)("h3",{id:"download-url"},"Download URL"),(0,i.kt)("p",null,(0,i.kt)("a",{parentName:"p",href:"https://primad.basespace.illumina.com/"},"https://primad.basespace.illumina.com/")),(0,i.kt)("h2",{id:"json-output"},"JSON Output"),(0,i.kt)(r.default,{mdxType:"JSON"}))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/af997954.478221ef.js b/assets/js/af997954.478221ef.js new file mode 100644 index 00000000..7f8ce551 --- /dev/null +++ b/assets/js/af997954.478221ef.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3831],{3905:(e,t,r)=>{r.d(t,{Zo:()=>p,kt:()=>f});var n=r(7294);function a(e,t,r){return t in e?Object.defineProperty(e,t,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[t]=r,e}function o(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return 
Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function c(e){for(var t=1;t=0||(a[r]=e[r]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(a[r]=e[r])}return a}var i=n.createContext({}),s=function(e){var t=n.useContext(i),r=t;return e&&(r="function"==typeof e?e(t):c(c({},t),e)),r},p=function(e){var t=s(e.components);return n.createElement(i.Provider,{value:t},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return n.createElement(n.Fragment,{},t)}},d=n.forwardRef((function(e,t){var r=e.components,a=e.mdxType,o=e.originalType,i=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),u=s(r),d=a,f=u["".concat(i,".").concat(d)]||u[d]||m[d]||o;return r?n.createElement(f,c(c({ref:t},p),{},{components:r})):n.createElement(f,c({ref:t},p))}));function f(e,t){var r=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=r.length,c=new Array(o);c[0]=d;var l={};for(var i in t)hasOwnProperty.call(t,i)&&(l[i]=t[i]);l.originalType=e,l[u]="string"==typeof e?e:a,c[1]=l;for(var s=2;s{r.r(t),r.d(t,{contentTitle:()=>c,default:()=>u,frontMatter:()=>o,metadata:()=>l,toc:()=>i});var n=r(7462),a=(r(7294),r(3905));const o={},c=void 0,l={unversionedId:"data-sources/revel-json",id:"version-3.24/data-sources/revel-json",title:"revel-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/revel-json.md",sourceDirName:"data-sources",slug:"/data-sources/revel-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/revel-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],s={toc:i},p="wrapper";function 
u(e){let{components:t,...r}=e;return(0,a.kt)(p,(0,n.Z)({},s,r,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"revel":{ \n "score":0.027\n}\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"score"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1.0")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/b23ebcdf.c05b032d.js b/assets/js/b23ebcdf.c05b032d.js new file mode 100644 index 00000000..33acb349 --- /dev/null +++ b/assets/js/b23ebcdf.c05b032d.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7911],{3905:(t,e,n)=>{n.d(e,{Zo:()=>m,kt:()=>g});var r=n(7294);function a(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function o(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,r)}return n}function i(t){for(var e=1;e=0||(a[n]=t[n]);return a}(t,e);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(t);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(a[n]=t[n])}return a}var s=r.createContext({}),p=function(t){var e=r.useContext(s),n=e;return t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},m=function(t){var e=p(t.components);return r.createElement(s.Provider,{value:e},t.children)},c="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return 
r.createElement(r.Fragment,{},e)}},u=r.forwardRef((function(t,e){var n=t.components,a=t.mdxType,o=t.originalType,s=t.parentName,m=l(t,["components","mdxType","originalType","parentName"]),c=p(n),u=a,g=c["".concat(s,".").concat(u)]||c[u]||d[u]||o;return n?r.createElement(g,i(i({ref:e},m),{},{components:n})):r.createElement(g,i({ref:e},m))}));function g(t,e){var n=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var o=n.length,i=new Array(o);i[0]=u;var l={};for(var s in e)hasOwnProperty.call(e,s)&&(l[s]=e[s]);l.originalType=t,l[c]="string"==typeof t?t:a,i[1]=l;for(var p=2;p{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>c,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var r=n(7462),a=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/cosmic-gene-fusion-json",id:"version-3.24/data-sources/cosmic-gene-fusion-json",title:"cosmic-gene-fusion-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/cosmic-gene-fusion-json.md",sourceDirName:"data-sources",slug:"/data-sources/cosmic-gene-fusion-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-gene-fusion-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cosmic-gene-fusion-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],p={toc:s},m="wrapper";function c(t){let{components:e,...n}=t;return(0,a.kt)(m,(0,r.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},' "cosmicGeneFusions":[\n {\n "id":"COSF881",\n "numSamples":6,\n "geneSymbols":[\n "MYB",\n "NFIB"\n ],\n "hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",\n "histologies":[\n {\n "name":"adenoid cystic carcinoma",\n "numSamples":6\n }\n ],\n "sites":[\n {\n "name":"salivary gland (submandibular)",\n "numSamples":1\n },\n {\n "name":"salivary gland (parotid)",\n "numSamples":1\n },\n {\n 
"name":"salivary gland (nasal cavity)",\n "numSamples":1\n },\n {\n "name":"breast",\n "numSamples":3\n }\n ],\n "pubMedIds":[\n 19841262\n ]\n }\n ]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"id"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"COSMIC fusion ID")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"})),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"geneSymbols"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"5' gene & 3' gene")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"hgvsr"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"HGVS RNA translocation fusion notation")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"histologies"),(0,a.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"phenotypic descriptions")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"sites"),(0,a.kt)("td",{parentName:"tr",align:"center"},"count array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"tissue types")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"PubMed 
IDs")))),(0,a.kt)("p",null,(0,a.kt)("strong",{parentName:"p"},"Count")),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"name"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"description")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"})))))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/b4506888.2b5dcf08.js b/assets/js/b4506888.2b5dcf08.js new file mode 100644 index 00000000..86826ad8 --- /dev/null +++ b/assets/js/b4506888.2b5dcf08.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[4781],{3905:(t,n,e)=>{e.d(n,{Zo:()=>p,kt:()=>k});var a=e(7294);function r(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function l(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(r[e]=t[e]);return r}(t,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(r[e]=t[e])}return r}var d=a.createContext({}),s=function(t){var n=a.useContext(d),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},p=function(t){var n=s(t.components);return 
a.createElement(d.Provider,{value:n},t.children)},u="mdxType",c={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},m=a.forwardRef((function(t,n){var e=t.components,r=t.mdxType,l=t.originalType,d=t.parentName,p=o(t,["components","mdxType","originalType","parentName"]),u=s(e),m=r,k=u["".concat(d,".").concat(m)]||u[m]||c[m]||l;return e?a.createElement(k,i(i({ref:n},p),{},{components:e})):a.createElement(k,i({ref:n},p))}));function k(t,n){var e=arguments,r=n&&n.mdxType;if("string"==typeof t||r){var l=e.length,i=new Array(l);i[0]=m;var o={};for(var d in n)hasOwnProperty.call(n,d)&&(o[d]=n[d]);o.originalType=t,o[u]="string"==typeof t?t:r,i[1]=o;for(var s=2;s{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>o,toc:()=>d});var a=e(7462),r=(e(7294),e(3905));const l={id:"introduction",title:"Introduction",description:"Clinical-grade variant annotation",hide_title:!0,slug:"/"},i=void 0,o={unversionedId:"introduction/introduction",id:"version-3.24/introduction/introduction",title:"Introduction",description:"Clinical-grade variant annotation",source:"@site/versioned_docs/version-3.24/introduction/introduction.mdx",sourceDirName:"introduction",slug:"/",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/introduction/introduction.mdx",tags:[],version:"3.24",frontMatter:{id:"introduction",title:"Introduction",description:"Clinical-grade variant annotation",hide_title:!0,slug:"/"},sidebar:"docs",next:{title:"Licensed Content",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/licensedContent"}},d=[{value:"What does Illumina Connected Annotations annotate?",id:"what-does-illumina-connected-annotations-annotate",children:[],level:2},{value:"Download",id:"download",children:[],level:2}],s={toc:d},p="wrapper";function 
u(t){let{components:n,...l}=t;return(0,r.kt)(p,(0,a.Z)({},s,l,{components:n,mdxType:"MDXLayout"}),(0,r.kt)("p",null,(0,r.kt)("img",{src:e(260).Z})),(0,r.kt)("p",null,"Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs (including CNVs). It can be run as a stand-alone package, or integrated into larger software tools that require variant annotation."),(0,r.kt)("p",null,"The input to Illumina Connected Annotations are VCFs and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease."),(0,r.kt)("p",null,"The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily."),(0,r.kt)("h2",{id:"what-does-illumina-connected-annotations-annotate"},"What does Illumina Connected Annotations annotate?"),(0,r.kt)("p",null,"We use Sequence Ontology consequences to describe how each variant impacts a given transcript:"),(0,r.kt)("p",null,(0,r.kt)("img",{src:e(9858).Z})),(0,r.kt)("p",null,"The transcript and gene models are obtained from ",(0,r.kt)("a",{parentName:"p",href:"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/"},"RefSeq")," and ",(0,r.kt)("a",{parentName:"p",href:"https://ftp.ensembl.org/pub/"},"Ensembl"),".\nThe current officially supported versions are:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Data Source"),(0,r.kt)("th",{parentName:"tr",align:null},"Version"),(0,r.kt)("th",{parentName:"tr",align:null},"Release 
Date"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"RefSeq"),(0,r.kt)("td",{parentName:"tr",align:null},"GCF_000001405.40-RS_2023_03"),(0,r.kt)("td",{parentName:"tr",align:null},"2023-03-21")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl"),(0,r.kt)("td",{parentName:"tr",align:null},"110"),(0,r.kt)("td",{parentName:"tr",align:null},"2023-04-27")))),(0,r.kt)("p",null,"In addition, it uses external data sources to provide additional context for each variant.\nIllumina Connected Annotations provides annotations from the following sources divided into 2 tiers: Professional and basic.\nThe basic tier can be accessed free of charge. The professional tier requires a license. Please see ",(0,r.kt)("a",{parentName:"p",href:"./introduction/licensedContent"},"Licensed Content")," for details. For access, please contact ",(0,r.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com"),"."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Data Source"),(0,r.kt)("th",{parentName:"tr",align:null},"Availability"),(0,r.kt)("th",{parentName:"tr",align:null},"Latest Supported Version"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"COSMIC"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"99")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"OMIM"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"20240110")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Primate 
AI-3D"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Splice AI"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"1.3")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"1000 Genomes Project"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"Phase 3 v3plus")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Cancer Hotspots"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"2017")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ClinGen"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20240110")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ClinVar"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20231230")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DANN"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200205")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"dbSNP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"156")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DECIPHER"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"201509")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"FusionCatcher"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"1.33")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"GERP"),(0,r.
kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20110522")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"GME Variome"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20160618")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gnomAD"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"3.1.2")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"MITOMAP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200819")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"MultiZ 100 way"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20171006")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"REVEL"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200205")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"freeze 5")))),(0,r.kt)("h2",{id:"download"},"Download"),(0,r.kt)("p",null,"Please visit ",(0,r.kt)("a",{parentName:"p",href:"https://developer.illumina.com/illumina-connected-annotations"},"Illumina Connected Annotations"),"."))}u.isMDXComponent=!0},260:(t,n,e)=>{e.d(n,{Z:()=>a});const a=e.p+"assets/images/ICAnnotations-966475fab8adae0519d1667d592ad4b2.png"},9858:(t,n,e)=>{e.d(n,{Z:()=>a});const a=e.p+"assets/images/TranscriptConsequences-60ca1c43a36dacf896fecdabf09ce02c.svg"}}]); \ No newline at end of file diff --git a/assets/js/b5aea075.46242714.js b/assets/js/b5aea075.46242714.js new file mode 100644 index 00000000..9aa4233f --- /dev/null +++ b/assets/js/b5aea075.46242714.js @@ -0,0 +1 @@ +"use 
strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[815,3593],{3905:(t,e,a)=>{a.d(e,{Zo:()=>d,kt:()=>g});var n=a(7294);function r(t,e,a){return e in t?Object.defineProperty(t,e,{value:a,enumerable:!0,configurable:!0,writable:!0}):t[e]=a,t}function l(t,e){var a=Object.keys(t);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(t);e&&(n=n.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),a.push.apply(a,n)}return a}function i(t){for(var e=1;e=0||(r[a]=t[a]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(t,a)&&(r[a]=t[a])}return r}var p=n.createContext({}),m=function(t){var e=n.useContext(p),a=e;return t&&(a="function"==typeof t?t(e):i(i({},e),t)),a},d=function(t){var e=m(t.components);return n.createElement(p.Provider,{value:e},t.children)},s="mdxType",c={inlineCode:"code",wrapper:function(t){var e=t.children;return n.createElement(n.Fragment,{},e)}},N=n.forwardRef((function(t,e){var a=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,d=o(t,["components","mdxType","originalType","parentName"]),s=m(a),N=r,g=s["".concat(p,".").concat(N)]||s[N]||c[N]||l;return a?n.createElement(g,i(i({ref:e},d),{},{components:a})):n.createElement(g,i({ref:e},d))}));function g(t,e){var a=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=a.length,i=new Array(l);i[0]=N;var o={};for(var p in e)hasOwnProperty.call(e,p)&&(o[p]=e[p]);o.originalType=t,o[s]="string"==typeof t?t:r,i[1]=o;for(var m=2;m{a.r(e),a.d(e,{contentTitle:()=>i,default:()=>s,frontMatter:()=>l,metadata:()=>o,toc:()=>p});var n=a(7462),r=(a(7294),a(3905));const l={},i=void 0,o={unversionedId:"data-sources/fusioncatcher-json",id:"version-3.24/data-sources/fusioncatcher-json",title:"fusioncatcher-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/fusioncatcher-json.md",sourceDirName:"data-sources",slug:"/data-sources/fusioncatcher-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/fusioncatcher-json.md",tags:[],version:"3.24",frontMatter:{}},p=[{value:"genes",id:"genes",children:[],level:4},{value:"gene",id:"gene",children:[],level:4}],m={toc:p},d="wrapper";function s(t){let{components:e,...a}=t;return(0,r.kt)(d,(0,n.Z)({},m,a,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},' "fusionCatcher":[\n {\n "genes":{\n "first":{\n "hgnc":"ETV6",\n "isOncogene":true\n },\n "second":{\n "hgnc":"RUNX1"\n },\n "isParalogPair":true,\n "isPseudogenePair":true,\n "isReadthrough":true\n },\n "germlineSources":[\n "1000 Genomes Project"\n ],\n "somaticSources":[\n "COSMIC",\n "TCGA oesophageal carcinomas"\n ]\n }\n ]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"genes"),(0,r.kt)("td",{parentName:"tr",align:"center"},"genes object"),(0,r.kt)("td",{parentName:"tr",align:"left"},"5' gene & 3' gene")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"germlineSources"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"matches in known germline data sources")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"somaticSources"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string 
array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"matches in known somatic data sources")))),(0,r.kt)("h4",{id:"genes"},"genes"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"first"),(0,r.kt)("td",{parentName:"tr",align:"center"},"gene object"),(0,r.kt)("td",{parentName:"tr",align:"left"},"5' gene")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"second"),(0,r.kt)("td",{parentName:"tr",align:"center"},"gene object"),(0,r.kt)("td",{parentName:"tr",align:"left"},"3' gene")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isParalogPair"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when both genes are paralogs for each other")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isPseudogenePair"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when both genes are pseudogenes for each other")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isReadthrough"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between 
them)")))),(0,r.kt)("h4",{id:"gene"},"gene"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"gene symbol. e.g. MSH6")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isOncogene"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when this gene is an oncogene")))))}s.isMDXComponent=!0},1066:(t,e,a)=>{a.r(e),a.d(e,{contentTitle:()=>o,default:()=>c,frontMatter:()=>i,metadata:()=>p,toc:()=>m});var n=a(7462),r=(a(7294),a(3905)),l=a(5794);const i={title:"FusionCatcher"},o=void 0,p={unversionedId:"data-sources/fusioncatcher",id:"version-3.24/data-sources/fusioncatcher",title:"FusionCatcher",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/fusioncatcher.mdx",sourceDirName:"data-sources",slug:"/data-sources/fusioncatcher",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/fusioncatcher.mdx",tags:[],version:"3.24",frontMatter:{title:"FusionCatcher"},sidebar:"docs",previous:{title:"DECIPHER",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher"},next:{title:"GERP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp"}},m=[{value:"Overview",id:"overview",children:[],level:2},{value:"Supported Data 
Sources",id:"supported-data-sources",children:[{value:"Oncogenes",id:"oncogenes",children:[],level:3},{value:"Germline",id:"germline",children:[],level:3},{value:"Somatic",id:"somatic",children:[],level:3}],level:2},{value:"Gene Pair TSV File",id:"gene-pair-tsv-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"Gene TSV File",id:"gene-tsv-file",children:[{value:"Example",id:"example-1",children:[],level:3},{value:"Parsing",id:"parsing-1",children:[],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],d={toc:m},s="wrapper";function c(t){let{components:e,...a}=t;return(0,r.kt)(s,(0,n.Z)({},d,a,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://github.com/ndaniel/fusioncatcher"},"FusionCatcher")," is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. 
While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of their genomic databases in Illumina Connected Annotations."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Daniel Nicorici, Mihaela \u015eatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murum\xe4gi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) ",(0,r.kt)("a",{parentName:"p",href:"https://www.biorxiv.org/content/10.1101/011650v1"},"FusionCatcher \u2013 a tool for finding somatic fusion genes in paired-end RNA-sequencing data"),". 
",(0,r.kt)("em",{parentName:"p"},"bioRxiv")," 011650"))),(0,r.kt)("h2",{id:"supported-data-sources"},"Supported Data Sources"),(0,r.kt)("h3",{id:"oncogenes"},"Oncogenes"),(0,r.kt)("p",null,"The following data sources are aggregated and used to populate the ",(0,r.kt)("inlineCode",{parentName:"p"},"isOncogene")," field in the gene JSON object:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Description"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Reference"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Data"),(0,r.kt)("th",{parentName:"tr",align:"left"},"FusionCatcher filename"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Bushman"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"http://www.bushmanlab.org/links/genelists"},"bushmanlab.org")),(0,r.kt)("td",{parentName:"tr",align:"left"},"cancer_genes.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ONGENE"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.sciencedirect.com/science/article/pii/S1673852716302053"},"JGG")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"http://ongene.bioinfo-minzhao.org"},"bioinfo-minzhao.org")),(0,r.kt)("td",{parentName:"tr",align:"left"},"oncogenes_more.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"UniProt tumor 
genes"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/49/D1/D480/6006196"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.uniprot.org/downloads"},"uniprot.org")),(0,r.kt)("td",{parentName:"tr",align:"left"},"tumor_genes.txt")))),(0,r.kt)("h3",{id:"germline"},"Germline"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Illumina Connected Annotations label"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Reference"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Data"),(0,r.kt)("th",{parentName:"tr",align:"left"},"FusionCatcher filename"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"1000 Genomes Project"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0104567"},"PLOS ONE")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"1000genomes.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Healthy (strong support)"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"banned.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Illumina Body Map 
2.0"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-513"},"EBI")),(0,r.kt)("td",{parentName:"tr",align:"left"},"bodymap2.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"CACG"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.sciencedirect.com/science/article/pii/S0888754312000821"},"Genomics")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"cacg.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ConjoinG"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0013284"},"PLOS ONE")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"conjoing.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Healthy prefrontal cortex"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-016-0164-y"},"BMC Medical Genomics")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68719"},"NCBI GEO")),(0,r.kt)("td",{parentName:"tr",align:"left"},"cortex.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Duplicated Genes Database"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0050653"},"PLOS 
ONE")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"http://dgd.genouest.org/"},"genouest.org")),(0,r.kt)("td",{parentName:"tr",align:"left"},"dgd.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"GTEx healthy tissues"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://gtexportal.org/home/"},"gtexportal.org")),(0,r.kt)("td",{parentName:"tr",align:"left"},"gtex.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Healthy"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"healthy.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Human Protein Atlas"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.mcponline.org/article/S1535-9476(20)34633-8/fulltext"},"MCP")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1733/"},"EBI")),(0,r.kt)("td",{parentName:"tr",align:"left"},"hpa.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Babiceanu non-cancer tissues"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/44/6/2859/2499453"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/44/6/2859/2499453#supplementary-data"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-cancer_tissues.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"non-tumor cell 
lines"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"non-tumor_cells.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TumorFusions normal"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/46/D1/D1144/4584571"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/46/D1/D1144/4584571#supplementary-data"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},"tcga-normal.txt")))),(0,r.kt)("h3",{id:"somatic"},"Somatic"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Illumina Connected Annotations label"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Reference"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Data"),(0,r.kt)("th",{parentName:"tr",align:"left"},"FusionCatcher filename"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Alaei-Mahabadi 18 cancers"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.pnas.org/content/113/48/13768.long"},"PNAS")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"18cancers.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"DepMap CCLE"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://depmap.org/portal/download/"},"depmap.org")),(0,r.kt)("td",{parentName:"tr",align:"left"},"ccle.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"CCLE 
Klijn"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.nature.com/articles/nbt.3080"},"Nature Biotechnology")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.nature.com/articles/nbt.3080#Sec27"},"Nature Biotechnology")),(0,r.kt)("td",{parentName:"tr",align:"left"},"ccle2.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"CCLE Vellichirammal"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.cell.com/molecular-therapy-family/nucleic-acids/fulltext/S2162-2531(20)30058-5"},"Molecular Therapy Nucleic Acids")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"ccle3.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Cancer Genome Project"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://cancer.sanger.ac.uk/cosmic/download"},"COSMIC")),(0,r.kt)("td",{parentName:"tr",align:"left"},"cgp.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ChimerKB 4.0"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/48/D1/D817/5611671"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.kobic.re.kr/chimerdb_mirror/download"},"kobic.re.kr")),(0,r.kt)("td",{parentName:"tr",align:"left"},"chimerdb4kb.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ChimerPub 
4.0"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/48/D1/D817/5611671"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.kobic.re.kr/chimerdb_mirror/download"},"kobic.re.kr")),(0,r.kt)("td",{parentName:"tr",align:"left"},"chimerdb4pub.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"ChimerSeq 4.0"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/48/D1/D817/5611671"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.kobic.re.kr/chimerdb_mirror/download"},"kobic.re.kr")),(0,r.kt)("td",{parentName:"tr",align:"left"},"chimerdb4seq.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"COSMIC"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/47/D1/D941/5146192"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://cancer.sanger.ac.uk/cosmic/download"},"COSMIC")),(0,r.kt)("td",{parentName:"tr",align:"left"},"cosmic.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Bao gliomas"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://genome.cshlp.org/content/24/11/1765"},"Genome Research")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"gliomas.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Known"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"known.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Mitelman 
DB"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://mitelmandatabase.isb-cgc.org"},"ISB-CGC")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://storage.cloud.google.com/mitelman-data-files/prod/mitelman_db.zip"},"Google Cloud")),(0,r.kt)("td",{parentName:"tr",align:"left"},"mitelman.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TCGA oesophageal carcinomas"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.nature.com/articles/nature20805"},"Nature")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"oesophagus.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Bailey pancreatic cancers"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.nature.com/articles/nature16965"},"Nature")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.nature.com/articles/nature16965#Sec44"},"Nature")),(0,r.kt)("td",{parentName:"tr",align:"left"},"pancreases.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"PCAWG"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://doi.org/10.1016/j.cell.2018.03.042"},"Cell")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://dcc.icgc.org/releases/PCAWG/transcriptome/fusion"},"ICGC")),(0,r.kt)("td",{parentName:"tr",align:"left"},"pcawg.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"Robinson prostate 
cancers"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://doi.org/10.1016/j.cell.2015.05.001"},"Cell")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.cell.com/cell/fulltext/S0092-8674(15)00548-6?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867415005486%3Fshowall%3Dtrue#supplementaryMaterial"},"Cell")),(0,r.kt)("td",{parentName:"tr",align:"left"},"prostate_cancer.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TCGA"),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga"},"cancer.gov")),(0,r.kt)("td",{parentName:"tr",align:"left"},"tcga.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TumorFusions tumor"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/46/D1/D1144/4584571"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://academic.oup.com/nar/article/46/D1/D1144/4584571#supplementary-data"},"NAR")),(0,r.kt)("td",{parentName:"tr",align:"left"},"tcga-cancer.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TCGA Gao"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://doi.org/10.1016/j.celrep.2018.03.050"},"Cell Reports")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.cell.com/cell-reports/fulltext/S2211-1247(18)30395-4?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2211124718303954%3Fshowall%3Dtrue#supplementaryMaterial"},"Cell Reports")),(0,r.kt)("td",{parentName:"tr",align:"left"},"tcga2.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TCGA 
Vellichirammal"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://www.cell.com/molecular-therapy-family/nucleic-acids/fulltext/S2162-2531(20)30058-5"},"Molecular Therapy Nucleic Acids")),(0,r.kt)("td",{parentName:"tr",align:"left"}),(0,r.kt)("td",{parentName:"tr",align:"left"},"tcga3.txt")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"TICdb"),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-8-33"},"BMC Genomics")),(0,r.kt)("td",{parentName:"tr",align:"left"},(0,r.kt)("a",{parentName:"td",href:"https://genetica.unav.edu/TICdb/allseqs_TICdb.txt"},"unav.edu")),(0,r.kt)("td",{parentName:"tr",align:"left"},"ticdb.txt")))),(0,r.kt)("h2",{id:"gene-pair-tsv-file"},"Gene Pair TSV File"),(0,r.kt)("p",null,"Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together."),(0,r.kt)("h3",{id:"example"},"Example"),(0,r.kt)("p",null,"Here are the first few lines of the 1000genomes.txt file:"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre"},"ENSG00000006210 ENSG00000102962\nENSG00000006652 ENSG00000181016\nENSG00000014138 ENSG00000149798\nENSG00000026297 ENSG00000071242\nENSG00000035499 ENSG00000155959\nENSG00000055211 ENSG00000131013\nENSG00000055332 ENSG00000179915\nENSG00000062485 ENSG00000257727\nENSG00000065978 ENSG00000166501\nENSG00000066044 ENSG00000104980\n")),(0,r.kt)("h3",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files."),(0,r.kt)("h2",{id:"gene-tsv-file"},"Gene TSV File"),(0,r.kt)("p",null,"Some of the data files are single-column files containing Ensembl gene IDs. 
This format is commonly used in the data files representing oncogene data sources."),(0,r.kt)("h3",{id:"example-1"},"Example"),(0,r.kt)("p",null,"Here are the first few lines of the oncogenes_more.txt file:"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre"},"ENSG00000000938\nENSG00000003402\nENSG00000005469\nENSG00000005884\nENSG00000006128\nENSG00000006453\nENSG00000006468\nENSG00000007350\nENSG00000008294\nENSG00000008952\n")),(0,r.kt)("h3",{id:"parsing-1"},"Parsing"),(0,r.kt)("h2",{id:"known-issues"},"Known Issues"),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known Issues")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"FusionCatcher also creates custom Ensembl genes (e.g. ",(0,r.kt)("inlineCode",{parentName:"p"},"ENSG09000000002"),") to handle missing Ensembl genes. 
Illumina Connected Annotations will ignore these entries since we only include the gene IDs that are currently recognized by Illumina Connected Annotations."),(0,r.kt)("p",{parentName:"div"},"I suspect that these were originally RefSeq genes and if so, we can support those directly in Illumina Connected Annotations in the future."))),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://sourceforge.net/projects/fusioncatcher/files/data"},"https://sourceforge.net/projects/fusioncatcher/files/data")),(0,r.kt)("h2",{id:"json-output"},"JSON Output"),(0,r.kt)(l.default,{mdxType:"JSON"}))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/b7dbb0d7.1351c5ab.js b/assets/js/b7dbb0d7.1351c5ab.js new file mode 100644 index 00000000..00d45b27 --- /dev/null +++ b/assets/js/b7dbb0d7.1351c5ab.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7392],{3905:(t,e,n)=>{n.d(e,{Zo:()=>c,kt:()=>k});var r=n(7294);function a(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function i(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,r)}return n}function o(t){for(var e=1;e=0||(a[n]=t[n]);return a}(t,e);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(t);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(a[n]=t[n])}return a}var p=r.createContext({}),d=function(t){var e=r.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},c=function(t){var e=d(t.components);return r.createElement(p.Provider,{value:e},t.children)},m="mdxType",s={inlineCode:"code",wrapper:function(t){var e=t.children;return r.createElement(r.Fragment,{},e)}},u=r.forwardRef((function(t,e){var 
n=t.components,a=t.mdxType,i=t.originalType,p=t.parentName,c=l(t,["components","mdxType","originalType","parentName"]),m=d(n),u=a,k=m["".concat(p,".").concat(u)]||m[u]||s[u]||i;return n?r.createElement(k,o(o({ref:e},c),{},{components:n})):r.createElement(k,o({ref:e},c))}));function k(t,e){var n=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var i=n.length,o=new Array(i);o[0]=u;var l={};for(var p in e)hasOwnProperty.call(e,p)&&(l[p]=e[p]);l.originalType=t,l[m]="string"==typeof t?t:a,o[1]=l;for(var d=2;d{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>m,frontMatter:()=>i,metadata:()=>l,toc:()=>p});var r=n(7462),a=(n(7294),n(3905));const i={title:"Dependencies"},o=void 0,l={unversionedId:"introduction/dependencies",id:"version-3.24/introduction/dependencies",title:"Dependencies",description:"All of the following dependencies have been included in this repository.",source:"@site/versioned_docs/version-3.24/introduction/dependencies.md",sourceDirName:"introduction",slug:"/introduction/dependencies",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/dependencies",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/introduction/dependencies.md",tags:[],version:"3.24",frontMatter:{title:"Dependencies"},sidebar:"docs",previous:{title:"Licensed Content",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/licensedContent"},next:{title:"Getting Started",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/getting-started"}},p=[],d={toc:p},c="wrapper";function m(t){let{components:e,...n}=t;return(0,a.kt)(c,(0,r.Z)({},d,n,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("p",null,"All of the following dependencies have been included in this 
repository."),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Name"),(0,a.kt)("th",{parentName:"tr",align:"center"},"License"),(0,a.kt)("th",{parentName:"tr",align:null},"Usage"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/aws/aws-extensions-for-dotnet-cli"},"Amazon.Lambda")),(0,a.kt)("td",{parentName:"tr",align:"center"},"Apache"),(0,a.kt)("td",{parentName:"tr",align:null},"AWS extensions for .NET CLI")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/aws/aws-sdk-net/"},"AWSSDK")),(0,a.kt)("td",{parentName:"tr",align:"center"},"Apache"),(0,a.kt)("td",{parentName:"tr",align:null},"AWS Lambda, S3, SNS support")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://www.newtonsoft.com/json"},"Json.NET")),(0,a.kt)("td",{parentName:"tr",align:"center"},"MIT"),(0,a.kt)("td",{parentName:"tr",align:null},"JASIX utility")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/ebiggers/libdeflate"},"libdeflate")),(0,a.kt)("td",{parentName:"tr",align:"center"},"MIT"),(0,a.kt)("td",{parentName:"tr",align:null},"BlockCompression library")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/moq/moq4"},"Moq")),(0,a.kt)("td",{parentName:"tr",align:"center"},"BSD"),(0,a.kt)("td",{parentName:"tr",align:null},"Mocking framework for unit 
tests")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"http://www.ndesk.org/Options"},"NDesk.Options")),(0,a.kt)("td",{parentName:"tr",align:"center"},"MIT/X11"),(0,a.kt)("td",{parentName:"tr",align:null},"CommandLine library")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/xunit/xunit"},"xUnit")),(0,a.kt)("td",{parentName:"tr",align:"center"},"Apache"),(0,a.kt)("td",{parentName:"tr",align:null},"Unit testing framework")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/Dead2/zlib-ng"},"zlib-ng")),(0,a.kt)("td",{parentName:"tr",align:"center"},"zlib"),(0,a.kt)("td",{parentName:"tr",align:null},"BlockCompression library")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},(0,a.kt)("a",{parentName:"td",href:"https://github.com/facebook/zstd"},"zstd")),(0,a.kt)("td",{parentName:"tr",align:"center"},"BSD"),(0,a.kt)("td",{parentName:"tr",align:null},"BlockCompression library")))))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/bd285e40.c8b8cb68.js b/assets/js/bd285e40.c8b8cb68.js new file mode 100644 index 00000000..eb296fbf --- /dev/null +++ b/assets/js/bd285e40.c8b8cb68.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[800],{3905:(e,n,t)=>{t.d(n,{Zo:()=>c,kt:()=>h});var a=t(7294);function i(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function o(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,a)}return t}function l(e){for(var n=1;n=0||(i[t]=e[t]);return 
i}(e,n);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(i[t]=e[t])}return i}var s=a.createContext({}),d=function(e){var n=a.useContext(s),t=n;return e&&(t="function"==typeof e?e(n):l(l({},n),e)),t},c=function(e){var n=d(e.components);return a.createElement(s.Provider,{value:n},e.children)},p="mdxType",m={inlineCode:"code",wrapper:function(e){var n=e.children;return a.createElement(a.Fragment,{},n)}},u=a.forwardRef((function(e,n){var t=e.components,i=e.mdxType,o=e.originalType,s=e.parentName,c=r(e,["components","mdxType","originalType","parentName"]),p=d(t),u=i,h=p["".concat(s,".").concat(u)]||p[u]||m[u]||o;return t?a.createElement(h,l(l({ref:n},c),{},{components:t})):a.createElement(h,l({ref:n},c))}));function h(e,n){var t=arguments,i=n&&n.mdxType;if("string"==typeof e||i){var o=t.length,l=new Array(o);l[0]=u;var r={};for(var s in n)hasOwnProperty.call(n,s)&&(r[s]=n[s]);r.originalType=e,r[p]="string"==typeof e?e:i,l[1]=r;for(var d=2;d{t.r(n),t.d(n,{contentTitle:()=>l,default:()=>p,frontMatter:()=>o,metadata:()=>r,toc:()=>s});var a=t(7462),i=(t(7294),t(3905));const o={title:"Getting Started"},l=void 0,r={unversionedId:"introduction/getting-started",id:"version-3.24/introduction/getting-started",title:"Getting Started",description:"Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). 
Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.",source:"@site/versioned_docs/version-3.24/introduction/getting-started.md",sourceDirName:"introduction",slug:"/introduction/getting-started",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/getting-started",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/introduction/getting-started.md",tags:[],version:"3.24",frontMatter:{title:"Getting Started"},sidebar:"docs",previous:{title:"Dependencies",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/dependencies"},next:{title:"Parsing Illumina Connected Annotations JSON",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/parsing-json"}},s=[{value:"Getting Illumina Connected Annotations",id:"getting-illumina-connected-annotations",children:[{value:"Latest Release",id:"latest-release",children:[],level:3},{value:"Quick Start",id:"quick-start",children:[],level:3},{value:"Docker",id:"docker",children:[],level:3}],level:2},{value:"Downloading the data files",id:"downloading-the-data-files",children:[{value:"Preserving old data file",id:"preserving-old-data-file",children:[],level:3}],level:2},{value:"Download a test VCF file",id:"download-a-test-vcf-file",children:[],level:2},{value:"Running Illumina Connected Annotations",id:"running-illumina-connected-annotations",children:[],level:2},{value:"The Illumina Connected Annotations command line",id:"the-illumina-connected-annotations-command-line",children:[{value:"Specifying annotation sources",id:"specifying-annotation-sources",children:[],level:3}],level:2}],d={toc:s},c="wrapper";function p(e){let{components:n,...o}=e;return(0,i.kt)(c,(0,a.Z)({},d,o,{components:n,mdxType:"MDXLayout"}),(0,i.kt)("p",null,"Illumina Connected Annotations is written in C# using 
",(0,i.kt)("a",{parentName:"p",href:"https://www.microsoft.com/net/download/core"},".NET Core")," (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files."),(0,i.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"tip")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Illumina Connected Annotations currently uses .NET 6.0. Please make sure that you have the most current runtime from the ",(0,i.kt)("a",{parentName:"p",href:"https://www.microsoft.com/net/download/core"},".NET Core downloads")," page."))),(0,i.kt)("h2",{id:"getting-illumina-connected-annotations"},"Getting Illumina Connected Annotations"),(0,i.kt)("h3",{id:"latest-release"},"Latest Release"),(0,i.kt)("p",null,"Please visit ",(0,i.kt)("a",{parentName:"p",href:"https://developer.illumina.com/illumina-connected-annotations"},"Illumina Connected Annotations")," 
to obtain the latest release."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"mkdir -p IlluminaConnectedAnnotations/Data\ncd IlluminaConnectedAnnotations\nunzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip\n")),(0,i.kt)("h3",{id:"quick-start"},"Quick Start"),(0,i.kt)("p",null,"If you want to get started right away, we've created ",(0,i.kt)("a",{target:"_blank",href:t(9897).Z},"a script")," that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip\n")),(0,i.kt)("p",null,"We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X."),(0,i.kt)("h3",{id:"docker"},"Docker"),(0,i.kt)("p",null,"Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz) and load it as follows:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz\n")),(0,i.kt)("p",null,"If you want to build your own Docker image, it is really easy to do. 
You just need to have the Illumina Connected Annotations zip file and then download the ",(0,i.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/Dockerfile"},"Dockerfile")," and ",(0,i.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/create_docker_image.sh"},"this script"),"."),(0,i.kt)("p",null,"Put both files (",(0,i.kt)("inlineCode",{parentName:"p"},"create_docker_image.sh")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"Dockerfile"),") inside the same folder."),(0,i.kt)("p",null,"In a terminal, run the commands below inside the folder where you put those files:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"chmod +x create_docker_image.sh\n./create_docker_image.sh [path to zip file] [image tag]\n")),(0,i.kt)("p",null,"After you run the script, the Docker image will be available on your local machine with the image name ",(0,i.kt)("inlineCode",{parentName:"p"},"illumina-connected-annotations:[image tag specified]"),"."),(0,i.kt)("p",null,"For Docker, we have special instructions for running the Downloader:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch\n")),(0,i.kt)("p",null,"Similarly, we have special instructions for running IlluminaConnectedAnnotations (here's ",(0,i.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz"},"a toy VCF")," in case you need it):"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Annotator -c /scratch/Cache/ \\\n -r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \\\n --sd /scratch/SupplementaryAnnotation/GRCh37 \\\n -i 
/scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq\n")),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"caution")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker."))),(0,i.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"tip")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"For convenience, the user 
is encouraged to create aliases for the docker commands. For example:"),(0,i.kt)("pre",{parentName:"div"},(0,i.kt)("code",{parentName:"pre",className:"language-bash"},'alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"\n')))),(0,i.kt)("h2",{id:"downloading-the-data-files"},"Downloading the data files"),(0,i.kt)("p",null,"To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet bin/Release/net6.0/Downloader.dll \\\n --ga GRCh37 \\\n -o Data\n")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"--ga")," argument specifies the genome assembly which can be ",(0,i.kt)("inlineCode",{parentName:"li"},"GRCh37"),", ",(0,i.kt)("inlineCode",{parentName:"li"},"GRCh38"),", or ",(0,i.kt)("inlineCode",{parentName:"li"},"both"),"."),(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-o")," argument specifies the output directory")),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Glitches in the Matrix")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Every once in a while, the download process does not go smoothly. 
Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked ",(0,i.kt)("inlineCode",{parentName:"p"},"truncated"),", try fixing the root cause and running the downloader again."))),(0,i.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"}))),"tip")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed."))),(0,i.kt)("h3",{id:"preserving-old-data-file"},"Preserving old data file"),(0,i.kt)("p",null,"By default, while rerunning, the Downloader will replace old files with the latest versions. 
For example, if at some point your ",(0,i.kt)("inlineCode",{parentName:"p"},"SupplementaryAnnotation")," folder contained ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinVar_20231101.nsa")," and the latest available version is ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinVar_20231203.nsa"),", the next time the Downloader is run, ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinVar_20231101.nsa")," will be replaced with ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinVar_20231203.nsa"),". "),(0,i.kt)("p",null,"Currently, there is no way to override this behavior. If you do not want a particular file to be replaced or updated, we recommend saving it to a different location, rerunning the Downloader to update the other data files, and then manually restoring the file you did not want updated. Please make sure to remove the newly downloaded version of any file you restore. Note that the Annotator will throw an error if multiple versions of the same data source are present in the ",(0,i.kt)("inlineCode",{parentName:"p"},"SupplementaryAnnotation")," folder. 
In other words, the ",(0,i.kt)("inlineCode",{parentName:"p"},"SupplementaryAnnotation")," folder cannot contain both ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinVar_20231101.nsa")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"ClinVar_20231203.nsa"),"."),(0,i.kt)("p",null,"Here is an example of how to proceed if a user doesn't want the latest version of ClinVar."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"ls Data/SupplementaryAnnotation/GRCh38\n...\nClinGen_disease_validity_curations_20231011.nga\nClinVar_20230930.nsa\nClinVar_20230930.nsa.idx\n...\nmv Data/SupplementaryAnnotation/GRCh38/ClinVar* /GRCh38/\n\ndotnet bin/Release/net6.0/Downloader.dll \\\n --ga GRCh38 \\\n -o Data\n\nrm Data/SupplementaryAnnotation/GRCh38/ClinVar*\nmv /GRCh38/ClinVar* Data/SupplementaryAnnotation/GRCh38/\n")),(0,i.kt)("h2",{id:"download-a-test-vcf-file"},"Download a test VCF file"),(0,i.kt)("p",null,"Here's ",(0,i.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz"},"a toy VCF file")," you can play around with:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz\n")),(0,i.kt)("h2",{id:"running-illumina-connected-annotations"},"Running Illumina Connected Annotations"),(0,i.kt)("p",null,"Once you have downloaded the data sets, use the following command to annotate your VCF:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet Annotator.dll \\\n -c Data/Cache \\\n --sd Data/SupplementaryAnnotation/GRCh37 \\\n -r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \\\n -i HiSeq.10000.vcf.gz \\\n -o HiSeq.10000\n")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-c")," argument specifies the cache directory"),(0,i.kt)("li",{parentName:"ul"},"the 
",(0,i.kt)("inlineCode",{parentName:"li"},"--sd")," argument specifies the supplementary annotation directory"),(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-r")," argument specifies the compressed reference path"),(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-i")," argument specifies the input VCF path"),(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-o")," argument specifies the output filename prefix")),(0,i.kt)("p",null,"When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"---------------------------------------------------------------------------\nIllumina Connected Annotations (c) 2023 Illumina, Inc.\n 3.22.0\n---------------------------------------------------------------------------\n\nInitialization Time Positions/s\n---------------------------------------------------------------------------\nCache 00:00:00.0\nSA Position Scan 00:00:00.0 153,634\n\nReference Preload Annotation Variants/s\n---------------------------------------------------------------------------\nchr1 00:00:00.2 00:00:00.8 11,873\n\nSummary Time Percent\n---------------------------------------------------------------------------\nInitialization 00:00:00.0 1.5 %\nPreload 00:00:00.2 4.9 %\nAnnotation 00:00:00.8 18.5 %\n\nTime: 00:00:04.4\n")),(0,i.kt)("p",null,"The output will be a JSON file called ",(0,i.kt)("inlineCode",{parentName:"p"},"HiSeq.10000.json.gz"),". 
Here's ",(0,i.kt)("a",{parentName:"p",href:"https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.json.gz"},"the full JSON file"),"."),(0,i.kt)("h2",{id:"the-illumina-connected-annotations-command-line"},"The Illumina Connected Annotations command line"),(0,i.kt)("p",null,"The full list of command line options can be viewed by using the ",(0,i.kt)("inlineCode",{parentName:"p"},"-h")," option or by running the Annotator with no options:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet Annotator.dll\n---------------------------------------------------------------------------\nIllumina Connected Annotations (c) 2024 Illumina, Inc.\n 3.24.0\n---------------------------------------------------------------------------\n\nUSAGE: dotnet Nirvana.dll -i -c --sd -r -o \nAnnotates a set of variants\n\nOPTIONS:\n --cache, -c \n input cache directory\n --in, -i input VCF path\n --tsv input VCF path\n --out, -o output file path\n --ref, -r input compressed reference sequence path\n --sd input supplementary annotation directory\n --sources, -s annotation data sources to be used (comma\n separated list of supported tags)\n --credentialsFile \n File path to user credentials, default is set to ~\n /.ilmnAnnotations/credentials.json\n --ignoreLicenseError ignore error due to invalid license and skip\n related data sources\n --force-mt forces to annotate mitochondrial variants\n --legacy-vids enables support for legacy VIDs\n --enable-dq report DQ from VCF samples field\n --enable-bidirectional-fusions\n enables support for bidirectional gene fusions\n --disable-junction-preservation\n disable junction preserving functional annotation\n --str user provided STR annotation TSV file\n --vcf-info additional vcf info field keys (comma separated)\n desired in the output\n --vcf-sample-info \n additional vcf format field keys (comma separated)\n desired in the output\n --sa-cutoff Any SV larger than or equal to this value will\n not have any supplementary 
annotations\n --output-format \n output file format, available options: json, vcf.\n --help, -h displays the help menu\n --version, -v displays the version\n\n##### Supported Annotation Sources #####\nBasic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, ClinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way\n\nProfessional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.\n\n##### Contact #####\nProfessional content licensing, feedback and technical support: annotation_support@illumina.com.\n")),(0,i.kt)("h3",{id:"specifying-annotation-sources"},"Specifying annotation sources"),(0,i.kt)("p",null,"By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the ",(0,i.kt)("inlineCode",{parentName:"p"},"--sources|-s")," option. If an unknown source is specified, a warning message will be printed."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet Annotator.dll \\\n -c Data/Cache/GRCh37 \\\n --sd Data/SupplementaryAnnotation/GRCh37 \\\n -r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \\\n -i HiSeq.10000.vcf.gz \\\n -o HiSeq.10000 \\\n -s omim,gnomad,ense\n ---------------------------------------------------------------------------\n Illumina Connected Annotations (c) 2023 Illumina, Inc.\n 3.22.0\n ---------------------------------------------------------------------------\n\n WARNING: Unknown tag in data-sources: ense.\n Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,\n mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,\n gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,\n heteroplasmy,Ensembl,RefSeq\n\n Initialization Time Positions/s\n ---------------------------------------------------------------------------\n SA Position Scan 00:00:00.3 307,966\n ....\n 
..\n")),(0,i.kt)("p",null,"The list of available values is compiled from the files provided (using ",(0,i.kt)("inlineCode",{parentName:"p"},"-c")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"--sd")," options)."))}p.isMDXComponent=!0},9897:(e,n,t)=>{t.d(n,{Z:()=>a});const a=t.p+"assets/files/TestIlluminaConnectedAnnotations-f9628aa5a9463c140128003e34b450f8.sh"}}]); \ No newline at end of file diff --git a/assets/js/bee64c31.fb97cfb1.js b/assets/js/bee64c31.fb97cfb1.js new file mode 100644 index 00000000..1345e08a --- /dev/null +++ b/assets/js/bee64c31.fb97cfb1.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3555],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function l(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var i=r.createContext({}),s=function(e){var t=r.useContext(i),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},u=function(e){var t=s(e.components);return r.createElement(i.Provider,{value:t},e.children)},p="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},m=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,i=e.parentName,u=c(e,["components","mdxType","originalType","parentName"]),p=s(n),m=a,f=p["".concat(i,".").concat(m)]||p[m]||d[m]||o;return n?r.createElement(f,l(l({ref:t},u),{},{components:n})):r.createElement(f,l({ref:t},u))}));function f(e,t){var 
n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,l=new Array(o);l[0]=m;var c={};for(var i in t)hasOwnProperty.call(t,i)&&(c[i]=t[i]);c.originalType=e,c[p]="string"==typeof e?e:a,l[1]=c;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>p,frontMatter:()=>o,metadata:()=>c,toc:()=>i});var r=n(7462),a=(n(7294),n(3905));const o={},l=void 0,c={unversionedId:"data-sources/dann-json",id:"version-3.24/data-sources/dann-json",title:"dann-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/dann-json.md",sourceDirName:"data-sources",slug:"/data-sources/dann-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/dann-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],s={toc:i},u="wrapper";function p(e){let{components:t,...n}=e;return(0,a.kt)(u,(0,r.Z)({},s,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"dannScore": 0.27\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Field"),(0,a.kt)("th",{parentName:"tr",align:null},"Type"),(0,a.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"dannScore"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1.0")))))}p.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/bf58d54d.0cf2023b.js b/assets/js/bf58d54d.0cf2023b.js new file mode 100644 index 00000000..922af366 --- /dev/null +++ b/assets/js/bf58d54d.0cf2023b.js @@ -0,0 +1 @@ +"use 
strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5341],{3905:(t,n,e)=>{e.d(n,{Zo:()=>d,kt:()=>N});var a=e(7294);function r(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function l(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(r[e]=t[e]);return r}(t,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(r[e]=t[e])}return r}var m=a.createContext({}),o=function(t){var n=a.useContext(m),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},d=function(t){var n=o(t.components);return a.createElement(m.Provider,{value:n},t.children)},u="mdxType",g={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},k=a.forwardRef((function(t,n){var e=t.components,r=t.mdxType,l=t.originalType,m=t.parentName,d=p(t,["components","mdxType","originalType","parentName"]),u=o(e),k=r,N=u["".concat(m,".").concat(k)]||u[k]||g[k]||l;return e?a.createElement(N,i(i({ref:n},d),{},{components:e})):a.createElement(N,i({ref:n},d))}));function N(t,n){var e=arguments,r=n&&n.mdxType;if("string"==typeof t||r){var l=e.length,i=new Array(l);i[0]=k;var p={};for(var m in n)hasOwnProperty.call(n,m)&&(p[m]=n[m]);p.originalType=t,p[u]="string"==typeof t?t:r,i[1]=p;for(var o=2;o{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>p,toc:()=>m});var a=e(7462),r=(e(7294),e(3905));const l={title:"Transcript Consequence Impact"},i=void 0,p={unversionedId:"core-functionality/transcript-consequence-impacts",id:"version-3.24/core-functionality/transcript-consequence-impacts",title:"Transcript Consequence 
Impact",description:"Overview",source:"@site/versioned_docs/version-3.24/core-functionality/transcript-consequence-impacts.md",sourceDirName:"core-functionality",slug:"/core-functionality/transcript-consequence-impacts",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/transcript-consequence-impacts",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/core-functionality/transcript-consequence-impacts.md",tags:[],version:"3.24",frontMatter:{title:"Transcript Consequence Impact"},sidebar:"docs",previous:{title:"Junction Preserving Annotation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/junction-preserving"},next:{title:"Variant IDs",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/variant-ids"}},m=[{value:"Overview",id:"overview",children:[],level:2},{value:"Sources",id:"sources",children:[],level:2},{value:"Consequence Impacts",id:"consequence-impacts",children:[{value:"Known Issues",id:"known-issues",children:[],level:3}],level:2},{value:"Example Transcript",id:"example-transcript",children:[],level:2}],o={toc:m},d="wrapper";function u(t){let{components:n,...e}=t;return(0,r.kt)(d,(0,a.Z)({},o,e,{components:n,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"Illumina Connected Annotations provides transcript consequence impacts from ",(0,r.kt)("a",{parentName:"p",href:"https://pcingola.github.io/SnpEff"},"SnpEff"),"."),(0,r.kt)("p",null,"The following definitions are used for the impact ratings, as obtained from 
",(0,r.kt)("a",{parentName:"p",href:"https://github.com/pcingola/SnpEff/blob/master/src/docs/se_inputoutput.md#impact-prediction"},"SnpEff"),"."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Definition"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"A non-disruptive variant that might change protein effectiveness.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"Assumed to be mostly harmless or unlikely to change protein behavior.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact.")))),(0,r.kt)("h2",{id:"sources"},"Sources"),(0,r.kt)("p",null,"SnpEff does not rate every consequence, so Illumina Connected Annotations combines the ratings from SnpEff with those from VEP."),(0,r.kt)("ol",null,(0,r.kt)("li",{parentName:"ol"},"SnpEff ",(0,r.kt)("a",{parentName:"li",href:"https://pcingola.github.io/SnpEff/se_inputoutput/"},"Documentation")," and ",(0,r.kt)("a",{parentName:"li",href:"https://github.com/pcingola/SnpEff/blob/001b947893b616e3af082e6c565e253eef59db98/src/main/java/org/snpeff/snpEffect/EffectType.java#L54"},"Codebase")),(0,r.kt)("li",{parentName:"ol"},"VEP 
",(0,r.kt)("a",{parentName:"li",href:"https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html"},"Documentation"))),(0,r.kt)("h2",{id:"consequence-impacts"},"Consequence Impacts"),(0,r.kt)("p",null,"The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Consequence"),(0,r.kt)("th",{parentName:"tr",align:null},"SnpEff Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"VEP Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Illumina Connected Annotations Impact"),(0,r.kt)("th",{parentName:"tr",align:null},"Comment"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"bidirectional_gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"coding_sequence_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low, modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"Based on 
CDS")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_change"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_decrease"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_increase"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"downstream_gene_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"feature_elongation"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"feature_truncation"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"five_prime_duplicated_transcript"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"five_prime_UTR_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"frameshift_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"incomplete_terminal_codon_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"inframe_deletion"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"inframe_insertion"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"intron_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"mature_miRNA_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"missense_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"NMD_transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"non_coding_transcript_exon_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"non_coding_transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"protein_altering_variant"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_ablation"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_amplification"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"regulatory_region_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_change"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_contraction"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"short_tandem_repeat_expansion"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_acceptor_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_donor_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"splice_region_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"moderate, low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"Based on SPLICE_SITE_REGION in 
SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"start_lost"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"start_retained_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_gained"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_lost"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"stop_retained_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"synonymous_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"low"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + 
VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"three_prime_duplicated_transcript"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"three_prime_UTR_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_ablation"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_amplification"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"VEP")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"transcript_variant"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"unidirectional_gene_fusion"),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null}),(0,r.kt)("td",{parentName:"tr",align:null},"high"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"upstream_gene_variant"),(0,r.kt)("td",{parentNam
e:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"modifier"),(0,r.kt)("td",{parentName:"tr",align:null},"SnpEff + VEP")))),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Note: ")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("ol",{parentName:"div"},(0,r.kt)("li",{parentName:"ol"},"For transcripts with multiple consequences, the most severe impact rating is chosen."),(0,r.kt)("li",{parentName:"ol"},"In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides ",(0,r.kt)("inlineCode",{parentName:"li"},"modifier"),".")))),(0,r.kt)("h3",{id:"known-issues"},"Known Issues"),(0,r.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Known 
Issues")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"The consequence ",(0,r.kt)("inlineCode",{parentName:"p"},"splice_polypyrimidine_tract_variant"),", is rated as ",(0,r.kt)("inlineCode",{parentName:"p"},"low")," by VEP.\nHowever, this consequence is not annotated by Illumina Connected Annotations, therefore the impact will also not be provided."))),(0,r.kt)("h2",{id:"example-transcript"},"Example Transcript"),(0,r.kt)("p",null,"The key ",(0,r.kt)("inlineCode",{parentName:"p"},"impact")," for each transcript gives the impact rating for the ",(0,r.kt)("inlineCode",{parentName:"p"},"consequence"),"."),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json",metastring:"{20-24}","{20-24}":!0},'{\n "variants": [\n {\n "vid": "1-1623412-T-C",\n "chromosome": "1",\n "begin": 1623412,\n "end": 1623412,\n "refAllele": "T",\n "altAllele": "C",\n "variantType": "SNV",\n "hgvsg": "NC_000001.11:g.1623412T>C",\n "transcripts": [\n {\n "transcript": "ENST00000479659.5",\n "source": "Ensembl",\n "bioType": "lncRNA",\n "introns": "2/18",\n "geneId": "ENSG00000197530",\n "hgnc": "MIB2",\n "consequence": [\n "intron_variant",\n "non_coding_transcript_variant"\n ],\n "impact": "modifier",\n "hgvsc": "ENST00000479659.5:n.288-19T>C"\n },\n {\n "transcript": "ENST00000489635.5",\n "source": "VEP",\n "bioType": "mRNA",\n "codons": "aTg/aCg",\n "aminoAcids": "M/T",\n "cdnaPos": "269",\n "cdsPos": "134",\n "exons": "3/20",\n "proteinPos": "45",\n "geneId": "ENSG00000197530",\n "hgnc": "MIB2",\n "consequence": [\n "missense_variant"\n ],\n "impact": "moderate",\n "hgvsc": "ENST00000489635.5:c.134T>C",\n "hgvsp": "ENSP00000426007.1:p.(Met45Thr)",\n "proteinId": "ENSP00000426007.1"\n }\n ]\n }\n ]\n}\n')))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/c2a95928.3454d69f.js b/assets/js/c2a95928.3454d69f.js new file mode 100644 index 00000000..202090ce --- /dev/null +++ 
b/assets/js/c2a95928.3454d69f.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9949],{3905:(n,e,t)=>{t.d(e,{Zo:()=>p,kt:()=>m});var o=t(7294);function i(n,e,t){return e in n?Object.defineProperty(n,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):n[e]=t,n}function r(n,e){var t=Object.keys(n);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(n);e&&(o=o.filter((function(e){return Object.getOwnPropertyDescriptor(n,e).enumerable}))),t.push.apply(t,o)}return t}function a(n){for(var e=1;e=0||(i[t]=n[t]);return i}(n,e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(n);for(o=0;o=0||Object.prototype.propertyIsEnumerable.call(n,t)&&(i[t]=n[t])}return i}var c=o.createContext({}),l=function(n){var e=o.useContext(c),t=e;return n&&(t="function"==typeof n?n(e):a(a({},e),n)),t},p=function(n){var e=l(n.components);return o.createElement(c.Provider,{value:e},n.children)},d="mdxType",g={inlineCode:"code",wrapper:function(n){var e=n.children;return o.createElement(o.Fragment,{},e)}},u=o.forwardRef((function(n,e){var t=n.components,i=n.mdxType,r=n.originalType,c=n.parentName,p=s(n,["components","mdxType","originalType","parentName"]),d=l(t),u=i,m=d["".concat(c,".").concat(u)]||d[u]||g[u]||r;return t?o.createElement(m,a(a({ref:e},p),{},{components:t})):o.createElement(m,a({ref:e},p))}));function m(n,e){var t=arguments,i=e&&e.mdxType;if("string"==typeof n||i){var r=t.length,a=new Array(r);a[0]=u;var s={};for(var c in e)hasOwnProperty.call(e,c)&&(s[c]=e[c]);s.originalType=n,s[d]="string"==typeof n?n:i,a[1]=s;for(var l=2;l{t.r(e),t.d(e,{contentTitle:()=>a,default:()=>d,frontMatter:()=>r,metadata:()=>s,toc:()=>c});var o=t(7462),i=(t(7294),t(3905));const r={title:"Parsing Illumina Connected Annotations JSON"},a=void 0,s={unversionedId:"introduction/parsing-json",id:"version-3.24/introduction/parsing-json",title:"Parsing Illumina Connected Annotations 
JSON",description:"Parsing JSON",source:"@site/versioned_docs/version-3.24/introduction/parsing-json.md",sourceDirName:"introduction",slug:"/introduction/parsing-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/parsing-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/introduction/parsing-json.md",tags:[],version:"3.24",frontMatter:{title:"Parsing Illumina Connected Annotations JSON"},sidebar:"docs",previous:{title:"Getting Started",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/getting-started"},next:{title:"1000 Genomes",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes"}},c=[{value:"Parsing JSON",id:"parsing-json",children:[{value:"Organization",id:"organization",children:[],level:3},{value:"JASIX",id:"jasix",children:[],level:3}],level:2}],l={toc:c},p="wrapper";function d(n){let{components:e,...r}=n;return(0,i.kt)(p,(0,o.Z)({},l,r,{components:e,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"parsing-json"},"Parsing JSON"),(0,i.kt)("p",null,"Our JSON files are organized similarly to original VCF variants:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:t(3431).Z})),(0,i.kt)("p",null,"Illumina Connected Annotations JSON files can get very large and sometimes we receive feedback that a bioinformatician tried to read the JSON file into Python or R resulting in a program that ran out of available RAM. 
This happens because those parsers try to load everything into memory all at once."),(0,i.kt)("p",null,"To get around those issues, we play some clever tricks with newlines that enable our users to parse our JSON files quickly and efficiently."),(0,i.kt)("h3",{id:"organization"},"Organization"),(0,i.kt)("p",null,"Our JSON file is arranged as follows:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"the header section is located on the first line"),(0,i.kt)("li",{parentName:"ul"},"each line after that corresponds to a position (same as a row in a VCF file)",(0,i.kt)("ul",{parentName:"li"},(0,i.kt)("li",{parentName:"ul"},"until you reach the genes section ",(0,i.kt)("inlineCode",{parentName:"li"},'],"genes":[')))),(0,i.kt)("li",{parentName:"ul"},"each line after that corresponds to a gene",(0,i.kt)("ul",{parentName:"li"},(0,i.kt)("li",{parentName:"ul"},"until you reach the end ",(0,i.kt)("inlineCode",{parentName:"li"},"]}"))))),(0,i.kt)("p",null,"Knowing this, you can load each position line as an independent JSON object and extract the information you need. 
"),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Jupyter Notebook")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"To demonstrate this, we have put together a ",(0,i.kt)("a",{parentName:"p",href:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/blob/master/static/files/parse-json-python.ipynb"},"Jupyter notebook demonstrating how to do this in Python")," and a ",(0,i.kt)("a",{parentName:"p",href:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/blob/master/static/files/parse-json-r.ipynb"},"R version")," as well."))),(0,i.kt)("h3",{id:"jasix"},"JASIX"),(0,i.kt)("p",null,"One of the tools that we really like in the VCF ecosystem is ",(0,i.kt)("a",{parentName:"p",href:"https://dx.doi.org/10.1093%2Fbioinformatics%2Fbtq671"},"tabix"),". Unfortunately, tabix only works for tab-delimited file formats. 
As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX."),(0,i.kt)("p",null,"Here's an example of how you might use JASIX:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-bash"},"dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455\n")),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-i")," argument specifies the Illumina Connected Annotations JSON path"),(0,i.kt)("li",{parentName:"ul"},"the ",(0,i.kt)("inlineCode",{parentName:"li"},"-q")," argument specifies a genomic range ",(0,i.kt)("em",{parentName:"li"},"(you can use as many of these as you want)"))),(0,i.kt)("p",null,"JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section)."),(0,i.kt)("p",null,"The output from JASIX is a compliant JSON object shown in pretty-printed form:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{"positions":[\n{\n "chromosome": "chr1",\n "position": 942451,\n "refAllele": "T",\n "altAlleles": [\n "C"\n ],\n "quality": 484.23,\n "filters": [\n "PASS"\n ],\n "cytogeneticBand": "1p36.33",\n "samples": [\n {\n "genotype": "1/1",\n "variantFrequencies": [\n 1\n ],\n "totalDepth": 21,\n "genotypeQuality": 60,\n "alleleDepths": [\n 0,\n 21\n ]\n },\n {\n "genotype": "1/1",\n "variantFrequencies": [\n 1\n ],\n "totalDepth": 32,\n "genotypeQuality": 93,\n "alleleDepths": [\n 0,\n 32\n ]\n },\n {\n "genotype": "1/1",\n "variantFrequencies": [\n 1\n ],\n "totalDepth": 36,\n "genotypeQuality": 105,\n "alleleDepths": [\n 0,\n 36\n ]\n }\n ],\n "variants": [\n {\n "vid": "1-942451-T-C",\n "chromosome": "chr1",\n "begin": 942451,\n "end": 942451,\n "refAllele": "T",\n "altAllele": "C",\n "variantType": "SNV",\n "hgvsg": "NC_000001.11:g.942451T>C",\n "phylopScore": -0.1,\n "clinvar": [\n {\n "id": "VCV000836156.1",\n 
"reviewStatus": "criteria provided, single submitter",\n "significance": [\n "uncertain significance"\n ],\n "refAllele": "T",\n "altAllele": "T",\n "lastUpdatedDate": "2020-08-20"\n },\n {\n "id": "RCV001037211.1",\n "variationId": 836156,\n "reviewStatus": "criteria provided, single submitter",\n "alleleOrigins": [\n "germline"\n ],\n "refAllele": "T",\n "altAllele": "T",\n "phenotypes": [\n "not provided"\n ],\n "medGenIds": [\n "CN517202"\n ],\n "significance": [\n "uncertain significance"\n ],\n "lastUpdatedDate": "2020-08-20",\n "pubMedIds": [\n "28492532"\n ]\n }\n ],\n "dbsnp": [\n "rs6672356"\n ],\n "gnomad": {\n "coverage": 25,\n "allAf": 0.999855,\n "allAn": 123742,\n "allAc": 123724,\n "allHc": 61853,\n "afrAf": 0.999416,\n "afrAn": 10278,\n "afrAc": 10272,\n "afrHc": 5133,\n "amrAf": 0.99995,\n "amrAn": 20008,\n "amrAc": 20007,\n "amrHc": 10003,\n "easAf": 1,\n "easAn": 6054,\n "easAc": 6054,\n "easHc": 3027,\n "finAf": 1,\n "finAn": 8696,\n "finAc": 8696,\n "finHc": 4348,\n "nfeAf": 0.999899,\n "nfeAn": 49590,\n "nfeAc": 49585,\n "nfeHc": 24790,\n "asjAf": 1,\n "asjAn": 7208,\n "asjAc": 7208,\n "asjHc": 3604,\n "sasAf": 0.99967,\n "sasAn": 18160,\n "sasAc": 18154,\n "sasHc": 9074,\n "othAf": 1,\n "othAn": 3748,\n "othAc": 3748,\n "othHc": 1874,\n "maleAf": 0.9999,\n "maleAn": 69780,\n "maleAc": 69773,\n "maleHc": 34883,\n "femaleAf": 0.999796,\n "femaleAn": 53962,\n "femaleAc": 53951,\n "femaleHc": 26970,\n "controlsAllAf": 0.999815,\n "controlsAllAn": 48654,\n "controlsAllAc": 48645\n },\n "oneKg": {\n "allAf": 1,\n "afrAf": 1,\n "amrAf": 1,\n "easAf": 1,\n "eurAf": 1,\n "sasAf": 1,\n "allAn": 5008,\n "afrAn": 1322,\n "amrAn": 694,\n "easAn": 1008,\n "eurAn": 1006,\n "sasAn": 978,\n "allAc": 5008,\n "afrAc": 1322,\n "amrAc": 694,\n "easAc": 1008,\n "eurAc": 1006,\n "sasAc": 978\n },\n "primateAI": [\n {\n "hgnc": "SAMD11",\n "scorePercentile": 0.87\n }\n ],\n "revel": {\n "score": 0.145\n },\n "topmed": {\n "allAf": 0.999809,\n "allAn": 125568,\n 
"allAc": 125544,\n "allHc": 62760\n },\n "transcripts": [\n {\n "transcript": "ENST00000420190.6",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "downstream_gene_variant"\n ],\n "proteinId": "ENSP00000411579.2"\n },\n {\n "transcript": "ENST00000342066.7",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "1110",\n "cdsPos": "1027",\n "exons": "10/14",\n "proteinPos": "343",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000342066.7:c.1027T>C",\n "hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000342313.3",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "ENST00000618181.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "732",\n "cdsPos": "652",\n "exons": "7/11",\n "proteinPos": "218",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000618181.4:c.652T>C",\n "hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000480870.1",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "ENST00000622503.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "1110",\n "cdsPos": "1030",\n "exons": "10/14",\n "proteinPos": "344",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000622503.4:c.1030T>C",\n "hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",\n "isCanonical": true,\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000482138.1",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n 
"transcript": "ENST00000618323.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "cTg/cCg",\n "aminoAcids": "L/P",\n "cdnaPos": "712",\n "cdsPos": "632",\n "exons": "8/12",\n "proteinPos": "211",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000618323.4:c.632T>C",\n "hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "unknown",\n "proteinId": "ENSP00000480678.1",\n "siftScore": 0.03,\n "siftPrediction": "deleterious - low confidence"\n },\n {\n "transcript": "ENST00000616016.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "ccT/ccC",\n "aminoAcids": "P",\n "cdnaPos": "944",\n "cdsPos": "864",\n "exons": "9/13",\n "proteinPos": "288",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "synonymous_variant"\n ],\n "hgvsc": "ENST00000616016.4:c.864T>C",\n "hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",\n "proteinId": "ENSP00000478421.1"\n },\n {\n "transcript": "ENST00000618779.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "921",\n "cdsPos": "841",\n "exons": "9/13",\n "proteinPos": "281",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000618779.4:c.841T>C",\n "hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000484256.1",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "ENST00000616125.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "783",\n "cdsPos": "703",\n "exons": "8/12",\n "proteinPos": "235",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000616125.4:c.703T>C",\n "hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",\n "polyPhenScore": 
0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000484643.1",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "ENST00000620200.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "cTg/cCg",\n "aminoAcids": "L/P",\n "cdnaPos": "427",\n "cdsPos": "347",\n "exons": "5/9",\n "proteinPos": "116",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000620200.4:c.347T>C",\n "hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "unknown",\n "proteinId": "ENSP00000484820.1",\n "siftScore": 0.16,\n "siftPrediction": "tolerated - low confidence"\n },\n {\n "transcript": "ENST00000617307.4",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "867",\n "cdsPos": "787",\n "exons": "9/13",\n "proteinPos": "263",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000617307.4:c.787T>C",\n "hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000482090.1",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "NM_152486.2",\n "source": "RefSeq",\n "bioType": "protein_coding",\n "codons": "Cgg/Cgg",\n "aminoAcids": "R",\n "cdnaPos": "1107",\n "cdsPos": "1027",\n "exons": "10/14",\n "proteinPos": "343",\n "geneId": "148398",\n "hgnc": "SAMD11",\n "consequence": [\n "synonymous_variant"\n ],\n "hgvsc": "NM_152486.2:c.1027T>C",\n "hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",\n "isCanonical": true,\n "proteinId": "NP_689699.2"\n },\n {\n "transcript": "ENST00000341065.8",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "750",\n "cdsPos": "751",\n "exons": "8/12",\n "proteinPos": "251",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n 
"missense_variant"\n ],\n "hgvsc": "ENST00000341065.8:c.750T>C",\n "hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000349216.4",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "ENST00000455979.1",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "codons": "Tgg/Cgg",\n "aminoAcids": "W/R",\n "cdnaPos": "507",\n "cdsPos": "508",\n "exons": "4/7",\n "proteinPos": "170",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "missense_variant"\n ],\n "hgvsc": "ENST00000455979.1:c.507T>C",\n "hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",\n "polyPhenScore": 0,\n "polyPhenPrediction": "benign",\n "proteinId": "ENSP00000412228.1",\n "siftScore": 1,\n "siftPrediction": "tolerated"\n },\n {\n "transcript": "ENST00000478729.1",\n "source": "Ensembl",\n "bioType": "processed_transcript",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "downstream_gene_variant"\n ]\n },\n {\n "transcript": "ENST00000474461.1",\n "source": "Ensembl",\n "bioType": "retained_intron",\n "cdnaPos": "389",\n "exons": "3/4",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "non_coding_transcript_exon_variant"\n ],\n "hgvsc": "ENST00000474461.1:n.389T>C"\n },\n {\n "transcript": "ENST00000466827.1",\n "source": "Ensembl",\n "bioType": "retained_intron",\n "cdnaPos": "191",\n "exons": "2/2",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "non_coding_transcript_exon_variant"\n ],\n "hgvsc": "ENST00000466827.1:n.191T>C"\n },\n {\n "transcript": "ENST00000464948.1",\n "source": "Ensembl",\n "bioType": "retained_intron",\n "cdnaPos": "286",\n "exons": "1/2",\n "geneId": "ENSG00000187634",\n "hgnc": "SAMD11",\n "consequence": [\n "non_coding_transcript_exon_variant"\n ],\n "hgvsc": "ENST00000464948.1:n.286T>C"\n },\n {\n "transcript": "NM_015658.3",\n "source": "RefSeq",\n "bioType": "protein_coding",\n "geneId": 
"26155",\n "hgnc": "NOC2L",\n "consequence": [\n "downstream_gene_variant"\n ],\n "isCanonical": true,\n "proteinId": "NP_056473.2"\n },\n {\n "transcript": "ENST00000483767.5",\n "source": "Ensembl",\n "bioType": "retained_intron",\n "geneId": "ENSG00000188976",\n "hgnc": "NOC2L",\n "consequence": [\n "downstream_gene_variant"\n ]\n },\n {\n "transcript": "ENST00000327044.6",\n "source": "Ensembl",\n "bioType": "protein_coding",\n "geneId": "ENSG00000188976",\n "hgnc": "NOC2L",\n "consequence": [\n "downstream_gene_variant"\n ],\n "isCanonical": true,\n "proteinId": "ENSP00000317992.6"\n },\n {\n "transcript": "ENST00000477976.5",\n "source": "Ensembl",\n "bioType": "retained_intron",\n "geneId": "ENSG00000188976",\n "hgnc": "NOC2L",\n "consequence": [\n "downstream_gene_variant"\n ]\n },\n {\n "transcript": "ENST00000496938.1",\n "source": "Ensembl",\n "bioType": "processed_transcript",\n "geneId": "ENSG00000188976",\n "hgnc": "NOC2L",\n "consequence": [\n "downstream_gene_variant"\n ]\n }\n ]\n }\n ]\n}\n]}\n')))}d.isMDXComponent=!0},3431:(n,e,t)=>{t.d(e,{Z:()=>o});const o=t.p+"assets/images/JSON-Layout-fc8e5c0cf4c8428981cd206fe9b6feac.svg"}}]); \ No newline at end of file diff --git a/assets/js/c4215fd2.0524e83e.js b/assets/js/c4215fd2.0524e83e.js new file mode 100644 index 00000000..99d77585 --- /dev/null +++ b/assets/js/c4215fd2.0524e83e.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3593],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>u});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function l(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var 
o=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var c=r.createContext({}),s=function(e){var t=r.useContext(c),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},p=function(e){var t=s(e.components);return r.createElement(c.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},g=r.forwardRef((function(e,t){var n=e.components,a=e.mdxType,o=e.originalType,c=e.parentName,p=i(e,["components","mdxType","originalType","parentName"]),d=s(n),g=a,u=d["".concat(c,".").concat(g)]||d[g]||m[g]||o;return n?r.createElement(u,l(l({ref:t},p),{},{components:n})):r.createElement(u,l({ref:t},p))}));function u(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var o=n.length,l=new Array(o);l[0]=g;var i={};for(var c in t)hasOwnProperty.call(t,c)&&(i[c]=t[c]);i.originalType=e,i[d]="string"==typeof e?e:a,l[1]=i;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>d,frontMatter:()=>o,metadata:()=>i,toc:()=>c});var r=n(7462),a=(n(7294),n(3905));const o={},l=void 0,i={unversionedId:"data-sources/fusioncatcher-json",id:"version-3.24/data-sources/fusioncatcher-json",title:"fusioncatcher-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/fusioncatcher-json.md",sourceDirName:"data-sources",slug:"/data-sources/fusioncatcher-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/fusioncatcher-json.md",tags:[],version:"3.24",frontMatter:{}},c=[{value:"genes",id:"genes",children:[],level:4},{value:"gene",id:"gene",children:[],level:4}],s={toc:c},p="wrapper";function 
d(e){let{components:t,...n}=e;return(0,a.kt)(p,(0,r.Z)({},s,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},' "fusionCatcher":[\n {\n "genes":{\n "first":{\n "hgnc":"ETV6",\n "isOncogene":true\n },\n "second":{\n "hgnc":"RUNX1"\n },\n "isParalogPair":true,\n "isPseudogenePair":true,\n "isReadthrough":true\n },\n "germlineSources":[\n "1000 Genomes Project"\n ],\n "somaticSources":[\n "COSMIC",\n "TCGA oesophageal carcinomas"\n ]\n }\n ]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"genes"),(0,a.kt)("td",{parentName:"tr",align:"center"},"genes object"),(0,a.kt)("td",{parentName:"tr",align:"left"},"5' gene & 3' gene")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"germlineSources"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"matches in known germline data sources")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"somaticSources"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,a.kt)("td",{parentName:"tr",align:"left"},"matches in known somatic data 
sources")))),(0,a.kt)("h4",{id:"genes"},"genes"),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"first"),(0,a.kt)("td",{parentName:"tr",align:"center"},"gene object"),(0,a.kt)("td",{parentName:"tr",align:"left"},"5' gene")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"second"),(0,a.kt)("td",{parentName:"tr",align:"center"},"gene object"),(0,a.kt)("td",{parentName:"tr",align:"left"},"3' gene")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"isParalogPair"),(0,a.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,a.kt)("td",{parentName:"tr",align:"left"},"true when both genes are paralogs for each other")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"isPseudogenePair"),(0,a.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,a.kt)("td",{parentName:"tr",align:"left"},"true when both genes are pseudogenes for each other")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"isReadthrough"),(0,a.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,a.kt)("td",{parentName:"tr",align:"left"},"true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between 
them)")))),(0,a.kt)("h4",{id:"gene"},"gene"),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"gene symbol. e.g. MSH6")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"isOncogene"),(0,a.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,a.kt)("td",{parentName:"tr",align:"left"},"true when this gene is an oncogene")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/cc461efb.5c4d5431.js b/assets/js/cc461efb.5c4d5431.js new file mode 100644 index 00000000..b8eb9f93 --- /dev/null +++ b/assets/js/cc461efb.5c4d5431.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[6536],{3905:(a,e,t)=>{t.d(e,{Zo:()=>d,kt:()=>A});var n=t(7294);function i(a,e,t){return e in a?Object.defineProperty(a,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):a[e]=t,a}function c(a,e){var t=Object.keys(a);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(a);e&&(n=n.filter((function(e){return Object.getOwnPropertyDescriptor(a,e).enumerable}))),t.push.apply(t,n)}return t}function r(a){for(var e=1;e=0||(i[t]=a[t]);return i}(a,e);if(Object.getOwnPropertySymbols){var c=Object.getOwnPropertySymbols(a);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(a,t)&&(i[t]=a[t])}return i}var s=n.createContext({}),l=function(a){var e=n.useContext(s),t=e;return a&&(t="function"==typeof a?a(e):r(r({},e),a)),t},d=function(a){var e=l(a.components);return 
n.createElement(s.Provider,{value:e},a.children)},m="mdxType",u={inlineCode:"code",wrapper:function(a){var e=a.children;return n.createElement(n.Fragment,{},e)}},p=n.forwardRef((function(a,e){var t=a.components,i=a.mdxType,c=a.originalType,s=a.parentName,d=o(a,["components","mdxType","originalType","parentName"]),m=l(t),p=i,A=m["".concat(s,".").concat(p)]||m[p]||u[p]||c;return t?n.createElement(A,r(r({ref:e},d),{},{components:t})):n.createElement(A,r({ref:e},d))}));function A(a,e){var t=arguments,i=e&&e.mdxType;if("string"==typeof a||i){var c=t.length,r=new Array(c);r[0]=p;var o={};for(var s in e)hasOwnProperty.call(e,s)&&(o[s]=e[s]);o.originalType=a,o[m]="string"==typeof a?a:i,r[1]=o;for(var l=2;l{t.r(e),t.d(e,{contentTitle:()=>r,default:()=>m,frontMatter:()=>c,metadata:()=>o,toc:()=>s});var n=t(7462),i=(t(7294),t(3905));const c={title:"Cancer Hotspots"},r=void 0,o={unversionedId:"data-sources/cancer-hotspots",id:"version-3.24/data-sources/cancer-hotspots",title:"Cancer Hotspots",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/cancer-hotspots.mdx",sourceDirName:"data-sources",slug:"/data-sources/cancer-hotspots",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cancer-hotspots",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/cancer-hotspots.mdx",tags:[],version:"3.24",frontMatter:{title:"Cancer Hotspots"},sidebar:"docs",previous:{title:"Amino Acid Conservation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation"},next:{title:"ClinGen",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Data 
extraction",id:"data-extraction",children:[{value:"Example",id:"example",children:[{value:"SNV",id:"snv",children:[],level:4},{value:"Indel",id:"indel",children:[],level:4}],level:3},{value:"Parsing",id:"parsing",children:[],level:3}],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],l={toc:s},d="wrapper";function m(a){let{components:e,...t}=a;return(0,i.kt)(d,(0,n.Z)({},l,t,{components:e,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significantly recurrent mutations identified in large-scale cancer genomics data."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. 
PMID: 29247016; PMCID: PMC5809279."),(0,i.kt)("p",{parentName:"div"},"Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099."))),(0,i.kt)("h2",{id:"data-extraction"},"Data extraction"),(0,i.kt)("p",null,"Illumina Connected Annotations currently parses the SNV and indel tabs from the hotspots_v2.xls file to extract the relevant content."),(0,i.kt)("h3",{id:"example"},"Example"),(0,i.kt)("h4",{id:"snv"},"SNV"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},'Hugo_Symbol Amino_Acid_Position log10_pvalue Mutation_Count Reference_Amino_Acid Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank Variant_Amino_Acid Codon_Change Genomic_Position Detailed_Cancer_Types Organ_Types Tri-nucleotides Mutability mu_protein Total_Samples Analysis_Type qvalue tm qvalue_pancanIs_repeat seq length align100 pad12entropy pad24entropy pad36entropy TP reason n_MSK n_Retro judgement inNBT inOncokb ref qvaluect ct Samples\nNRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 
295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135549872122762:0.172113289760349:0.0240963855421687:0.062076
7494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.314285714285714 R:204 
cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1\nNRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 
295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135549872122762:0.172113289760349:0.0240963855421687:0.062076
7494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.314285714285714 K:142 
cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1\nNRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 
295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135549872122762:0.172113289760349:0.0240963855421687:0.062076
7494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.314285714285714 L:46 
cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1\nNRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 
295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135549872122762:0.172113289760349:0.0240963855421687:0.062076
7494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.314285714285714 H:27 
cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1\n')),(0,i.kt)("h4",{id:"indel"},"Indel"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"Hugo_Symbol Amino_Acid_Position log10_pvalue Mutation_Count Reference_Amino_Acid Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank SNP_ID Variant_Amino_Acid Codon_Change Genomic_Position Detailed_Cancer_Types Organ_Types Tri-nucleotides Mutability mu_protein ccf Total_Samples indel_size qvalue tm Is_repeat seq length align100 pad12entropy pad24entropy pad36entropy TP reason n_MSK n_Retro judgement inNBT inOncokb Samples\nSMARCA4 546 -7.75235638169585 5 QK:5 101 
NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1\nCDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1\nCDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1\nCDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA 
A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1\n")),(0,i.kt)("h3",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"From the file, we're mainly interested in the following columns:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Hugo_Symbol")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Amino_Acid_Position")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Mutation_Count")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Reference_Amino_Acid")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"Variant_Amino_Acid")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"qvalue"))),(0,i.kt)("p",null,"We map the gene symbol onto the canonical transcripts (RefSeq & Ensembl) for that gene. For SNVs, we obtain position, ref and alt amino acid from source file and generate substitution notation. 
For indels, we get protein change notation from ",(0,i.kt)("inlineCode",{parentName:"p"},"Reference_Amino_Acid")," column.\nThen we match each entry using these notations."),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"caution")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"We currently skip all variants labeled as splice from the source"))),(0,i.kt)("h2",{id:"json-output"},"JSON Output"),(0,i.kt)("p",null,"The data source will be captured under the cancerHotspots key in the transcript section."),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{13-18}","{13-18}":!0},'{\n "transcript":"NM_002524.5",\n "source":"RefSeq",\n "bioType":"mRNA",\n "aminoAcids":"Q/K",\n "proteinPos":"61",\n "geneId":"4893",\n "hgnc":"NRAS",\n "hgvsc":"NM_002524.5:c.181C>A",\n "hgvsp":"NP_002515.1:p.(Gln61Lys)",\n "isCanonical":true,\n "proteinId":"NP_002515.1",\n "cancerHotspots":{\n "residue":"Q61",\n "numSamples":422,\n "numAltAminoAcidSamples":142,\n "qValue":0\n 
}\n}\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"residue"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numSamples"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"how many samples are associated with a variant at the same amino acid position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numAltAminoAcidSamples"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"how many samples are associated with a variant with the same position and alternate amino acid position")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"qValue"),(0,i.kt)("td",{parentName:"tr",align:"center"},"double"),(0,i.kt)("td",{parentName:"tr",align:"left"})))))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/d065eee8.f11bbe97.js b/assets/js/d065eee8.f11bbe97.js new file mode 100644 index 00000000..b45029dd --- /dev/null +++ b/assets/js/d065eee8.f11bbe97.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7753],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>v});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return 
Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},m="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,l=e.originalType,s=e.parentName,p=o(e,["components","mdxType","originalType","parentName"]),m=c(n),u=r,v=m["".concat(s,".").concat(u)]||m[u]||d[u]||l;return n?a.createElement(v,i(i({ref:t},p),{},{components:n})):a.createElement(v,i({ref:t},p))}));function v(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var l=n.length,i=new Array(l);i[0]=u;var o={};for(var s in t)hasOwnProperty.call(t,s)&&(o[s]=t[s]);o.originalType=e,o[m]="string"==typeof e?e:r,i[1]=o;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>m,frontMatter:()=>l,metadata:()=>o,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const l={title:"Variant IDs"},i=void 0,o={unversionedId:"core-functionality/variant-ids",id:"version-3.24/core-functionality/variant-ids",title:"Variant IDs",description:"Overview",source:"@site/versioned_docs/version-3.24/core-functionality/variant-ids.md",sourceDirName:"core-functionality",slug:"/core-functionality/variant-ids",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/variant-ids",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/core-functionality/variant-ids.md",tags:[],version:"3.24",frontMatter:{title:"Variant IDs"},sidebar:"docs",previous:{title:"Transcript Consequence 
Impact",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/transcript-consequence-impacts"},next:{title:"Jasix",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/jasix"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Small Variants",id:"small-variants",children:[{value:"VCF Examples",id:"vcf-examples",children:[],level:3},{value:"Format",id:"format",children:[],level:3},{value:"VID Examples",id:"vid-examples",children:[],level:3}],level:2},{value:"Translocation Breakends",id:"translocation-breakends",children:[{value:"VCF Example",id:"vcf-example",children:[],level:3},{value:"Format",id:"format-1",children:[],level:3},{value:"VID Example",id:"vid-example",children:[],level:3}],level:2},{value:"All Other Structural Variants",id:"all-other-structural-variants",children:[{value:"VCF Examples",id:"vcf-examples-1",children:[],level:3},{value:"Format",id:"format-2",children:[],level:3},{value:"VID Examples",id:"vid-examples-1",children:[],level:3}],level:2}],c={toc:s},p="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(p,(0,a.Z)({},c,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute."),(0,r.kt)("p",null,"The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. 
Our VID scheme attempts to fill that gap."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Conventions")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("ul",{parentName:"div"},(0,r.kt)("li",{parentName:"ul"},"all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)"),(0,r.kt)("li",{parentName:"ul"},"for a reference variant (i.e. no alt allele), replace the period (.) with the reference base"),(0,r.kt)("li",{parentName:"ul"},"padding bases are used, neither the reference nor alternate allele can be empty"),(0,r.kt)("li",{parentName:"ul"},"some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base")))),(0,r.kt)("h2",{id:"small-variants"},"Small Variants"),(0,r.kt)("h3",{id:"vcf-examples"},"VCF Examples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 66507 . T A 184.45 PASS .\nchr1 66521 . T TATATA 144.53 PASS .\nchr1 66572 . 
GTA G,GTACTATATATTATA 45.45 PASS .\n")),(0,r.kt)("h3",{id:"format"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele")),(0,r.kt)("h3",{id:"vid-examples"},"VID Examples"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-66507-T-A"),(0,r.kt)("li",{parentName:"ul"},"1-66521-T-TATATA"),(0,r.kt)("li",{parentName:"ul"},"1-66572-GTA-G"),(0,r.kt)("li",{parentName:"ul"},"1-66572-G-GTACTATATATTA")),(0,r.kt)("h2",{id:"translocation-breakends"},"Translocation Breakends"),(0,r.kt)("h3",{id:"vcf-example"},"VCF Example"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 2617277 . A AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[ . PASS SVTYPE=BND\n")),(0,r.kt)("h3",{id:"format-1"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele")),(0,r.kt)("h3",{id:"vid-example"},"VID Example"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[")),(0,r.kt)("h2",{id:"all-other-structural-variants"},"All Other Structural Variants"),(0,r.kt)("h3",{id:"vcf-examples-1"},"VCF Examples"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"chr1 1000 . G . PASS END=3001000;SVTYPE=ROH\nchr1 1350082 . G . PASS END=1351320;SVTYPE=DEL\nchr1 1477854 . C . PASS END=1477984;SVTYPE=DUP\nchr1 1477968 . T . PASS END=1477968;SVTYPE=INS\nchr1 1715898 . N . PASS SVTYPE=CNV;END=1750149\nchr1 2650426 . N . PASS SVTYPE=CNV;END=2653074\nchr2 321682 . T . PASS SVTYPE=INV;END=421681\nchr20 2633403 . G . 
PASS END=2633421\n")),(0,r.kt)("h3",{id:"format-2"},"Format"),(0,r.kt)("p",null,(0,r.kt)("inlineCode",{parentName:"p"},"chromosome"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"end position"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"reference allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"alternate allele"),"\u2014",(0,r.kt)("inlineCode",{parentName:"p"},"SVTYPE")),(0,r.kt)("h3",{id:"vid-examples-1"},"VID Examples"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"1-1000-3001000-G-","<","ROH",">","-ROH"),(0,r.kt)("li",{parentName:"ul"},"1-1350082-1351320-G-","<","DEL",">","-DEL"),(0,r.kt)("li",{parentName:"ul"},"1-1477854-1477984-C-","<","DUP:TANDEM",">","-DUP"),(0,r.kt)("li",{parentName:"ul"},"1-1477968-1477968-T-","<","INS",">","-INS"),(0,r.kt)("li",{parentName:"ul"},"1-1715898-1750149-A-","<","DUP",">","-CNV ",(0,r.kt)("strong",{parentName:"li"},"(replace the N with A)")),(0,r.kt)("li",{parentName:"ul"},"1-2650426-2653074-N-","<","DEL",">","-CNV ",(0,r.kt)("strong",{parentName:"li"},"(keep the N)")),(0,r.kt)("li",{parentName:"ul"},"2-321682-421681-T-","<","INV",">","-INV"),(0,r.kt)("li",{parentName:"ul"},"20-2633403-2633421-G-","<","STR2",">","-STR")))}m.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/d284d299.f12d4533.js b/assets/js/d284d299.f12d4533.js new file mode 100644 index 00000000..85926d1b --- /dev/null +++ b/assets/js/d284d299.f12d4533.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[388],{3905:(e,n,t)=>{t.d(n,{Zo:()=>c,kt:()=>h});var a=t(7294);function o(e,n,t){return n in e?Object.defineProperty(e,n,{value:t,enumerable:!0,configurable:!0,writable:!0}):e[n]=t,e}function l(e,n){var t=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);n&&(a=a.filter((function(n){return 
Object.getOwnPropertyDescriptor(e,n).enumerable}))),t.push.apply(t,a)}return t}function r(e){for(var n=1;n=0||(o[t]=e[t]);return o}(e,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(o[t]=e[t])}return o}var s=a.createContext({}),d=function(e){var n=a.useContext(s),t=n;return e&&(t="function"==typeof e?e(n):r(r({},n),e)),t},c=function(e){var n=d(e.components);return a.createElement(s.Provider,{value:n},e.children)},u="mdxType",p={inlineCode:"code",wrapper:function(e){var n=e.children;return a.createElement(a.Fragment,{},n)}},m=a.forwardRef((function(e,n){var t=e.components,o=e.mdxType,l=e.originalType,s=e.parentName,c=i(e,["components","mdxType","originalType","parentName"]),u=d(t),m=o,h=u["".concat(s,".").concat(m)]||u[m]||p[m]||l;return t?a.createElement(h,r(r({ref:n},c),{},{components:t})):a.createElement(h,r({ref:n},c))}));function h(e,n){var t=arguments,o=n&&n.mdxType;if("string"==typeof e||o){var l=t.length,r=new Array(l);r[0]=m;var i={};for(var s in n)hasOwnProperty.call(n,s)&&(i[s]=n[s]);i.originalType=e,i[u]="string"==typeof e?e:o,r[1]=i;for(var d=2;d{t.r(n),t.d(n,{contentTitle:()=>r,default:()=>u,frontMatter:()=>l,metadata:()=>i,toc:()=>s});var a=t(7462),o=(t(7294),t(3905));const l={title:"SAUtils"},r=void 0,i={unversionedId:"utilities/sautils",id:"version-3.24/utilities/sautils",title:"SAUtils",description:"Overview",source:"@site/versioned_docs/version-3.24/utilities/sautils.mdx",sourceDirName:"utilities",slug:"/utilities/sautils",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/sautils",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/utilities/sautils.mdx",tags:[],version:"3.24",frontMatter:{title:"SAUtils"},sidebar:"docs",previous:{title:"Jasix",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/jasix"},next:{title:"Annotation Engine vs Data 
update",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/frequently-asked-questions/Annotator-vs-data-update"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"The SAUtils Menu",id:"the-sautils-menu",children:[],level:2},{value:"Output File Formats",id:"output-file-formats",children:[],level:2},{value:"SAUtils AutoDownloadGenerate",id:"sautils-autodownloadgenerate",children:[{value:"AutoDownloadGenerate ClinVar",id:"autodownloadgenerate-clinvar",children:[],level:3},{value:"AutoDownloadGenerate ClinGen",id:"autodownloadgenerate-clingen",children:[],level:3},{value:"AutoDownloadGenerate dbSNP",id:"autodownloadgenerate-dbsnp",children:[],level:3},{value:"AutoDownloadGenerate OMIM",id:"autodownloadgenerate-omim",children:[],level:3},{value:"AutoDownloadGenerate COSMIC",id:"autodownloadgenerate-cosmic",children:[],level:3}],level:2}],d={toc:s},c="wrapper";function u(e){let{components:n,...t}=e;return(0,o.kt)(c,(0,a.Z)({},d,t,{components:n,mdxType:"MDXLayout"}),(0,o.kt)("h2",{id:"overview"},"Overview"),(0,o.kt)("p",null,"SAUtils is a utility tool that creates binary supplementary annotation files (",(0,o.kt)("em",{parentName:"p"},".nsa, "),".gsa, ",(0,o.kt)("em",{parentName:"p"},".npd, "),".nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations Annotation engine to provide supplementary annotations in the output."),(0,o.kt)("h2",{id:"the-sautils-menu"},"The SAUtils Menu"),(0,o.kt)("p",null,"SAUtils supports building binary files for many data sources. 
The help menu lists them out in the form of sub-commands."),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\n 3.22.0\n---------------------------------------------------------------------------\n\nUtilities focused on supplementary annotation\n\nUSAGE: dotnet SAUtils.dll [options]\n\nCOMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen\n AaCon create AA conservation database\n ancestralAllele create Ancestral allele database from 1000Genomes data\n ClinGen create ClinGen database\n Downloader download ClinGen database\n clinvar create ClinVar database\n concat merge multiple NSA files for the same data source having non-overlapping regions\n Cosmic create COSMIC database\n CosmicSv create COSMIC SV database\n CosmicFusion create COSMIC gene fusion database\n CosmicCGC create COSMIC cancer gene census database\n CustomGene create custom gene annotation database\n CustomVar create custom variant annotation database\n Dann create DANN database\n Dbsnp create dbSNP database\n Dgv create DGV database\n DiseaseValidity create disease validity database\n DosageMapRegions create dosage map regions\n DosageSensitivity create dosage sensitivity database\n DownloadOmim download OMIM database\n ExtractMiniSA extracts mini SA\n ExtractMiniXml extracts mini XML (ClinVar)\n FilterSpliceNetTsv filter SpliceNet predictions\n FusionCatcher create FusionCatcher database\n Gerp create GERP conservation database\n GlobalMinor create global minor allele database\n Gnomad create gnomAD database\n Gnomad-lcr create gnomAD low complexity region database\n GnomadGeneScores create gnomAD gene scores database\n GnomadSV create gnomAD structural variant database\n Index edit an index file\n MitoHet create mitochondrial Heteroplasmy database\n MitomapSvDb create MITOMAP structural variants database\n 
MitomapVarDb create MITOMAP small variants database\n Omim create OMIM database\n OneKGen create 1000 Genome small variants database\n OneKGenSv create 1000 Genomes structural variants database\n OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file\n PhyloP create PhyloP database\n PrimateAi create PrimateAI database\n RefMinor create Reference Minor database from 1000 Genome \n RemapWithDbsnp remap a VCF file given source and destination rsID mappings\n Revel create REVEL database\n SpliceAi create SpliceAI database\n TopMed create TOPMed database\n Gme create GME Variome database\n Decipher create Decipher database\n")),(0,o.kt)("p",null,"You can get further detailed help for each sub-command by typing in the subcommand. For example:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-scss"},"dotnet SAUtils.dll clinvar\n---------------------------------------------------------------------------\nSAUtils (c) 2023 Illumina, Inc.\n 3.22.0\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll clinvar [options]\nCreates a supplementary database with ClinVar annotations\n\nOPTIONS:\n --ref, -r compressed reference sequence file\n --rcv, -i ClinVar Full release XML file\n --vcv, -c ClinVar Variation release XML file\n --out, -o output directory\n --help, -h displays the help menu\n --version, -v displays the version\n")),(0,o.kt)("p",null,"More detailed instructions about each sub-command can be found in documentation of respective data sources."),(0,o.kt)("h2",{id:"output-file-formats"},"Output File Formats"),(0,o.kt)("p",null,"The format of the binary file SAUtils produce depend on the type of annotation data represented in that file (e.g. small variant vs. structural variants vs. 
genes)."),(0,o.kt)("table",null,(0,o.kt)("thead",{parentName:"table"},(0,o.kt)("tr",{parentName:"thead"},(0,o.kt)("th",{parentName:"tr",align:null},"File Extension"),(0,o.kt)("th",{parentName:"tr",align:null},"Description"))),(0,o.kt)("tbody",{parentName:"table"},(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".nsa"),(0,o.kt)("td",{parentName:"tr",align:null},"Small variant annotations (e.g. SNV, insertions, deletions, etc.)")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".gsa"),(0,o.kt)("td",{parentName:"tr",align:null},"Compact variant annotations (e.g. SNV, insertions, deletions, etc.)")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".idx"),(0,o.kt)("td",{parentName:"tr",align:null},"Index file")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".nsi"),(0,o.kt)("td",{parentName:"tr",align:null},"Interval annotations (e.g. SV, CNVs, intervals)")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".nga"),(0,o.kt)("td",{parentName:"tr",align:null},"Gene annotations")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".npd"),(0,o.kt)("td",{parentName:"tr",align:null},"Conservation scores")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".rma"),(0,o.kt)("td",{parentName:"tr",align:null},"Reference Minor allele")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".gfs"),(0,o.kt)("td",{parentName:"tr",align:null},"Gene fusions source")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".gfj"),(0,o.kt)("td",{parentName:"tr",align:null},"Gene fusions JSON")),(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:null},".schema"),(0,o.kt)("td",{parentName:"tr",align:null},"JSON schema")))),(0,o.kt)("h2",{id:"sautils-autodownloadgenerate"},"SAUtils 
AutoDownloadGenerate"),(0,o.kt)("p",null,"To make generating supplementary annotation files easier, we have provided an easier command that can be use instead of more granular subcommands.\nThis subcommands basically integrate both download and generate subcommand. Currently, this subcommand support several data sources:"),(0,o.kt)("ul",null,(0,o.kt)("li",{parentName:"ul"},"ClinVar"),(0,o.kt)("li",{parentName:"ul"},"ClinGen"),(0,o.kt)("li",{parentName:"ul"},"dbSNP"),(0,o.kt)("li",{parentName:"ul"},"OMIM"),(0,o.kt)("li",{parentName:"ul"},"COSMIC")),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},"dotnet SAUtils.dll AutoDownloadGenerate\n---------------------------------------------------------------------------\nSAUtils (c) 2024 Illumina, Inc.\n 3.23.0\n---------------------------------------------------------------------------\n\nUSAGE: dotnet SAUtils.dll autodownloadgenerate [options]\nDownloads and generates the Supplementary Database for Omim, ClinGen, ClinVar, dbSNP, and COSMIC\n\nOPTIONS:\n --sources, -s comma separated list of external data sources\n --inputJson, -j input JSON path\n --downloadBaseFolder, -b \n base directory path external datasources\n downloaded to\n --downloadDate, -d \n date directory name that external datasources\n downloaded to. Default is today's date in yyyy-\n MM-dd format (e.g. 
2023-01-30).\n --cache, -c \n input cache directory\n --ref, -r input reference filename\n --out, -o output SA directory\n --actions, -a comma separated list of action(s) to perform.\n action options: download, generate.\n --help, -h displays the help menu\n --version, -v displays the version\n")),(0,o.kt)("p",null,"You can download only, generate only, or both download and generate supplementary files.\nTo use this subcommands, you have to prepare a json file that will be used as data sources information.\nBelow is tutorial to use this subcommand to generate each data source."),(0,o.kt)("h3",{id:"autodownloadgenerate-clinvar"},"AutoDownloadGenerate ClinVar"),(0,o.kt)("p",null,"Below is the command to use AutoDownloadGenerate for ClinVar to download and generate supplementary files."),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},"dotnet SAUtils.dll AutoDownloadGenerate -s ClinVar -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]\n")),(0,o.kt)("p",null,"The json file for ClinVar should be formatted like below:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},'{\n"clinvar": {\n "baseDirectory": "ClinVar",\n "sourceFiles": [\n {\n "name": "ClinVar",\n "description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",\n "files": [\n {\n "localFileName": "ClinVarFullRelease_00-latest.xml.gz",\n "downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz",\n "md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz.md5"\n },\n {\n "localFileName": "ClinVarVariationRelease_00-latest.xml.gz",\n "downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz",\n "md5Url": 
"https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz.md5"\n }\n ]\n }\n ]\n }\n}\n')),(0,o.kt)("p",null,"There is no need to modify the json entry for ClinVar and you can use as it is."),(0,o.kt)("h3",{id:"autodownloadgenerate-clingen"},"AutoDownloadGenerate ClinGen"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},"dotnet SAUtils.dll AutoDownloadGenerate -s ClinGen -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]\n")),(0,o.kt)("p",null,"The json file for ClinGen should be formatted like below:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},'{\n"clingen": {\n "baseDirectory": "ClinGen",\n "sourceFiles": [\n {\n "name": "ClinGen Dosage Sensitivity Map",\n "subDirectory": "DosageSensitivity",\n "description": "Dosage sensitivity map from ClinGen (dbVar)",\n "files": [\n {\n "localFileName": "ClinGen_gene_curation_list_GRCh37.tsv",\n "downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh37.tsv"\n },\n {\n "localFileName": "ClinGen_gene_curation_list_GRCh38.tsv",\n "downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh38.tsv"\n },\n {\n "localFileName": "ClinGen_region_curation_list_GRCh37.tsv",\n "downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh37.tsv"\n },\n {\n "localFileName": "ClinGen_region_curation_list_GRCh38.tsv",\n "downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh38.tsv"\n }\n ]\n },\n {\n "name": "ClinGen disease validity curations",\n "subDirectory": "GeneDiseaseValidity",\n "description": "Disease validity curations from ClinGen (dbVar)",\n "files": [\n {\n "localFileName": "Clingen-Gene-Disease-Summary.csv",\n "downloadUrl": "https://search.clinicalgenome.org/kb/gene-validity/download"\n }\n ]\n }\n ]\n }\n}\n')),(0,o.kt)("p",null,"There is no need to modify the 
json entry for ClinGen and you can use as it is."),(0,o.kt)("h3",{id:"autodownloadgenerate-dbsnp"},"AutoDownloadGenerate dbSNP"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},"dotnet SAUtils.dll AutoDownloadGenerate -s dbSNP -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]\n")),(0,o.kt)("p",null,"The json file for dbSNP should be formatted like below:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},'{\n "dbsnp": {\n "baseDirectory": "dbSNP",\n "sourceFiles": [\n {\n "name": "dbSNP",\n "description": "Identifiers for observed variants",\n "version": "156",\n "subDirectory": "GRCh37",\n "files": [\n {\n "localFileName": "GCF_000001405.25.gz.tbi",\n "downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi",\n "md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi.md5"\n },\n {\n "localFileName": "GCF_000001405.25.gz",\n "downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz",\n "md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.md5"\n }\n ]\n },\n {\n "name": "dbSNP",\n "description": "Identifiers for observed variants",\n "version": "156",\n "subDirectory": "GRCh38",\n "files": [\n {\n "localFileName": "GCF_000001405.40.gz.tbi",\n "downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi",\n "md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi.md5"\n },\n {\n "localFileName": "GCF_000001405.40.gz",\n "downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz",\n "md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.md5"\n }\n ]\n }\n ]\n }\n}\n')),(0,o.kt)("p",null,"The json above is examplke for dbSNP version 156. 
If you want to use it for different version, adjust the version number and all entries in files to use the actual URL.\nIf you only want to generate GRCh38, just remove the GRCh37 entries in the sourceFiles."),(0,o.kt)("h3",{id:"autodownloadgenerate-omim"},"AutoDownloadGenerate OMIM"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},"dotnet SAUtils.dll AutoDownloadGenerate -s OMIM -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]\n")),(0,o.kt)("p",null,"The json file for OMIM should be formatted like below:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},'{\n"omim": {\n "baseDirectory": "omim",\n "sourceFiles": [\n {\n "name": "OMIM",\n "description": "An Online Catalog of Human Genes and Genetic Disorders"\n }\n ]\n }\n}\n')),(0,o.kt)("p",null,"There is no need to modify the json entry for OMIM and you can use as it is. You have to export OMIM API key as environment variable with name ",(0,o.kt)("inlineCode",{parentName:"p"},"OmimApiKey"),"."),(0,o.kt)("h3",{id:"autodownloadgenerate-cosmic"},"AutoDownloadGenerate COSMIC"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},"dotnet SAUtils.dll AutoDownloadGenerate -s COSMIC -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]\n")),(0,o.kt)("p",null,"The json file for COSMIC should be formatted like below:"),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre"},'{\n "Cosmic": {\n "baseDirectory": "COSMIC",\n "sourceFiles": [\n {\n "name": "COSMIC",\n "version": "99",\n "description": "the Catalogue Of Somatic Mutations In Cancer"\n }\n ]\n }\n}\n')),(0,o.kt)("p",null,"You have to adjust the version entry according to which COSMIC version you want. 
You also need to have COSMIC credentials and export it as environment variable with name ",(0,o.kt)("inlineCode",{parentName:"p"},"Cosmic_Username")," and ",(0,o.kt)("inlineCode",{parentName:"p"},"Cosmic_Password")))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/d8b0b6a4.3737c73e.js b/assets/js/d8b0b6a4.3737c73e.js new file mode 100644 index 00000000..7eadaed0 --- /dev/null +++ b/assets/js/d8b0b6a4.3737c73e.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1248,1541,9050],{3905:(e,t,n)=>{n.d(t,{Zo:()=>u,kt:()=>N});var a=n(7294);function r(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var p=a.createContext({}),s=function(e){var t=a.useContext(p),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},u=function(e){var t=s(e.components);return a.createElement(p.Provider,{value:t},e.children)},m="mdxType",d={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},c=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,l=e.originalType,p=e.parentName,u=i(e,["components","mdxType","originalType","parentName"]),m=s(n),c=r,N=m["".concat(p,".").concat(c)]||m[c]||d[c]||l;return n?a.createElement(N,o(o({ref:t},u),{},{components:n})):a.createElement(N,o({ref:t},u))}));function N(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var l=n.length,o=new Array(l);o[0]=c;var i={};for(var p in 
t)hasOwnProperty.call(t,p)&&(i[p]=t[p]);i.originalType=e,i[m]="string"==typeof e?e:r,o[1]=i;for(var s=2;s{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>m,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/1000Genomes-snv-json",id:"version-3.24/data-sources/1000Genomes-snv-json",title:"1000Genomes-snv-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes-snv-json.md",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes-snv-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-snv-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes-snv-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],s={toc:p},u="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(u,(0,a.Z)({},s,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"oneKg":{\n "allAf":0.200879,\n "afrAf":0.210287,\n "amrAf":0.139769,\n "easAf":0.275794,\n "eurAf":0.181909,\n "sasAf":0.173824,\n "allAn":5008,\n "afrAn":1322,\n "amrAn":694,\n "easAn":1008,\n "eurAn":1006,\n "sasAn":978,\n "allAc":1006,\n "afrAc":278,\n "amrAc":97,\n "easAc":278,\n "eurAc":183,\n "sasAc":170\n}\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"allAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for all populations. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"allAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for all populations. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"allAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for all populations. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the African super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"afrAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the African super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"afrAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the African super population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"amrAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the Ad Mixed American super population. 
Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"amrAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the Ad Mixed American super population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"easAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"easAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the East Asian super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"easAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the East Asian super population. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the European super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"eurAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the European super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"eurAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the European super population. 
Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sasAf"),(0,r.kt)("td",{parentName:"tr",align:"center"},"float"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele frequency for the South Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sasAc"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele count for the South Asian super population. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"sasAn"),(0,r.kt)("td",{parentName:"tr",align:"center"},"int"),(0,r.kt)("td",{parentName:"tr",align:"left"},"allele number for the South Asian super population. Non-zero integer.")))))}m.isMDXComponent=!0},8866:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>m,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/1000Genomes-sv-json",id:"version-3.24/data-sources/1000Genomes-sv-json",title:"1000Genomes-sv-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes-sv-json.md",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes-sv-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-sv-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes-sv-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],s={toc:p},u="wrapper";function m(e){let{components:t,...n}=e;return(0,r.kt)(u,(0,a.Z)({},s,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"oneKg":[\n {\n "chromosome":"1",\n "begin":1595369,\n "end":1612441,\n "variantType": "copy_number_variation",\n "id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",\n "allAn": 
5008,\n "allAc": 2702,\n "allAf": 0.539537,\n "afrAf": 0.6052,\n "amrAf": 0.3675,\n "eurAf": 0.5357,\n "easAf": 0.5368,\n "sasAf": 0.5797,\n "reciprocalOverlap": 0.07555\n }\n],\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"id"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all populations. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all populations. 
Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the African super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the European super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian super population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"range: 0 - 1.")))))}m.isMDXComponent=!0},7665:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>p,default:()=>c,frontMatter:()=>i,metadata:()=>s,toc:()=>u});var a=n(7462),r=(n(7294),n(3905)),l=n(6380),o=n(8866);const i={title:"1000 Genomes"},p=void 0,s={unversionedId:"data-sources/1000Genomes",id:"version-3.24/data-sources/1000Genomes",title:"1000 Genomes",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes.mdx",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes.mdx",tags:[],version:"3.24",frontMatter:{title:"1000 Genomes"},sidebar:"docs",previous:{title:"Parsing Illumina Connected Annotations JSON",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/parsing-json"},next:{title:"Amino Acid Conservation",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation"}},u=[{value:"Overview",id:"overview",children:[],level:2},{value:"Populations",id:"populations",children:[],level:2},{value:"Small Variants",id:"small-variants",children:[{value:"VCF File Parsing",id:"vcf-file-parsing",children:[{value:"Conflict Resolution",id:"conflict-resolution",children:[],level:4}],level:3}],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2},{value:"Structural Variants",id:"structural-variants",children:[{value:"VCF File Parsing",id:"vcf-file-parsing-1",children:[],level:3},{value:"Converting VCF svTypes to SO sequence 
alterations",id:"converting-vcf-svtypes-to-so-sequence-alterations",children:[{value:"Exceptions",id:"exceptions",children:[],level:4}],level:3}],level:2},{value:"JSON Output",id:"json-output-1",children:[],level:2}],m={toc:u},d="wrapper";function c(e){let{components:t,...n}=e;return(0,r.kt)(d,(0,a.Z)({},m,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,"The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. ",(0,r.kt)("em",{parentName:"p"},"Nature 526"),", 75\u201381 (2015). 
",(0,r.kt)("a",{parentName:"p",href:"https://doi.org/10.1038/nature15394"},"https://doi.org/10.1038/nature15394")))),(0,r.kt)("h2",{id:"populations"},"Populations"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"The super population membership can be found here: (",(0,r.kt)("a",{parentName:"li",href:"http://www.1000genomes.org/category/population/"},"http://www.1000genomes.org/category/population/"),")"),(0,r.kt)("li",{parentName:"ul"},"We want to capture the allele frequencies for all 26 populations as well as the 5 super populations and the total population.")),(0,r.kt)("h2",{id:"small-variants"},"Small Variants"),(0,r.kt)("h3",{id:"vcf-file-parsing"},"VCF File Parsing"),(0,r.kt)("p",null,"The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF) but we recompute them using allele counts and allele numbers in order to get 6 digit precision. The allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not expressed in the INFO field. Instead the genotypes need to be parsed to compute that information. 
Our team converted the original data to VCF entries with allele counts and allele numbers like the following."),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO\n1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633\n")),(0,r.kt)("p",null,"The ancestral allele, if it exists, is the first value in the pipe separated AA fields (the Indel specific REF, ALT, IndelType fields are ignored)."),(0,r.kt)("p",null,"We parse the VCF file and extract the following fields from INFO:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"AA"),(0,r.kt)("li",{parentName:"ul"},"AC"),(0,r.kt)("li",{parentName:"ul"},"AN"),(0,r.kt)("li",{parentName:"ul"},"EAS_AN"),(0,r.kt)("li",{parentName:"ul"},"AMR_AN"),(0,r.kt)("li",{parentName:"ul"},"AFR_AN"),(0,r.kt)("li",{parentName:"ul"},"EUR_AN"),(0,r.kt)("li",{parentName:"ul"},"SAS_AN"),(0,r.kt)("li",{parentName:"ul"},"EAS_AC"),(0,r.kt)("li",{parentName:"ul"},"AMR_AC"),(0,r.kt)("li",{parentName:"ul"},"AFR_AC"),(0,r.kt)("li",{parentName:"ul"},"EUR_AC"),(0,r.kt)("li",{parentName:"ul"},"SAS_AC")),(0,r.kt)("h4",{id:"conflict-resolution"},"Conflict Resolution"),(0,r.kt)("p",null,"We have observed conflicting allele frequency information in the source. Take the following example:"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO\n1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;\n1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;\n")),(0,r.kt)("p",null,"That is, the variant 1-20505705-C-CTG has conflicting entries. 
To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"center"},"Chromosome"),(0,r.kt)("th",{parentName:"tr",align:"left"},"#"," of alleles"),(0,r.kt)("th",{parentName:"tr",align:"center"},"#"," of conflicting alleles"),(0,r.kt)("th",{parentName:"tr",align:"left"},"percentage"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"center"},"chrX"),(0,r.kt)("td",{parentName:"tr",align:"left"},"834800"),(0,r.kt)("td",{parentName:"tr",align:"center"},"2733"),(0,r.kt)("td",{parentName:"tr",align:"left"},"0.33%")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"center"},"Total"),(0,r.kt)("td",{parentName:"tr",align:"left"},"21413098"),(0,r.kt)("td",{parentName:"tr",align:"center"},"2743"),(0,r.kt)("td",{parentName:"tr",align:"left"},"0.013%")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"Currently"),", we removed the allele frequency of the conflicting allele (i.e., insertion TG in the example) but keep allele frequencies of all other alleles in the VCF line."),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"Potential Alternate Solutions")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"Remove all alleles that are contained in the vcf lines which have conflicting allele. 
(Recommended by 1000 genome group Holly Zheng-Bradley, 7/29/2015)"),(0,r.kt)("li",{parentName:"ul"},"Recalculate the allele frequency for the conflicting allele."),(0,r.kt)("li",{parentName:"ul"},"Pick the allele frequency that has the highest data support.")),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/"},"GRCh37"),"\n",(0,r.kt)("a",{parentName:"p",href:"http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/"},"GRCh38")),(0,r.kt)("h2",{id:"json-output"},"JSON Output"),(0,r.kt)(l.default,{mdxType:"JSONSNV"}),(0,r.kt)("h2",{id:"structural-variants"},"Structural Variants"),(0,r.kt)("h3",{id:"vcf-file-parsing-1"},"VCF File Parsing"),(0,r.kt)("p",null,"The VCF files contain entries like the following:"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103\n22 16050654 esv3647175;esv3647176;esv3647177;esv3647178 A ,,, 100 PASS AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV GT 3|0 0|0 0|0 0|0 0|0 0|0 0|4\n")),(0,r.kt)("p",null,"Please note that, CNVs are allele-specific. 
For example, HG00096 is effectively copy number 4, which would be a net gain on chr22."),(0,r.kt)("p",null,"1000 Genomes contains 5 types of structural variants:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"CNV"),(0,r.kt)("li",{parentName:"ul"},"DEL"),(0,r.kt)("li",{parentName:"ul"},"DUP"),(0,r.kt)("li",{parentName:"ul"},"INS"),(0,r.kt)("li",{parentName:"ul"},"INV")),(0,r.kt)("p",null,"Because the 1000 Genomes data is provided in VCF format, we assume that the coordinates follow VCF conventions, i.e., there is a padding base for symbolic alleles. Every interval can therefore be interpreted as ","[BEGIN+1, END]",".\nFor all variant types except insertions, END is far larger than BEGIN. The distribution of BEGIN and END for insertions is summarized below."),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"Insertion issues")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"END = BEGIN for 6/165"),(0,r.kt)("li",{parentName:"ul"},"END = BEGIN+2 for 93/165"),(0,r.kt)("li",{parentName:"ul"},"END = BEGIN+3 for 11/165"),(0,r.kt)("li",{parentName:"ul"},"END = BEGIN+4 for 11/165"),(0,r.kt)("li",{parentName:"ul"},"END \u2013 BEGIN ranges from 5 to 1156 for the others.")),(0,r.kt)("h3",{id:"converting-vcf-svtypes-to-so-sequence-alterations"},"Converting VCF svTypes to SO sequence alterations"),(0,r.kt)("p",null,"The svType will be captured in our JSON file under the ",(0,r.kt)("a",{parentName:"p",href:"http://www.sequenceontology.org/browser/current_svn/term/SO:0001059"},"sequenceAlteration")," key. 
Here's the translation we'll use according to svType in 1000 Genomes."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"svType"),(0,r.kt)("th",{parentName:"tr",align:null},"Alternative Alleles contain "),(0,r.kt)("th",{parentName:"tr",align:null},"sequenceAlteration"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ALU"),(0,r.kt)("td",{parentName:"tr",align:null},"FALSE"),(0,r.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DUP"),(0,r.kt)("td",{parentName:"tr",align:null},"TRUE"),(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_gain")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"CNV"),(0,r.kt)("td",{parentName:"tr",align:null},"TRUE"),(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_gain (observed_gains >0 and observed_losses =0) ",(0,r.kt)("br",null),"copy_number_loss\xa0(observed_gains = 0 and observed_losses > 0) ",(0,r.kt)("br",null),"copy_number_variation 
(otherwise)")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DEL"),(0,r.kt)("td",{parentName:"tr",align:null},"TRUE"),(0,r.kt)("td",{parentName:"tr",align:null},"copy_number_loss")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"LINE1"),(0,r.kt)("td",{parentName:"tr",align:null},"FALSE"),(0,r.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"SVA"),(0,r.kt)("td",{parentName:"tr",align:null},"FALSE"),(0,r.kt)("td",{parentName:"tr",align:null},"mobile_element_insertion")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"INV"),(0,r.kt)("td",{parentName:"tr",align:null},"FALSE"),(0,r.kt)("td",{parentName:"tr",align:null},"inversion")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"INS"),(0,r.kt)("td",{parentName:"tr",align:null},"FALSE"),(0,r.kt)("td",{parentName:"tr",align:null},"insertion")))),(0,r.kt)("h4",{id:"exceptions"},"Exceptions"),(0,r.kt)("p",null,(0,r.kt)("em",{parentName:"p"},"We discard structural variants without END")),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103\n21 9495848 esv3646347 A 100 PASS AC=1543;AF=0.308107;AN=5008;CS=L1_umary;MEINFO=LINE1,5669,6005,+;NS=2504;SVLEN=336;SVTYPE=LINE1;TSD=null;DP=20015;EAS_AF=0.3125;AMR_AF=0.2911;AFR_AF=0.3026;EUR_AF=0.2922;SAS_AF=0.3395;VT=SV GT 0|0 1|1 1|0 0|1 1|0 1|0 0|0\n")),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"CNVs in chrY")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"No other types of structural variants exist in chrY"),(0,r.kt)("li",{parentName:"ul"},'Since copy number is provided in genotype field, we directly parse the copy number from "CN" field.'),(0,r.kt)("li",{parentName:"ul"},"For most CNVs in chrY, the reference copy 
number is 1, but the reference copy number for CNVs in segmental duplication sites is 2 ("," in the 2nd example). All segmental duplication calls have identifiers starting with GS_SD_M2.")),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00101 HG00103 HG00105 HG00107 HG00108\nY 2888555 CNV_Y_2888555_3014661 T 100 PASS AC=1;AF=0.000817661;AN=1223;END=3014661;NS=1233;SVTYPE=CNV;AMR_AF=0.0000;AFR_AF=0.0000;EUR_AF=0.0000;SAS_AF=0.0019;EAS_AF=0.0000;VT=SV GT:CN:CNL:CNP:CNQ:GP:GQ:PL 0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585 0:1:-296.36,0,-16.6:-300.46,0,-19.7:99:0,-19.7:99:0,166 0:1:-1000,0,-39.44:-1000,0,-42.54:99:0,-42.54:99:0,394\nY 6128381 GS_SD_M2_Y_6128381_6230094_Y_9650284_9752225 C , 100 PASS AC=4,2;AF=0.00327065,0.00163532;AN=1223;END=6230094;NS=1233;SVTYPE=CNV;AMR_AF=0.0029,0.0029;AFR_AF=0.0016,0.0016;EUR_AF=0.0000,0.0000;SAS_AF=0.0038,0.0000;EAS_AF=0.0000,0.0000;VT=SV;EX_TARGET GT:CN:CNL:CNP:CNQ:GP:GQ 0:2:-1000,-138.78,0,-38.53:-1000,-141.27,0,-41.33:99:0,-141.27,-41.33:99 0:2:-1000,-53.32,0,-17.85:-1000,-55.81,0,-20.64:99:0,-55.81,-20.64:99 0:2:-1000,-71.83,0,-32.5:-1000,-74.32,0,-35.29:99:0,-74.32,-35.29:99 0:2:-1000,-60.96,0,-20.29:-1000,-63.45,0,-23.08:99:0,-63.45,-23.08:99 0:2:-1000,-77.6,0,-31.45:-1000,-80.09,0,-34.24:99:0,-80.09,-34.24:99\n")),(0,r.kt)("h2",{id:"json-output-1"},"JSON Output"),(0,r.kt)(o.default,{mdxType:"JSONSV"}))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/d9f334cf.ccb601aa.js b/assets/js/d9f334cf.ccb601aa.js new file mode 100644 index 00000000..8d075659 --- /dev/null +++ b/assets/js/d9f334cf.ccb601aa.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[417],{3905:(e,t,n)=>{n.d(t,{Zo:()=>s,kt:()=>f});var r=n(7294);function o(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function 
a(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function l(e){for(var t=1;t=0||(o[n]=e[n]);return o}(e,t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(o[n]=e[n])}return o}var i=r.createContext({}),p=function(e){var t=r.useContext(i),n=t;return e&&(n="function"==typeof e?e(t):l(l({},t),e)),n},s=function(e){var t=p(e.components);return r.createElement(i.Provider,{value:t},e.children)},u="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},d=r.forwardRef((function(e,t){var n=e.components,o=e.mdxType,a=e.originalType,i=e.parentName,s=c(e,["components","mdxType","originalType","parentName"]),u=p(n),d=o,f=u["".concat(i,".").concat(d)]||u[d]||m[d]||a;return n?r.createElement(f,l(l({ref:t},s),{},{components:n})):r.createElement(f,l({ref:t},s))}));function f(e,t){var n=arguments,o=t&&t.mdxType;if("string"==typeof e||o){var a=n.length,l=new Array(a);l[0]=d;var c={};for(var i in t)hasOwnProperty.call(t,i)&&(c[i]=t[i]);c.originalType=e,c[u]="string"==typeof e?e:o,l[1]=c;for(var p=2;p{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>u,frontMatter:()=>a,metadata:()=>c,toc:()=>i});var r=n(7462),o=(n(7294),n(3905));const a={},l=void 0,c={unversionedId:"data-sources/phylop-json",id:"version-3.24/data-sources/phylop-json",title:"phylop-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/phylop-json.md",sourceDirName:"data-sources",slug:"/data-sources/phylop-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/phylop-json.md",tags:[],version:"3.24",frontMatter:{}},i=[],p={toc:i},s="wrapper";function u(e){let{components:t,...n}=e;return(0,o.kt)(s,(0,r.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,o.kt)("pre",null,(0,o.kt)("code",{parentName:"pre",className:"language-json",metastring:"{10}","{10}":!0},'"variants":[\n {\n "vid":"2:48010488:A",\n "chromosome":"chr2",\n "begin":48010488,\n "end":48010488,\n "refAllele":"G",\n "altAllele":"A",\n "variantType":"SNV",\n "phylopScore":0.459\n }\n] \n')),(0,o.kt)("table",null,(0,o.kt)("thead",{parentName:"table"},(0,o.kt)("tr",{parentName:"thead"},(0,o.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,o.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,o.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,o.kt)("tbody",{parentName:"table"},(0,o.kt)("tr",{parentName:"tbody"},(0,o.kt)("td",{parentName:"tr",align:"left"},"phylopScore"),(0,o.kt)("td",{parentName:"tr",align:"center"},"float"),(0,o.kt)("td",{parentName:"tr",align:"left"},"range: -14.08 to 6.424")))))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/da6337b1.6846131b.js b/assets/js/da6337b1.6846131b.js new file mode 100644 index 00000000..192cf74d --- /dev/null +++ b/assets/js/da6337b1.6846131b.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5590],{3905:(e,t,n)=>{n.d(t,{Zo:()=>m,kt:()=>u});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var 
a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},m=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},p="mdxType",c={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},h=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,m=l(e,["components","mdxType","originalType","parentName"]),p=d(n),h=i,u=p["".concat(s,".").concat(h)]||p[h]||c[h]||r;return n?a.createElement(u,o(o({ref:t},m),{},{components:n})):a.createElement(u,o({ref:t},m))}));function u(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,o=new Array(r);o[0]=h;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[p]="string"==typeof e?e:i,o[1]=l;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>p,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={title:"Mitochondrial Heteroplasmy"},o=void 0,l={unversionedId:"data-sources/mito-heteroplasmy",id:"version-3.24/data-sources/mito-heteroplasmy",title:"Mitochondrial 
Heteroplasmy",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/mito-heteroplasmy.md",sourceDirName:"data-sources",slug:"/data-sources/mito-heteroplasmy",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mito-heteroplasmy",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mito-heteroplasmy.md",tags:[],version:"3.24",frontMatter:{title:"Mitochondrial Heteroplasmy"},sidebar:"docs",previous:{title:"gnomAD",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad"},next:{title:"MITOMAP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"JSON File",id:"json-file",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[{value:"Binning VRF Data",id:"binning-vrf-data",children:[],level:4},{value:"Pre-processing the Data",id:"pre-processing-the-data",children:[],level:4},{value:"Algorithm",id:"algorithm",children:[],level:4}],level:3}],level:2},{value:"Download URL",id:"download-url",children:[],level:2},{value:"JSON Output",id:"json-output",children:[],level:2}],d={toc:s},m="wrapper";function p(e){let{components:t,...n}=e;return(0,i.kt)(m,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. 
The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline."),(0,i.kt)("h2",{id:"json-file"},"JSON File"),(0,i.kt)("h3",{id:"example"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'{\n "T:C":{\n "ad":[\n 1,\n 1,\n 1,\n 1,\n 1,\n 1\n ],\n "allele_type":"alt",\n "vrf":[\n 0.002369668246445498,\n 0.0024937655860349127,\n 0.0016129032258064516,\n 0.0025188916876574307,\n 0.0022935779816513763,\n 0.002008032128514056\n ],\n "vrf_stats":{\n "kurtosis":38.889891511122556,\n "max":0.0025188916876574307,\n "mean":5.4052190471990743e-05,\n "min":0.0,\n "nobs":246,\n "skewness":6.346664692283075,\n "stdev":0.0003461416264750575,\n "variance":1.1981402557879823e-07\n }\n }\n}\n\n')),(0,i.kt)("h3",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"From the JSON file, we're mainly interested in the following keys:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"variant")," (e.g.
",(0,i.kt)("inlineCode",{parentName:"li"},"T:C"),")"),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"ad")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"vrf")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("inlineCode",{parentName:"li"},"nobs")," (number of observations)")),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Adjusting for null observations")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"The ",(0,i.kt)("inlineCode",{parentName:"p"},"nobs")," value indicates how many observations were made. Ideally this would have been represented in the ",(0,i.kt)("inlineCode",{parentName:"p"},"ad")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"vrf")," arrays, but it's left as an exercise for the reader."))),(0,i.kt)("h4",{id:"binning-vrf-data"},"Binning VRF Data"),(0,i.kt)("p",null,"The ",(0,i.kt)("inlineCode",{parentName:"p"},"vrf")," (variant read frequency) array in the JSON object above is paired with the ",(0,i.kt)("inlineCode",{parentName:"p"},"ad")," array (allele depths) shown above."),(0,i.kt)("p",null,"The data in the JSON object has a crazy number of significant digits. This means that as the number of samples increases, this array will grow.
To make this more future-proof, Illumina Connected Annotations bins everything according to 0.1% increments."),(0,i.kt)("p",null,"With the binned data, we end up having 775 distinct ",(0,i.kt)("inlineCode",{parentName:"p"},"vrf")," values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143."),(0,i.kt)("h4",{id:"pre-processing-the-data"},"Pre-processing the Data"),(0,i.kt)("p",null,"The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"#CHROM POS REF ALT VRF_BINS VRF_COUNTS\nchrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736\nchrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736\n")),(0,i.kt)("h4",{id:"algorithm"},"Algorithm"),(0,i.kt)("p",null,"Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. 
Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Percentiles")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Illumina Connected Annotations uses the ",(0,i.kt)("a",{parentName:"p",href:"https://en.wikipedia.org/wiki/Percentile"},"statistical definition of percentile")," (indicating the value below which a given percentage of observations in a group of observations falls). 
Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1)."))),(0,i.kt)("h2",{id:"download-url"},"Download URL"),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Unavailable")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"The original data set is only available internally at Illumina at the moment."))),(0,i.kt)("h2",{id:"json-output"},"JSON Output"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{14-17}","{14-17}":!0},'"samples":[\n {\n "genotype":"0/1",\n "variantFrequencies":[\n 0.333,\n 0.5\n ],\n ],\n "alleleDepths":[\n 10,\n 20,\n 30\n ],\n "heteroplasmyPercentile":[\n 23.13,\n 12.65\n ]\n }\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"heteroplasmyPercentile"),(0,i.kt)("td",{parentName:"tr",align:"center"},"float array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"one percentile for each variant frequency (each alternate allele)")))))}p.isMDXComponent=!0}}]); \ No newline at 
end of file diff --git a/assets/js/e8574da6.a78525f9.js b/assets/js/e8574da6.a78525f9.js new file mode 100644 index 00000000..ef52c4f7 --- /dev/null +++ b/assets/js/e8574da6.a78525f9.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[8053],{3905:(t,n,e)=>{e.d(n,{Zo:()=>u,kt:()=>f});var a=e(7294);function l(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function r(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(l[e]=t[e]);return l}(t,n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(l[e]=t[e])}return l}var m=a.createContext({}),p=function(t){var n=a.useContext(m),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},u=function(t){var n=p(t.components);return a.createElement(m.Provider,{value:n},t.children)},d="mdxType",g={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},N=a.forwardRef((function(t,n){var e=t.components,l=t.mdxType,r=t.originalType,m=t.parentName,u=o(t,["components","mdxType","originalType","parentName"]),d=p(e),N=l,f=d["".concat(m,".").concat(N)]||d[N]||g[N]||r;return e?a.createElement(f,i(i({ref:n},u),{},{components:e})):a.createElement(f,i({ref:n},u))}));function f(t,n){var e=arguments,l=n&&n.mdxType;if("string"==typeof t||l){var r=e.length,i=new Array(r);i[0]=N;var o={};for(var m in n)hasOwnProperty.call(n,m)&&(o[m]=n[m]);o.originalType=t,o[d]="string"==typeof t?t:l,i[1]=o;for(var p=2;p{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>d,frontMatter:()=>r,metadata:()=>o,toc:()=>m});var a=e(7462),l=(e(7294),e(3905));const r={},i=void 
0,o={unversionedId:"data-sources/gnomad4.0-small-variants-json",id:"version-3.24/data-sources/gnomad4.0-small-variants-json",title:"gnomad4.0-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/gnomad4.0-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/gnomad4.0-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad4.0-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/gnomad4.0-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},m=[],p={toc:m},u="wrapper";function d(t){let{components:n,...e}=t;return(0,l.kt)(u,(0,a.Z)({},p,e,{components:n,mdxType:"MDXLayout"}),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad": {\n "coverage": 154,\n "failedFilter": true,\n "allAf": 0.5,\n "allAn": 152428,\n "allAc": 76214,\n "allHc": 0,\n "afrAf": 0.5,\n "afrAn": 41608,\n "afrAc": 20804,\n "afrHc": 0,\n "amiAf": 0.5,\n "amiAn": 912,\n "amiAc": 456,\n "amiHc": 0,\n "amrAf": 0.5,\n "amrAn": 15314,\n "amrAc": 7657,\n "amrHc": 0,\n "easAf": 0.5,\n "easAn": 5196,\n "easAc": 2598,\n "easHc": 0,\n "finAf": 0.5,\n "finAn": 10632,\n "finAc": 5316,\n "finHc": 0,\n "nfeAf": 0.5,\n "nfeAn": 68050,\n "nfeAc": 34025,\n "nfeHc": 0,\n "asjAf": 0.5,\n "asjAn": 3472,\n "asjAc": 1736,\n "asjHc": 0,\n "sasAf": 0.5,\n "sasAn": 4834,\n "sasAc": 2417,\n "sasHc": 0,\n "midAf": 0.5,\n "midAn": 294,\n "midAc": 147,\n "midHc": 0,\n "remainingAf": 0.5,\n "remainingAn": 2116,\n "remainingAc": 1058,\n "remainingHc": 0,\n "maleAf": 0.5,\n "maleAn": 74544,\n "maleAc": 37272,\n "maleHc": 0,\n "femaleAf": 0.5,\n "femaleAn": 77884,\n "femaleAc": 38942,\n "femaleHc": 0\n}\n')),(0,l.kt)("pre",null,(0,l.kt)("code",{parentName:"pre",className:"language-json"},'"gnomad-exome": {\n "coverage": 53,\n "allAf": 0.495074,\n "allAn": 4060,\n 
"allAc": 2010,\n "allHc": 11,\n "afrAf": 0.5,\n "afrAn": 86,\n "afrAc": 43,\n "afrHc": 0,\n "amrAf": 0.5,\n "amrAn": 46,\n "amrAc": 23,\n "amrHc": 0,\n "easAf": 0.491071,\n "easAn": 112,\n "easAc": 55,\n "easHc": 0,\n "finAf": 0.5,\n "finAn": 306,\n "finAc": 153,\n "finHc": 0,\n "nfeAf": 0.49503,\n "nfeAn": 3018,\n "nfeAc": 1494,\n "nfeHc": 11,\n "asjAf": 0.461538,\n "asjAn": 26,\n "asjAc": 12,\n "asjHc": 0,\n "sasAf": 0.486111,\n "sasAn": 72,\n "sasAc": 35,\n "sasHc": 0,\n "midAf": 0.5,\n "midAn": 68,\n "midAc": 34,\n "midHc": 0,\n "remainingAf": 0.493865,\n "remainingAn": 326,\n "remainingAc": 161,\n "remainingHc": 0,\n "maleAf": 0.495212,\n "maleAn": 2924,\n "maleAc": 1448,\n "maleHc": 9,\n "femaleAf": 0.494718,\n "femaleAn": 1136,\n "femaleAc": 562,\n "femaleHc": 2\n}\n')),(0,l.kt)("table",null,(0,l.kt)("thead",{parentName:"table"},(0,l.kt)("tr",{parentName:"thead"},(0,l.kt)("th",{parentName:"tr",align:null},"Field"),(0,l.kt)("th",{parentName:"tr",align:null},"Type"),(0,l.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,l.kt)("tbody",{parentName:"table"},(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"coverage"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"average coverage (non-negative integer values)")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for male population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for male population. 
Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for male population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"maleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for male population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for female population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for female population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for female population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"femaleHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for female population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Other population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Other population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Other population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"remainingHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Other population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for all populations. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"allHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for all populations. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the African / African American population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the African / African American population.
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the African / African American population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"afrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for African / African American population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for Amish populations. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for Amish populations. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for Amish populations. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amiHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Amish populations. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Latino population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Latino population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Latino population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"amrHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for Latino population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the East Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the East Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"easHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for East Asian population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Finnish population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Finnish population. 
Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Finnish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"finHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Finnish population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Non-Finnish European population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Non-Finnish European population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Non-Finnish European population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"nfeHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Non-Finnish European population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ashkenazi Jewish population.
Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Ashkenazi Jewish population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Ashkenazi Jewish population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"asjHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the South Asian population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the South Asian population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"sasHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the South Asian population.
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midAf"),(0,l.kt)("td",{parentName:"tr",align:null},"float"),(0,l.kt)("td",{parentName:"tr",align:null},"allele frequency for the Middle Eastern population. Range: 0 - 1.0")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midAc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele count for the Middle Eastern population. Integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midAn"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"allele number for the Middle Eastern population. Non-zero integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"midHc"),(0,l.kt)("td",{parentName:"tr",align:null},"int"),(0,l.kt)("td",{parentName:"tr",align:null},"count of homozygous individuals for the Middle Eastern population.
Non-negative integer.")),(0,l.kt)("tr",{parentName:"tbody"},(0,l.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,l.kt)("td",{parentName:"tr",align:null},"bool"),(0,l.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters (Note: we do not list the failed filters)")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/e95cadfe.aa190e36.js b/assets/js/e95cadfe.aa190e36.js deleted file mode 100644 index cf5d30a4..00000000 --- a/assets/js/e95cadfe.aa190e36.js +++ /dev/null @@ -1 +0,0 @@ -"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5277],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>h});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),d=c(n),u=i,h=d["".concat(s,".").concat(u)]||d[u]||m[u]||r;return n?a.createElement(h,o(o({ref:t},p),{},{components:n})):a.createElement(h,o({ref:t},p))}));function h(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,o=new 
Array(r);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:i,o[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>d,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={title:"Gene Fusion Detection"},o=void 0,l={unversionedId:"core-functionality/gene-fusions",id:"core-functionality/gene-fusions",title:"Gene Fusion Detection",description:"Overview",source:"@site/docs/core-functionality/gene-fusions.md",sourceDirName:"core-functionality",slug:"/core-functionality/gene-fusions",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/gene-fusions.md",tags:[],version:"current",frontMatter:{title:"Gene Fusion Detection"},sidebar:"docs",previous:{title:"Transcript Consequence Impact",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts"},next:{title:"Variant IDs",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Approach",id:"approach",children:[{value:"Variant Types",id:"variant-types",children:[],level:3},{value:"Criteria",id:"criteria",children:[],level:3}],level:2},{value:"ETV6/RUNX1 Example",id:"etv6runx1-example",children:[{value:"VCF",id:"vcf",children:[],level:3},{value:"JSON Output",id:"json-output",children:[{value:"Gene Fusion Data Sources",id:"gene-fusion-data-sources",children:[],level:4},{value:"Consequences",id:"consequences",children:[],level:4},{value:"Gene Fusions Section",id:"gene-fusions-section",children:[],level:4}],level:3}],level:2}],c={toc:s},p="wrapper";function d(e){let{components:t,...r}=e;return(0,i.kt)(p,(0,a.Z)({},c,r,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"Gene fusions often result 
from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed."),(0,i.kt)("p",null,"Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations."),(0,i.kt)("p",null,"The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:n(6851).Z})),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. ",(0,i.kt)("a",{parentName:"p",href:"https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-015-0252-1"},"Landscape of gene fusions in epithelial cancers: seq and ye shall find"),". Genome Med 7, 129 (2015)"))),(0,i.kt)("h2",{id:"approach"},"Approach"),(0,i.kt)("p",null,"Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. 
Let's consider two transcripts, ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_014206.3")," (",(0,i.kt)("strong",{parentName:"p"},"TMEM258"),") and ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_013402.4")," (",(0,i.kt)("strong",{parentName:"p"},"FADS1"),"). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"TMEM258 & FADS1 transcripts",src:n(7309).Z})),(0,i.kt)("p",null,"The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"TMEM258 & FADS1 gene fusions",src:n(2434).Z})),(0,i.kt)("p",null,"Only two of the combinations yields a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion.\nIf only unidirectional gene fusions are desired, only these two fusions can be detected. 
If ",(0,i.kt)("inlineCode",{parentName:"p"},"enable-bidirectional-fusions")," is enabled, all four cases can be identified."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Interpreting translocation breakends")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the ",(0,i.kt)("a",{parentName:"p",href:"https://samtools.github.io/hts-specs/VCFv4.2.pdf"},"VCF 4.2 specification"),"."),(0,i.kt)("table",{parentName:"div"},(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"REF"),(0,i.kt)("th",{parentName:"tr",align:"left"},"ALT"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Meaning"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"t[p["),(0,i.kt)("td",{parentName:"tr",align:"left"},"piece extending to the right of p is joined after t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"t]p]"),(0,i.kt)("td",{parentName:"tr",align:"left"},"reverse comp piece extending left of p is joined after 
t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"]p]t"),(0,i.kt)("td",{parentName:"tr",align:"left"},"piece extending to the left of p is joined before t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"[p[t"),(0,i.kt)("td",{parentName:"tr",align:"left"},"reverse comp piece extending right of p is joined before t")))))),(0,i.kt)("h3",{id:"variant-types"},"Variant Types"),(0,i.kt)("p",null,"Specifically we can identify gene fusions from the following structural variant types:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"deletions (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"tandem_duplications (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"inversions (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"translocation breakpoints (",(0,i.kt)("inlineCode",{parentName:"li"},"AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911["),") ")),(0,i.kt)("h3",{id:"criteria"},"Criteria"),(0,i.kt)("p",null,"The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},"After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if ",(0,i.kt)("inlineCode",{parentName:"li"},"enable-bidirectional-fusions")," is not enabled. They can have the same or different orientations if ",(0,i.kt)("inlineCode",{parentName:"li"},"enable-bidirectional-fusions")," is set."),(0,i.kt)("li",{parentName:"ol"},"Both transcripts must be from the same transcript source (i.e. 
we won't mix and match between RefSeq and Ensembl transcripts)"),(0,i.kt)("li",{parentName:"ol"},"Both transcripts must belong to different genes"),(0,i.kt)("li",{parentName:"ol"},"Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)")),(0,i.kt)("h2",{id:"etv6runx1-example"},"ETV6/RUNX1 Example"),(0,i.kt)("p",null,"ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Sun C., Chang L., Zhu X. ",(0,i.kt)("a",{parentName:"p",href:"https://www.oncotarget.com/article/16367/text/"},"Pathogenesis of ETV6/RUNX1-positive childhood acute lymphoblastic leukemia and mechanisms underlying its relapse"),". Oncotarget. 2017; 8: 35445-35459"))),(0,i.kt)("h3",{id:"vcf"},"VCF"),(0,i.kt)("p",null,"Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\nchr12 12026270 . C [chr21:36420865[C . 
PASS SVTYPE=BND\nchr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND\nchr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND\nchr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND\n")),(0,i.kt)("p",null,"When you put these calls together, the resulting genomic rearrangement looks something like this:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:n(3299).Z})),(0,i.kt)("h3",{id:"json-output"},"JSON Output"),(0,i.kt)("p",null,"The annotation for the first variant in the VCF looks like this:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{36-58}","{36-58}":!0},'{"positions":[\n{\n "chromosome": "12",\n "position": 12026270,\n "refAllele": "C",\n "altAlleles": [\n "[chr21:36420865[C"\n ],\n "filters": [\n "PASS"\n ],\n "cytogeneticBand": "12p13.2",\n "variants": [\n {\n "vid": "12-12026270-C-[chr21:36420865[C",\n "chromosome": "12",\n "begin": 12026270,\n "end": 12026270,\n "isStructuralVariant": true,\n "refAllele": "C",\n "altAllele": "[chr21:36420865[C",\n "variantType": "translocation",\n "transcripts": [\n {\n "transcript": "ENST00000396373.4",\n "source": "Ensembl",\n "bioType": "mRNA",\n "introns": "5/7",\n "geneId": "ENSG00000139083",\n "hgnc": "ETV6",\n "consequence": [\n "transcript_variant",\n "unidirectional_gene_fusion"\n ],\n "impact": "modifier",\n "geneFusions": [\n {\n "transcript": "ENST00000437180.1",\n "bioType": "mRNA",\n "source": "Ensembl",\n "geneId": "ENSG00000159216",\n "proteinId": "ENSP00000409227.1",\n "intron": 2,\n "hgnc": "RUNX1",\n "hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n },\n {\n "transcript": "ENST00000300305.3",\n "bioType": "mRNA",\n "source": "Ensembl",\n "isCanonical": true,\n "geneId": "ENSG00000159216",\n "proteinId": "ENSP00000300305.3",\n "intron": 1,\n "hgnc": "RUNX1",\n "hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n }\n 
],\n "isCanonical": true,\n "proteinId": "ENSP00000379658.3"\n },\n {\n "transcript": "NM_001987.5",\n "source": "RefSeq",\n "bioType": "mRNA",\n "introns": "5/7",\n "geneId": "2120",\n "hgnc": "ETV6",\n "consequence": [\n "transcript_variant",\n "unidirectional_gene_fusion"\n ],\n "impact": "modifier",\n "geneFusions": [\n {\n "transcript": "NM_001754.5",\n "bioType": "mRNA",\n "source": "RefSeq",\n "isCanonical": true,\n "geneId": "861",\n "proteinId": "NP_001745.2",\n "intron": 2,\n "hgnc": "RUNX1",\n "hgvsr": "NM_001754.5(RUNX1):r.?_58+274::NM_001987.5(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n }\n ],\n "isCanonical": true,\n "proteinId": "NP_001978.1"\n }\n ]\n }\n ]\n}\n]}\n\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"transcript"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"transcript ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"bioType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"descriptions of the ",(0,i.kt)("a",{parentName:"td",href:"https://uswest.ensembl.org/info/genome/genebuild/biotypes.html"},"biotypes from Ensembl"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"exon"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"exon that contained fusion breakpoint")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"intron"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"intron that 
contained fusion breakpoint")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"geneId"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene ID. e.g. ENSG00000116062")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene symbol. e.g. MSH6")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgvsr"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"HGVS RNA nomenclature")))),(0,i.kt)("h4",{id:"gene-fusion-data-sources"},"Gene Fusion Data Sources"),(0,i.kt)("p",null,"To provide more context to our gene fusions, we provide the following gene fusion data sources:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"../data-sources/cosmic"},"COSMIC")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"../data-sources/fusioncatcher"},"FusionCatcher"))),(0,i.kt)("h4",{id:"consequences"},"Consequences"),(0,i.kt)("p",null,"When a gene fusion is identified, we add the following Sequence Ontology consequence:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{3}","{3}":!0},' "consequence": [\n "transcript_variant",\n "gene_fusion"\n ],\n')),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"If both transcripts have the same orientation, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"unidirectional_gene_fusion"),", if they have different orientations, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"bidirectional_gene_fusion")),(0,i.kt)("li",{parentName:"ul"},"If both unidirectional and bidirectional ones are detected, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"gene_fusion"),".")),(0,i.kt)("h4",{id:"gene-fusions-section"},"Gene Fusions 
Section"),(0,i.kt)("p",null,"The ",(0,i.kt)("inlineCode",{parentName:"p"},"geneFusions")," section is contained within the object of the originating transcript. It will contain all the pairwise gene fusions that obey the criteria outline above. In the case of ",(0,i.kt)("inlineCode",{parentName:"p"},"ENST00000396373.4"),", there 7 other Ensembl transcripts that would produce a gene fusion. For ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001987.4"),", there was only one transcript (",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001754.4"),") that produce a gene fusion."),(0,i.kt)("p",null,"For each originating transcript, we report the following for each partner transcript:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"transcript ID"),(0,i.kt)("li",{parentName:"ul"},"gene ID"),(0,i.kt)("li",{parentName:"ul"},"HGNC gene symbol"),(0,i.kt)("li",{parentName:"ul"},"transcript bio type (e.g. protein_coding)"),(0,i.kt)("li",{parentName:"ul"},"intron or exon number containing the breakpoint"),(0,i.kt)("li",{parentName:"ul"},"HGVS RNA notation"),(0,i.kt)("li",{parentName:"ul"},"gene fusion directionality")),(0,i.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 
2s-2.27-.86-2.5-2z"}))),"tip")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types fusion splicing events (see ",(0,i.kt)("a",{parentName:"p",href:"https://varnomen.hgvs.org/bg-material/consultation/svd-wg007"},"HGVS SVD-WG007"),")."))),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{8}","{8}":!0},' "geneFusions": [\n {\n "transcript": "NM_001754.4",\n "bioType": "protein_coding",\n "intron": 2,\n "geneId": "861",\n "hgnc": "RUNX1",\n "hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",\n "directionality":"uniDirectional"\n }\n ],\n')),(0,i.kt)("p",null,"The HGVS RNA notation above indicates that the gene fusion starts with ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001754.4")," (RUNX1) until CDS position 58 and continues with ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001987.4")," (ETV6). 
",(0,i.kt)("inlineCode",{parentName:"p"},"1009+3367")," indicates that the fusion occurred 3367 bp within intron 2."))}d.isMDXComponent=!0},2434:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/TMEM258_FADS1_GeneFusions-e5e3758ea9d2c07d3591e3801b2bf7e3.svg"},7309:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/TMEM258_FADS1_Transcripts-fe1b9c6be1f7cbfefbce887f8cec5d58.svg"},3299:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/etv6-runx1-fusion-ec8f4312c9aca496bde0d6e2b1bbd50d.svg"},6851:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/gene-fusions-fig2-1cce8ac31b00465c8d36bdc47ec3309e.svg"}}]); \ No newline at end of file diff --git a/assets/js/e95cadfe.d1638d74.js b/assets/js/e95cadfe.d1638d74.js new file mode 100644 index 00000000..99043356 --- /dev/null +++ b/assets/js/e95cadfe.d1638d74.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[5277],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>h});var a=n(7294);function i(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function r(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function o(e){for(var t=1;t=0||(i[n]=e[n]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(i[n]=e[n])}return i}var s=a.createContext({}),c=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},p=function(e){var t=c(e.components);return a.createElement(s.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},u=a.forwardRef((function(e,t){var 
n=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),d=c(n),u=i,h=d["".concat(s,".").concat(u)]||d[u]||m[u]||r;return n?a.createElement(h,o(o({ref:t},p),{},{components:n})):a.createElement(h,o({ref:t},p))}));function h(e,t){var n=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=n.length,o=new Array(r);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[d]="string"==typeof e?e:i,o[1]=l;for(var c=2;c{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>d,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var a=n(7462),i=(n(7294),n(3905));const r={title:"Gene Fusion Detection"},o=void 0,l={unversionedId:"core-functionality/gene-fusions",id:"core-functionality/gene-fusions",title:"Gene Fusion Detection",description:"Overview",source:"@site/docs/core-functionality/gene-fusions.md",sourceDirName:"core-functionality",slug:"/core-functionality/gene-fusions",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/core-functionality/gene-fusions.md",tags:[],version:"current",frontMatter:{title:"Gene Fusion Detection"},sidebar:"docs",previous:{title:"Canonical Transcripts",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts"},next:{title:"ISCN Notation",permalink:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/iscn-notation"}},s=[{value:"Overview",id:"overview",children:[],level:2},{value:"Approach",id:"approach",children:[{value:"Variant Types",id:"variant-types",children:[],level:3},{value:"Criteria",id:"criteria",children:[],level:3}],level:2},{value:"ETV6/RUNX1 Example",id:"etv6runx1-example",children:[{value:"VCF",id:"vcf",children:[],level:3},{value:"JSON Output",id:"json-output",children:[{value:"Gene Fusion Data 
Sources",id:"gene-fusion-data-sources",children:[],level:4},{value:"Consequences",id:"consequences",children:[],level:4},{value:"Gene Fusions Section",id:"gene-fusions-section",children:[],level:4}],level:3}],level:2}],c={toc:s},p="wrapper";function d(e){let{components:t,...r}=e;return(0,i.kt)(p,(0,a.Z)({},c,r,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed."),(0,i.kt)("p",null,"Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations."),(0,i.kt)("p",null,"The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:n(6851).Z})),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. 
",(0,i.kt)("a",{parentName:"p",href:"https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-015-0252-1"},"Landscape of gene fusions in epithelial cancers: seq and ye shall find"),". Genome Med 7, 129 (2015)"))),(0,i.kt)("h2",{id:"approach"},"Approach"),(0,i.kt)("p",null,"Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_014206.3")," (",(0,i.kt)("strong",{parentName:"p"},"TMEM258"),") and ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_013402.4")," (",(0,i.kt)("strong",{parentName:"p"},"FADS1"),"). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"TMEM258 & FADS1 transcripts",src:n(7309).Z})),(0,i.kt)("p",null,"The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:"),(0,i.kt)("p",null,(0,i.kt)("img",{alt:"TMEM258 & FADS1 gene fusions",src:n(2434).Z})),(0,i.kt)("p",null,"Only two of the combinations yields a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion.\nIf only unidirectional gene fusions are desired, only these two fusions can be detected. 
If ",(0,i.kt)("inlineCode",{parentName:"p"},"enable-bidirectional-fusions")," is enabled, all four cases can be identified."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Interpreting translocation breakends")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the ",(0,i.kt)("a",{parentName:"p",href:"https://samtools.github.io/hts-specs/VCFv4.2.pdf"},"VCF 4.2 specification"),"."),(0,i.kt)("table",{parentName:"div"},(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"REF"),(0,i.kt)("th",{parentName:"tr",align:"left"},"ALT"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Meaning"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"t[p["),(0,i.kt)("td",{parentName:"tr",align:"left"},"piece extending to the right of p is joined after t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"t]p]"),(0,i.kt)("td",{parentName:"tr",align:"left"},"reverse comp piece extending left of p is joined after 
t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"]p]t"),(0,i.kt)("td",{parentName:"tr",align:"left"},"piece extending to the left of p is joined before t")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"s"),(0,i.kt)("td",{parentName:"tr",align:"left"},"[p[t"),(0,i.kt)("td",{parentName:"tr",align:"left"},"reverse comp piece extending right of p is joined before t")))))),(0,i.kt)("h3",{id:"variant-types"},"Variant Types"),(0,i.kt)("p",null,"Specifically we can identify gene fusions from the following structural variant types:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"deletions (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"tandem_duplications (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"inversions (",(0,i.kt)("inlineCode",{parentName:"li"},""),")"),(0,i.kt)("li",{parentName:"ul"},"translocation breakpoints (",(0,i.kt)("inlineCode",{parentName:"li"},"AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911["),") ")),(0,i.kt)("h3",{id:"criteria"},"Criteria"),(0,i.kt)("p",null,"The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},"After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if ",(0,i.kt)("inlineCode",{parentName:"li"},"enable-bidirectional-fusions")," is not enabled. They can have the same or different orientations if ",(0,i.kt)("inlineCode",{parentName:"li"},"enable-bidirectional-fusions")," is set."),(0,i.kt)("li",{parentName:"ol"},"Both transcripts must be from the same transcript source (i.e. 
we won't mix and match between RefSeq and Ensembl transcripts)"),(0,i.kt)("li",{parentName:"ol"},"Both transcripts must belong to different genes"),(0,i.kt)("li",{parentName:"ol"},"Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)")),(0,i.kt)("h2",{id:"etv6runx1-example"},"ETV6/RUNX1 Example"),(0,i.kt)("p",null,"ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Sun C., Chang L., Zhu X. ",(0,i.kt)("a",{parentName:"p",href:"https://www.oncotarget.com/article/16367/text/"},"Pathogenesis of ETV6/RUNX1-positive childhood acute lymphoblastic leukemia and mechanisms underlying its relapse"),". Oncotarget. 2017; 8: 35445-35459"))),(0,i.kt)("h3",{id:"vcf"},"VCF"),(0,i.kt)("p",null,"Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"##fileformat=VCFv4.1\n#CHROM POS ID REF ALT QUAL FILTER INFO\nchr12 12026270 . C [chr21:36420865[C . 
PASS SVTYPE=BND\nchr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND\nchr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND\nchr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND\n")),(0,i.kt)("p",null,"When you put these calls together, the resulting genomic rearrangement looks something like this:"),(0,i.kt)("p",null,(0,i.kt)("img",{src:n(3299).Z})),(0,i.kt)("h3",{id:"json-output"},"JSON Output"),(0,i.kt)("p",null,"The annotation for the first variant in the VCF looks like this:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{36-58}","{36-58}":!0},'{"positions":[\n{\n "chromosome": "12",\n "position": 12026270,\n "refAllele": "C",\n "altAlleles": [\n "[chr21:36420865[C"\n ],\n "filters": [\n "PASS"\n ],\n "cytogeneticBand": "12p13.2",\n "variants": [\n {\n "vid": "12-12026270-C-[chr21:36420865[C",\n "chromosome": "12",\n "begin": 12026270,\n "end": 12026270,\n "isStructuralVariant": true,\n "refAllele": "C",\n "altAllele": "[chr21:36420865[C",\n "variantType": "translocation",\n "transcripts": [\n {\n "transcript": "ENST00000396373.4",\n "source": "Ensembl",\n "bioType": "mRNA",\n "introns": "5/7",\n "geneId": "ENSG00000139083",\n "hgnc": "ETV6",\n "consequence": [\n "transcript_variant",\n "unidirectional_gene_fusion"\n ],\n "impact": "modifier",\n "geneFusions": [\n {\n "transcript": "ENST00000437180.1",\n "bioType": "mRNA",\n "source": "Ensembl",\n "geneId": "ENSG00000159216",\n "proteinId": "ENSP00000409227.1",\n "intron": 2,\n "hgnc": "RUNX1",\n "hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n },\n {\n "transcript": "ENST00000300305.3",\n "bioType": "mRNA",\n "source": "Ensembl",\n "isCanonical": true,\n "geneId": "ENSG00000159216",\n "proteinId": "ENSP00000300305.3",\n "intron": 1,\n "hgnc": "RUNX1",\n "hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n }\n 
],\n "isCanonical": true,\n "proteinId": "ENSP00000379658.3"\n },\n {\n "transcript": "NM_001987.5",\n "source": "RefSeq",\n "bioType": "mRNA",\n "introns": "5/7",\n "geneId": "2120",\n "hgnc": "ETV6",\n "consequence": [\n "transcript_variant",\n "unidirectional_gene_fusion"\n ],\n "impact": "modifier",\n "geneFusions": [\n {\n "transcript": "NM_001754.5",\n "bioType": "mRNA",\n "source": "RefSeq",\n "isCanonical": true,\n "geneId": "861",\n "proteinId": "NP_001745.2",\n "intron": 2,\n "hgnc": "RUNX1",\n "hgvsr": "NM_001754.5(RUNX1):r.?_58+274::NM_001987.5(ETV6):r.1009+3367_?",\n "directionality": "unidirectional"\n }\n ],\n "isCanonical": true,\n "proteinId": "NP_001978.1"\n }\n ]\n }\n ]\n}\n]}\n\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"transcript"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"transcript ID")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"bioType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"descriptions of the ",(0,i.kt)("a",{parentName:"td",href:"https://uswest.ensembl.org/info/genome/genebuild/biotypes.html"},"biotypes from Ensembl"))),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"exon"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"exon that contained fusion breakpoint")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"intron"),(0,i.kt)("td",{parentName:"tr",align:"center"},"int"),(0,i.kt)("td",{parentName:"tr",align:"left"},"intron that 
contained fusion breakpoint")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"geneId"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene ID. e.g. ENSG00000116062")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"gene symbol. e.g. MSH6")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hgvsr"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"HGVS RNA nomenclature")))),(0,i.kt)("h4",{id:"gene-fusion-data-sources"},"Gene Fusion Data Sources"),(0,i.kt)("p",null,"To provide more context to our gene fusions, we provide the following gene fusion data sources:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"../data-sources/cosmic"},"COSMIC")),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"../data-sources/fusioncatcher"},"FusionCatcher"))),(0,i.kt)("h4",{id:"consequences"},"Consequences"),(0,i.kt)("p",null,"When a gene fusion is identified, we add the following Sequence Ontology consequence:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{3}","{3}":!0},' "consequence": [\n "transcript_variant",\n "gene_fusion"\n ],\n')),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"If both transcripts have the same orientation, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"unidirectional_gene_fusion"),", if they have different orientations, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"bidirectional_gene_fusion")),(0,i.kt)("li",{parentName:"ul"},"If both unidirectional and bidirectional ones are detected, we label it as ",(0,i.kt)("inlineCode",{parentName:"li"},"gene_fusion"),".")),(0,i.kt)("h4",{id:"gene-fusions-section"},"Gene Fusions 
Section"),(0,i.kt)("p",null,"The ",(0,i.kt)("inlineCode",{parentName:"p"},"geneFusions")," section is contained within the object of the originating transcript. It will contain all the pairwise gene fusions that obey the criteria outlined above. In the case of ",(0,i.kt)("inlineCode",{parentName:"p"},"ENST00000396373.4"),", there are 7 other Ensembl transcripts that would produce a gene fusion. For ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001987.4"),", there is only one transcript (",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001754.4"),") that would produce a gene fusion."),(0,i.kt)("p",null,"For each originating transcript, we report the following for each partner transcript:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"transcript ID"),(0,i.kt)("li",{parentName:"ul"},"gene ID"),(0,i.kt)("li",{parentName:"ul"},"HGNC gene symbol"),(0,i.kt)("li",{parentName:"ul"},"transcript bio type (e.g. protein_coding)"),(0,i.kt)("li",{parentName:"ul"},"intron or exon number containing the breakpoint"),(0,i.kt)("li",{parentName:"ul"},"HGVS RNA notation"),(0,i.kt)("li",{parentName:"ul"},"gene fusion directionality")),(0,i.kt)("div",{className:"admonition admonition-tip alert alert--success"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"12",height:"16",viewBox:"0 0 12 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 
2s-2.27-.86-2.5-2z"}))),"tip")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see ",(0,i.kt)("a",{parentName:"p",href:"https://varnomen.hgvs.org/bg-material/consultation/svd-wg007"},"HGVS SVD-WG007"),")."))),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json",metastring:"{8}","{8}":!0},' "geneFusions": [\n {\n "transcript": "NM_001754.4",\n "bioType": "protein_coding",\n "intron": 2,\n "geneId": "861",\n "hgnc": "RUNX1",\n "hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",\n "directionality":"uniDirectional"\n }\n ],\n')),(0,i.kt)("p",null,"The HGVS RNA notation above indicates that the gene fusion starts with ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001754.4")," (RUNX1) until CDS position 58 and continues with ",(0,i.kt)("inlineCode",{parentName:"p"},"NM_001987.4")," (ETV6). 
",(0,i.kt)("inlineCode",{parentName:"p"},"1009+3367")," indicates that the fusion occurred 3367 bp within intron 2."))}d.isMDXComponent=!0},2434:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/TMEM258_FADS1_GeneFusions-e5e3758ea9d2c07d3591e3801b2bf7e3.svg"},7309:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/TMEM258_FADS1_Transcripts-fe1b9c6be1f7cbfefbce887f8cec5d58.svg"},3299:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/etv6-runx1-fusion-ec8f4312c9aca496bde0d6e2b1bbd50d.svg"},6851:(e,t,n)=>{n.d(t,{Z:()=>a});const a=n.p+"assets/images/gene-fusions-fig2-1cce8ac31b00465c8d36bdc47ec3309e.svg"}}]); \ No newline at end of file diff --git a/assets/js/ee4db9b8.d1757fec.js b/assets/js/ee4db9b8.d1757fec.js new file mode 100644 index 00000000..2d0d2138 --- /dev/null +++ b/assets/js/ee4db9b8.d1757fec.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3769,775,7520],{3905:(e,t,a)=>{a.d(t,{Zo:()=>d,kt:()=>h});var n=a(7294);function i(e,t,a){return t in e?Object.defineProperty(e,t,{value:a,enumerable:!0,configurable:!0,writable:!0}):e[t]=a,e}function r(e,t){var a=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),a.push.apply(a,n)}return a}function o(e){for(var t=1;t=0||(i[a]=e[a]);return i}(e,t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,a)&&(i[a]=e[a])}return i}var s=n.createContext({}),m=function(e){var t=n.useContext(s),a=t;return e&&(a="function"==typeof e?e(t):o(o({},t),e)),a},d=function(e){var t=m(e.components);return n.createElement(s.Provider,{value:t},e.children)},p="mdxType",c={inlineCode:"code",wrapper:function(e){var t=e.children;return n.createElement(n.Fragment,{},t)}},u=n.forwardRef((function(e,t){var 
a=e.components,i=e.mdxType,r=e.originalType,s=e.parentName,d=l(e,["components","mdxType","originalType","parentName"]),p=m(a),u=i,h=p["".concat(s,".").concat(u)]||p[u]||c[u]||r;return a?n.createElement(h,o(o({ref:t},d),{},{components:a})):n.createElement(h,o({ref:t},d))}));function h(e,t){var a=arguments,i=t&&t.mdxType;if("string"==typeof e||i){var r=a.length,o=new Array(r);o[0]=u;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[p]="string"==typeof e?e:i,o[1]=l;for(var m=2;m{a.r(t),a.d(t,{contentTitle:()=>o,default:()=>p,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var n=a(7462),i=(a(7294),a(3905));const r={},o=void 0,l={unversionedId:"data-sources/mitomap-small-variants-json",id:"version-3.24/data-sources/mitomap-small-variants-json",title:"mitomap-small-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/mitomap-small-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/mitomap-small-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-small-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap-small-variants-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],m={toc:s},d="wrapper";function p(e){let{components:t,...a}=e;return(0,i.kt)(d,(0,n.Z)({},m,a,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"mitomap":[ \n { \n "refAllele":"G",\n "altAllele":"A",\n "diseases":[ \n "Bipolar disorder",\n "Melanoma"\n ],\n "hasHomoplasmy":false,\n "hasHeteroplasmy":true,\n "status":"Reported",\n "clinicalSignificance":"confirmed pathogenic",\n "scorePercentile":83.30,\n "numGenBankFullLengthSeqs":2,\n "pubMedIds":["2316527","6299878","6301949"],\n "isAlleleSpecific":true\n 
}\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"diseases"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"},"associated diseases")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hasHomoplasmy"),(0,i.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"hasHeteroplasmy"),(0,i.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"status"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"record status")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"clinicalSignificance"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"},"predicted pathogenicity")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"scorePercentile"),(0,i.kt)("td",{parentName:"tr",align:"center"},"float"),(0,i.kt)("td",{parentName:"tr",align:"left"},"MitoTIP 
score")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"numGenBankFullLengthSeqs"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"},"# of GenBank full-length sequences")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,i.kt)("td",{parentName:"tr",align:"center"},"boolean"),(0,i.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the MITOMAP alternate allele")))))}p.isMDXComponent=!0},3623:(e,t,a)=>{a.r(t),a.d(t,{contentTitle:()=>o,default:()=>p,frontMatter:()=>r,metadata:()=>l,toc:()=>s});var n=a(7462),i=(a(7294),a(3905));const r={},o=void 0,l={unversionedId:"data-sources/mitomap-structural-variants-json",id:"version-3.24/data-sources/mitomap-structural-variants-json",title:"mitomap-structural-variants-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/mitomap-structural-variants-json.md",sourceDirName:"data-sources",slug:"/data-sources/mitomap-structural-variants-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-structural-variants-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap-structural-variants-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],m={toc:s},d="wrapper";function p(e){let{components:t,...a}=e;return(0,i.kt)(d,(0,n.Z)({},m,a,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-json"},'"mitomap":[ \n { \n "chromosome":"MT",\n "begin":3166,\n "end":14152,\n "variantType":"deletion",\n "reciprocalOverlap":0.18068,\n 
"annotationOverlap":0.42405\n }\n]\n')),(0,i.kt)("table",null,(0,i.kt)("thead",{parentName:"table"},(0,i.kt)("tr",{parentName:"thead"},(0,i.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,i.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,i.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,i.kt)("tbody",{parentName:"table"},(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"chromosome"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"begin"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"end"),(0,i.kt)("td",{parentName:"tr",align:"center"},"integer"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,i.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,i.kt)("td",{parentName:"tr",align:"left"})),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"reciprocalOverlap"),(0,i.kt)("td",{parentName:"tr",align:"center"},"float"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1. Specified up to 5 decimal places")),(0,i.kt)("tr",{parentName:"tbody"},(0,i.kt)("td",{parentName:"tr",align:"left"},"annotationOverlap"),(0,i.kt)("td",{parentName:"tr",align:"center"},"float"),(0,i.kt)("td",{parentName:"tr",align:"left"},"Range: 0 - 1. 
Specified up to 5 decimal places")))))}p.isMDXComponent=!0},688:(e,t,a)=>{a.r(t),a.d(t,{contentTitle:()=>s,default:()=>u,frontMatter:()=>l,metadata:()=>m,toc:()=>d});var n=a(7462),i=(a(7294),a(3905)),r=a(3830),o=a(3623);const l={title:"MITOMAP"},s=void 0,m={unversionedId:"data-sources/mitomap",id:"version-3.24/data-sources/mitomap",title:"MITOMAP",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/mitomap.mdx",sourceDirName:"data-sources",slug:"/data-sources/mitomap",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/mitomap.mdx",tags:[],version:"3.24",frontMatter:{title:"MITOMAP"},sidebar:"docs",previous:{title:"Mitochondrial Heteroplasmy",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mito-heteroplasmy"},next:{title:"OMIM",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim"}},d=[{value:"Overview",id:"overview",children:[],level:2},{value:"Scraping HTML Pages",id:"scraping-html-pages",children:[{value:"Example",id:"example",children:[],level:3},{value:"Parsing",id:"parsing",children:[{value:"Allele Parsing",id:"allele-parsing",children:[],level:4}],level:3}],level:2},{value:"PostgreSQL Dump File",id:"postgresql-dump-file",children:[{value:"Example",id:"example-1",children:[],level:3},{value:"Parsing",id:"parsing-1",children:[],level:3}],level:2},{value:"Known Issues",id:"known-issues",children:[],level:2},{value:"Download URLs",id:"download-urls",children:[],level:2},{value:"JSON Output",id:"json-output",children:[{value:"Small Variants",id:"small-variants",children:[],level:3},{value:"Structural Variants",id:"structural-variants",children:[],level:3}],level:2}],p={toc:d},c="wrapper";function 
u(e){let{components:t,...l}=e;return(0,i.kt)(c,(0,n.Z)({},p,l,{components:t,mdxType:"MDXLayout"}),(0,i.kt)("h2",{id:"overview"},"Overview"),(0,i.kt)("p",null,"MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA."),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. ",(0,i.kt)("em",{parentName:"p"},"Current Protocols in Bioinformatics")," 1(123):1.23.1-26 (2013). ",(0,i.kt)("a",{parentName:"p",href:"http://www.mitomap.org"},"http://www.mitomap.org")))),(0,i.kt)("h2",{id:"scraping-html-pages"},"Scraping HTML Pages"),(0,i.kt)("h3",{id:"example"},"Example"),(0,i.kt)("p",null,"MITOMAP is unique in that it doesn't offer the data in a downloadable format. 
As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:"),(0,i.kt)("ol",null,(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/foswiki/bin/view/MITOMAP/PolymorphismsControl"},"mtDNA Control Region Sequence Variants")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/foswiki/bin/view/MITOMAP/PolymorphismsCoding"},"mtDNA Coding Region & RNA Sequence Variants")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/foswiki/bin/view/MITOMAP/MutationsRNA"},"Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/foswiki/bin/view/MITOMAP/MutationsCodingControl"},"Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/foswiki/bin/view/MITOMAP/DeletionsSingle"},"Reported mtDNA Deletions")),(0,i.kt)("li",{parentName:"ol"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/foswiki/bin/view/MITOMAP/InsertionsSimple"},"mtDNA Simple Insertions"))),(0,i.kt)("p",null,(0,i.kt)("img",{src:a(5280).Z})),(0,i.kt)("h3",{id:"parsing"},"Parsing"),(0,i.kt)("p",null,"Here's what the HTML code looks like:"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-html"},"[\"582\",\"MT-TF\",\"Mitochondrial myopathy\",\"T582C\",\"tRNA Phe\",\"-\",\"+\",\"Reported\",\"72.90% \",\"0\",\"2\"],\n[\"583\",\"MT-TF\",\"MELAS / MM & EXIT\",\"G583A\",\"tRNA Phe\",\"-\",\"+\",\"Cfrm\",\"93.10% \",\"0\",\"3\"],\n")),(0,i.kt)("p",null,"We're mainly interested in the following columns (numbers indicate the HTML page 
above):"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"Position",(0,i.kt)("sup",null,"1,2,3,4")),(0,i.kt)("li",{parentName:"ul"},"Disease",(0,i.kt)("sup",null,"3,4")),(0,i.kt)("li",{parentName:"ul"},"Nucleotide Change",(0,i.kt)("sup",null,"1,2")),(0,i.kt)("li",{parentName:"ul"},"Allele",(0,i.kt)("sup",null,"3,4")),(0,i.kt)("li",{parentName:"ul"},"Homoplasmy",(0,i.kt)("sup",null,"3,4")),(0,i.kt)("li",{parentName:"ul"},"Heteroplasmy",(0,i.kt)("sup",null,"3,4")),(0,i.kt)("li",{parentName:"ul"},"Status",(0,i.kt)("sup",null,"3,4")),(0,i.kt)("li",{parentName:"ul"},"MitoTIP",(0,i.kt)("sup",null,"3,4")),(0,i.kt)("li",{parentName:"ul"},"GB Seqs FL(CR)",(0,i.kt)("sup",null,"1,2,3,4")),(0,i.kt)("li",{parentName:"ul"},"Deletion Junction",(0,i.kt)("sup",null,"5")),(0,i.kt)("li",{parentName:"ul"},"Insert (nt)",(0,i.kt)("sup",null,"6")),(0,i.kt)("li",{parentName:"ul"},"Insert Point (nt)",(0,i.kt)("sup",null,"6")),(0,i.kt)("li",{parentName:"ul"},"References/Curated References",(0,i.kt)("sup",null,"1,2,3,4"))),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"MitoTIP")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"The MitoTIP information is used to populate the ",(0,i.kt)("inlineCode",{parentName:"p"},"clinicalSignificance")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"scorePercentile"),' JSON keys. 
The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.'))),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Left alignment")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left-align all insertions and deletions."))),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Variant Enumeration")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are ",(0,i.kt)("inlineCode",{parentName:"p"},"C-C(2-8)")," and ",(0,i.kt)("inlineCode",{parentName:"p"},"A-AC or ACC"),". 
Alternate alleles containing IUPAC ambiguity codes are similarly enumerated."))),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Inversions")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"MITOMAP inversions are currently treated as MNVs."))),(0,i.kt)("h4",{id:"allele-parsing"},"Allele Parsing"),(0,i.kt)("p",null,"The following MITOMAP allele parsing conventions are supported:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"C123T"),(0,i.kt)("li",{parentName:"ul"},"16021_16022del"),(0,i.kt)("li",{parentName:"ul"},"8042del2"),(0,i.kt)("li",{parentName:"ul"},"C9537insC"),(0,i.kt)("li",{parentName:"ul"},"3902_3908invACCTTGC"),(0,i.kt)("li",{parentName:"ul"},"A-AC or ACC"),(0,i.kt)("li",{parentName:"ul"},"C-C(2-8)"),(0,i.kt)("li",{parentName:"ul"},"8042delAT")),(0,i.kt)("h2",{id:"postgresql-dump-file"},"PostgreSQL Dump File"),(0,i.kt)("h3",{id:"example-1"},"Example"),(0,i.kt)("pre",null,(0,i.kt)("code",{parentName:"pre",className:"language-scss"},"COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;\n1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . 
Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177\n2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. 
The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534\n")),(0,i.kt)("h3",{id:"parsing-1"},"Parsing"),(0,i.kt)("p",null,"From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"id"),(0,i.kt)("li",{parentName:"ul"},"nlmid")),(0,i.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Why not use the PostgreSQL file for everything?")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in."))),(0,i.kt)("h2",{id:"known-issues"},"Known Issues"),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 
1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Duplicated records")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown."),(0,i.kt)("ul",{parentName:"div"},(0,i.kt)("li",{parentName:"ul"},"For diseases and PubMed IDs, we take the union of the values in the duplicated records."),(0,i.kt)("li",{parentName:"ul"},"For full length GenBank sequences, we take the largest number from each of the duplicated records since it provides the strongest evidence for this variant.")))),(0,i.kt)("div",{className:"admonition admonition-caution alert alert--warning"},(0,i.kt)("div",{parentName:"div",className:"admonition-heading"},(0,i.kt)("h5",{parentName:"div"},(0,i.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,i.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"16",height:"16",viewBox:"0 0 16 16"},(0,i.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"}))),"Skipped records")),(0,i.kt)("div",{parentName:"div",className:"admonition-content"},(0,i.kt)("p",{parentName:"div"},"Records that represent an alternate notation of the original variant are skipped. 
Similarly some variants with confusing alleles (T961delT+ / -C(n)ins) are also skipped."))),(0,i.kt)("h2",{id:"download-urls"},"Download URLs"),(0,i.kt)("ul",null,(0,i.kt)("li",{parentName:"ul"},"see ",(0,i.kt)("a",{parentName:"li",href:"#example"},"HTML Pages")," above"),(0,i.kt)("li",{parentName:"ul"},(0,i.kt)("a",{parentName:"li",href:"https://mitomap.org/downloads/mitomap.dump.sql.gz"},"PostgreSQL dump file"))),(0,i.kt)("h2",{id:"json-output"},"JSON Output"),(0,i.kt)("h3",{id:"small-variants"},"Small Variants"),(0,i.kt)(r.default,{mdxType:"SmallJSON"}),(0,i.kt)("h3",{id:"structural-variants"},"Structural Variants"),(0,i.kt)(o.default,{mdxType:"SVJSON"}))}u.isMDXComponent=!0},5280:(e,t,a)=>{a.d(t,{Z:()=>n});const n=a.p+"assets/images/MITOMAP-d8d4dd35c2336fdba5fcced77ec438e6.png"}}]); \ No newline at end of file diff --git a/assets/js/ef4059aa.0328cea2.js b/assets/js/ef4059aa.0328cea2.js new file mode 100644 index 00000000..7eb9520b --- /dev/null +++ b/assets/js/ef4059aa.0328cea2.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3790],{3905:(t,n,e)=>{e.d(n,{Zo:()=>p,kt:()=>k});var a=e(7294);function r(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function l(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(r[e]=t[e]);return r}(t,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(r[e]=t[e])}return r}var d=a.createContext({}),s=function(t){var n=a.useContext(d),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},p=function(t){var n=s(t.components);return 
a.createElement(d.Provider,{value:n},t.children)},u="mdxType",c={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},m=a.forwardRef((function(t,n){var e=t.components,r=t.mdxType,l=t.originalType,d=t.parentName,p=o(t,["components","mdxType","originalType","parentName"]),u=s(e),m=r,k=u["".concat(d,".").concat(m)]||u[m]||c[m]||l;return e?a.createElement(k,i(i({ref:n},p),{},{components:e})):a.createElement(k,i({ref:n},p))}));function k(t,n){var e=arguments,r=n&&n.mdxType;if("string"==typeof t||r){var l=e.length,i=new Array(l);i[0]=m;var o={};for(var d in n)hasOwnProperty.call(n,d)&&(o[d]=n[d]);o.originalType=t,o[u]="string"==typeof t?t:r,i[1]=o;for(var s=2;s{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>o,toc:()=>d});var a=e(7462),r=(e(7294),e(3905));const l={id:"introduction",title:"Introduction",description:"translational research-grade variant annotation",hide_title:!0,slug:"/"},i=void 0,o={unversionedId:"introduction/introduction",id:"introduction/introduction",title:"Introduction",description:"translational research-grade variant annotation",source:"@site/docs/introduction/introduction.mdx",sourceDirName:"introduction",slug:"/",permalink:"/IlluminaConnectedAnnotationsDocumentation/",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/introduction/introduction.mdx",tags:[],version:"current",frontMatter:{id:"introduction",title:"Introduction",description:"translational research-grade variant annotation",hide_title:!0,slug:"/"},sidebar:"docs",next:{title:"Licensed Content",permalink:"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent"}},d=[{value:"What does Illumina Connected Annotations annotate?",id:"what-does-illumina-connected-annotations-annotate",children:[],level:2},{value:"Download",id:"download",children:[],level:2}],s={toc:d},p="wrapper";function 
u(t){let{components:n,...l}=t;return(0,r.kt)(p,(0,a.Z)({},s,l,{components:n,mdxType:"MDXLayout"}),(0,r.kt)("p",null,(0,r.kt)("img",{src:e(7951).Z})),(0,r.kt)("p",null,"Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs (including CNVs). It can be run as a stand-alone package, or integrated into larger software tools that require variant annotation."),(0,r.kt)("p",null,"The input to Illumina Connected Annotations are VCFs and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease."),(0,r.kt)("p",null,"The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily."),(0,r.kt)("h2",{id:"what-does-illumina-connected-annotations-annotate"},"What does Illumina Connected Annotations annotate?"),(0,r.kt)("p",null,"We use Sequence Ontology consequences to describe how each variant impacts a given transcript:"),(0,r.kt)("p",null,(0,r.kt)("img",{src:e(8812).Z})),(0,r.kt)("p",null,"The transcript and gene models are obtained from ",(0,r.kt)("a",{parentName:"p",href:"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/"},"RefSeq")," and ",(0,r.kt)("a",{parentName:"p",href:"https://ftp.ensembl.org/pub/"},"Ensembl"),".\nThe current officially supported versions are:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Data Source"),(0,r.kt)("th",{parentName:"tr",align:null},"Version"),(0,r.kt)("th",{parentName:"tr",align:null},"Release 
Date"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"RefSeq"),(0,r.kt)("td",{parentName:"tr",align:null},"GCF_000001405.40-RS_2023_03"),(0,r.kt)("td",{parentName:"tr",align:null},"2023-03-21")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl"),(0,r.kt)("td",{parentName:"tr",align:null},"110"),(0,r.kt)("td",{parentName:"tr",align:null},"2023-04-27")))),(0,r.kt)("p",null,"In addition, it uses external data sources to provide additional context for each variant.\nIllumina Connected Annotations provides annotations from the following sources divided into 2 tiers: Professional and basic.\nThe basic tier can be accessed free of charge. The professional tier requires a license. Please see ",(0,r.kt)("a",{parentName:"p",href:"./introduction/licensedContent"},"Licensed Content")," for details. For access, please contact ",(0,r.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com"),"."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Data Source"),(0,r.kt)("th",{parentName:"tr",align:null},"Availability"),(0,r.kt)("th",{parentName:"tr",align:null},"Latest Supported Version"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"COSMIC"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"99")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"OMIM"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"20240110")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Primate 
AI-3D"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Splice AI"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"1.3")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"1000 Genomes Project"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"Phase 3 v3plus")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Cancer Hotspots"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"2017")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ClinGen"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20240110")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ClinVar"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20231230")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DANN"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200205")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"dbSNP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"156")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DECIPHER"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"201509")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"FusionCatcher"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"1.33")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"GERP"),(0,r.
kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20110522")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"GME Variome"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20160618")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gnomAD"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"3.1.2")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"MITOMAP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200819")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"MultiZ 100 way"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20171006")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"REVEL"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200205")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"freeze 5")))),(0,r.kt)("h2",{id:"download"},"Download"),(0,r.kt)("p",null,"Please visit ",(0,r.kt)("a",{parentName:"p",href:"https://developer.illumina.com/illumina-connected-annotations"},"Illumina Connected Annotations"),"."))}u.isMDXComponent=!0},7951:(t,n,e)=>{e.d(n,{Z:()=>a});const a=e.p+"assets/images/ICAnnotations-966475fab8adae0519d1667d592ad4b2.png"},8812:(t,n,e)=>{e.d(n,{Z:()=>a});const a=e.p+"assets/images/TranscriptConsequences-60ca1c43a36dacf896fecdabf09ce02c.svg"}}]); \ No newline at end of file diff --git a/assets/js/ef4059aa.87eebb08.js b/assets/js/ef4059aa.87eebb08.js deleted file mode 100644 index 9188dfe7..00000000 --- a/assets/js/ef4059aa.87eebb08.js +++ /dev/null @@ -1 +0,0 @@ -"use 
strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[3790],{3905:(t,n,e)=>{e.d(n,{Zo:()=>p,kt:()=>k});var a=e(7294);function r(t,n,e){return n in t?Object.defineProperty(t,n,{value:e,enumerable:!0,configurable:!0,writable:!0}):t[n]=e,t}function l(t,n){var e=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);n&&(a=a.filter((function(n){return Object.getOwnPropertyDescriptor(t,n).enumerable}))),e.push.apply(e,a)}return e}function i(t){for(var n=1;n=0||(r[e]=t[e]);return r}(t,n);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,e)&&(r[e]=t[e])}return r}var d=a.createContext({}),s=function(t){var n=a.useContext(d),e=n;return t&&(e="function"==typeof t?t(n):i(i({},n),t)),e},p=function(t){var n=s(t.components);return a.createElement(d.Provider,{value:n},t.children)},u="mdxType",c={inlineCode:"code",wrapper:function(t){var n=t.children;return a.createElement(a.Fragment,{},n)}},m=a.forwardRef((function(t,n){var e=t.components,r=t.mdxType,l=t.originalType,d=t.parentName,p=o(t,["components","mdxType","originalType","parentName"]),u=s(e),m=r,k=u["".concat(d,".").concat(m)]||u[m]||c[m]||l;return e?a.createElement(k,i(i({ref:n},p),{},{components:e})):a.createElement(k,i({ref:n},p))}));function k(t,n){var e=arguments,r=n&&n.mdxType;if("string"==typeof t||r){var l=e.length,i=new Array(l);i[0]=m;var o={};for(var d in n)hasOwnProperty.call(n,d)&&(o[d]=n[d]);o.originalType=t,o[u]="string"==typeof t?t:r,i[1]=o;for(var s=2;s{e.r(n),e.d(n,{contentTitle:()=>i,default:()=>u,frontMatter:()=>l,metadata:()=>o,toc:()=>d});var a=e(7462),r=(e(7294),e(3905));const l={id:"introduction",title:"Introduction",description:"Clinical-grade variant annotation",hide_title:!0,slug:"/"},i=void 0,o={unversionedId:"introduction/introduction",id:"introduction/introduction",title:"Introduction",description:"Clinical-grade variant 
annotation",source:"@site/docs/introduction/introduction.mdx",sourceDirName:"introduction",slug:"/",permalink:"/IlluminaConnectedAnnotationsDocumentation/",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/docs/introduction/introduction.mdx",tags:[],version:"current",frontMatter:{id:"introduction",title:"Introduction",description:"Clinical-grade variant annotation",hide_title:!0,slug:"/"},sidebar:"docs",next:{title:"Licensed Content",permalink:"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent"}},d=[{value:"What does Illumina Connected Annotations annotate?",id:"what-does-illumina-connected-annotations-annotate",children:[],level:2},{value:"Download",id:"download",children:[],level:2}],s={toc:d},p="wrapper";function u(t){let{components:n,...l}=t;return(0,r.kt)(p,(0,a.Z)({},s,l,{components:n,mdxType:"MDXLayout"}),(0,r.kt)("p",null,(0,r.kt)("img",{src:e(7951).Z})),(0,r.kt)("p",null,"Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs (including CNVs). It can be run as a stand-alone package, or integrated into larger software tools that require variant annotation."),(0,r.kt)("p",null,"The input to Illumina Connected Annotations are VCFs and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease."),(0,r.kt)("p",null,"The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. 
Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily."),(0,r.kt)("h2",{id:"what-does-illumina-connected-annotations-annotate"},"What does Illumina Connected Annotations annotate?"),(0,r.kt)("p",null,"We use Sequence Ontology consequences to describe how each variant impacts a given transcript:"),(0,r.kt)("p",null,(0,r.kt)("img",{src:e(8812).Z})),(0,r.kt)("p",null,"The transcript and gene models are obtained from ",(0,r.kt)("a",{parentName:"p",href:"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/"},"RefSeq")," and ",(0,r.kt)("a",{parentName:"p",href:"https://ftp.ensembl.org/pub/"},"Ensembl"),".\nThe current officially supported versions are:"),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Data Source"),(0,r.kt)("th",{parentName:"tr",align:null},"Version"),(0,r.kt)("th",{parentName:"tr",align:null},"Release Date"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"RefSeq"),(0,r.kt)("td",{parentName:"tr",align:null},"GCF_000001405.40-RS_2023_03"),(0,r.kt)("td",{parentName:"tr",align:null},"2023-03-21")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl"),(0,r.kt)("td",{parentName:"tr",align:null},"110"),(0,r.kt)("td",{parentName:"tr",align:null},"2023-04-27")))),(0,r.kt)("p",null,"In addition, it uses external data sources to provide additional context for each variant.\nIllumina Connected Annotations provides annotations from the following sources divided into 2 tiers: Professional and basic.\nThe basic tier can be accessed free of charge. The professional tier requires a license. Please see ",(0,r.kt)("a",{parentName:"p",href:"./introduction/licensedContent"},"Licensed Content")," for details. 
For access, please contact ",(0,r.kt)("a",{parentName:"p",href:"mailto:annotation_support@illumina.com"},"annotation_support@illumina.com"),"."),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Data Source"),(0,r.kt)("th",{parentName:"tr",align:null},"Availability"),(0,r.kt)("th",{parentName:"tr",align:null},"Latest Supported Version"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"COSMIC"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"99")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"OMIM"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"20240110")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Primate AI-3D"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Splice AI"),(0,r.kt)("td",{parentName:"tr",align:null},"Professional"),(0,r.kt)("td",{parentName:"tr",align:null},"1.3")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"1000 Genomes Project"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"Phase 3 v3plus")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"Cancer 
Hotspots"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"2017")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ClinGen"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20240110")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"ClinVar"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20231230")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DANN"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200205")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"dbSNP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"156")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"DECIPHER"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"201509")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"FusionCatcher"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"1.33")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"GERP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20110522")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"GME 
Variome"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20160618")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"gnomAD"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"3.1.2")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"MITOMAP"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200819")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"MultiZ 100 way"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20171006")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"REVEL"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"20200205")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"TOPMed"),(0,r.kt)("td",{parentName:"tr",align:null},"Basic"),(0,r.kt)("td",{parentName:"tr",align:null},"freeze 5")))),(0,r.kt)("h2",{id:"download"},"Download"),(0,r.kt)("p",null,"Please visit ",(0,r.kt)("a",{parentName:"p",href:"https://developer.illumina.com/illumina-connected-annotations"},"Illumina Connected Annotations"),"."))}u.isMDXComponent=!0},7951:(t,n,e)=>{e.d(n,{Z:()=>a});const a=e.p+"assets/images/ICAnnotations-966475fab8adae0519d1667d592ad4b2.png"},8812:(t,n,e)=>{e.d(n,{Z:()=>a});const a=e.p+"assets/images/TranscriptConsequences-60ca1c43a36dacf896fecdabf09ce02c.svg"}}]); \ No newline at end of file diff --git a/assets/js/ef5201ba.8389de3c.js b/assets/js/ef5201ba.8389de3c.js new file mode 100644 index 00000000..6810bfe5 --- /dev/null +++ b/assets/js/ef5201ba.8389de3c.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[178,2612],{3905:(e,t,n)=>{n.d(t,{Zo:()=>p,kt:()=>g});var a=n(7294);function r(e,t,n){return t in 
e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function o(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);t&&(a=a.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,a)}return n}function i(e){for(var t=1;t=0||(r[n]=e[n]);return r}(e,t);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(e);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(r[n]=e[n])}return r}var s=a.createContext({}),d=function(e){var t=a.useContext(s),n=t;return e&&(n="function"==typeof e?e(t):i(i({},t),e)),n},p=function(e){var t=d(e.components);return a.createElement(s.Provider,{value:t},e.children)},c="mdxType",u={inlineCode:"code",wrapper:function(e){var t=e.children;return a.createElement(a.Fragment,{},t)}},m=a.forwardRef((function(e,t){var n=e.components,r=e.mdxType,o=e.originalType,s=e.parentName,p=l(e,["components","mdxType","originalType","parentName"]),c=d(n),m=r,g=c["".concat(s,".").concat(m)]||c[m]||u[m]||o;return n?a.createElement(g,i(i({ref:t},p),{},{components:n})):a.createElement(g,i({ref:t},p))}));function g(e,t){var n=arguments,r=t&&t.mdxType;if("string"==typeof e||r){var o=n.length,i=new Array(o);i[0]=m;var l={};for(var s in t)hasOwnProperty.call(t,s)&&(l[s]=t[s]);l.originalType=e,l[c]="string"==typeof e?e:r,i[1]=l;for(var d=2;d{n.r(t),n.d(t,{contentTitle:()=>i,default:()=>c,frontMatter:()=>o,metadata:()=>l,toc:()=>s});var a=n(7462),r=(n(7294),n(3905));const o={},i=void 0,l={unversionedId:"data-sources/decipher-json",id:"version-3.24/data-sources/decipher-json",title:"decipher-json",description:"| Field | Type | Notes 
|",source:"@site/versioned_docs/version-3.24/data-sources/decipher-json.md",sourceDirName:"data-sources",slug:"/data-sources/decipher-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/decipher-json.md",tags:[],version:"3.24",frontMatter:{}},s=[],d={toc:s},p="wrapper";function c(e){let{components:t,...n}=e;return(0,r.kt)(p,(0,a.Z)({},d,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"decipher":[\n {\n "chromosome":"1",\n "begin":13516,\n "end":91073,\n "numDeletions":27,\n "deletionFrequency":0.675,\n "numDuplications":27,\n "duplicationFrequency":0.675,\n "sampleSize":40,\n "reciprocalOverlap": 0.27555,\n "annotationOverlap": 0.5901\n }\n],\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"Ensembl-style chromosome names")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"1-based 
position")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"numDeletions"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"# of observed deletions")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"deletionFrequency"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"deletion frequency")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"numDuplications"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"# of observed duplications")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"duplicationFrequency"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"duplication frequency")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sampleSize"),(0,r.kt)("td",{parentName:"tr",align:null},"int"),(0,r.kt)("td",{parentName:"tr",align:null},"total # of samples")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"annotationOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"float"),(0,r.kt)("td",{parentName:"tr",align:null},"Range: 0 - 1. E.g. 
0.57 would indicate a 57% annotation overlap")))))}c.isMDXComponent=!0},4306:(e,t,n)=>{n.r(t),n.d(t,{contentTitle:()=>l,default:()=>u,frontMatter:()=>i,metadata:()=>s,toc:()=>d});var a=n(7462),r=(n(7294),n(3905)),o=n(7927);const i={title:"DECIPHER"},l=void 0,s={unversionedId:"data-sources/decipher",id:"version-3.24/data-sources/decipher",title:"DECIPHER",description:"Overview",source:"@site/versioned_docs/version-3.24/data-sources/decipher.mdx",sourceDirName:"data-sources",slug:"/data-sources/decipher",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/decipher.mdx",tags:[],version:"3.24",frontMatter:{title:"DECIPHER"},sidebar:"docs",previous:{title:"dbSNP",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp"},next:{title:"FusionCatcher",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher"}},d=[{value:"Overview",id:"overview",children:[{value:"TSV Extraction",id:"tsv-extraction",children:[{value:"Parsing",id:"parsing",children:[],level:4}],level:3}],level:2},{value:"Download URL",id:"download-url",children:[{value:"JSON output",id:"json-output",children:[],level:3}],level:2}],p={toc:d},c="wrapper";function u(e){let{components:t,...n}=e;return(0,r.kt)(c,(0,a.Z)({},p,n,{components:t,mdxType:"MDXLayout"}),(0,r.kt)("h2",{id:"overview"},"Overview"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://www.deciphergenomics.org/"},"DECIPHER")," (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants."),(0,r.kt)("p",null,"DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. 
The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation."),(0,r.kt)("div",{className:"admonition admonition-info alert alert--info"},(0,r.kt)("div",{parentName:"div",className:"admonition-heading"},(0,r.kt)("h5",{parentName:"div"},(0,r.kt)("span",{parentName:"h5",className:"admonition-icon"},(0,r.kt)("svg",{parentName:"span",xmlns:"http://www.w3.org/2000/svg",width:"14",height:"16",viewBox:"0 0 14 16"},(0,r.kt)("path",{parentName:"svg",fillRule:"evenodd",d:"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"}))),"Publication")),(0,r.kt)("div",{parentName:"div",className:"admonition-content"},(0,r.kt)("p",{parentName:"div"},"DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10/1016/j.ajhg.2009.03.010)"))),(0,r.kt)("h3",{id:"tsv-extraction"},"TSV Extraction"),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-scss"},"#population_cnv_id chr start end deletion_observations deletion_frequency deletion_standard_error duplication_observations duplication_frequency duplication_standard_error observations frequency standard_error type sample_size study\n1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls\n2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls\n3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD\n")),(0,r.kt)("h4",{id:"parsing"},"Parsing"),(0,r.kt)("p",null,"We parse the DECIPHER tsv file and extract the following 
columns:"),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"chr"),(0,r.kt)("li",{parentName:"ul"},"start"),(0,r.kt)("li",{parentName:"ul"},"end"),(0,r.kt)("li",{parentName:"ul"},"deletion_observations"),(0,r.kt)("li",{parentName:"ul"},"deletion_frequency"),(0,r.kt)("li",{parentName:"ul"},"duplication_observations"),(0,r.kt)("li",{parentName:"ul"},"duplication_frequency"),(0,r.kt)("li",{parentName:"ul"},"sample_size")),(0,r.kt)("h2",{id:"download-url"},"Download URL"),(0,r.kt)("p",null,(0,r.kt)("a",{parentName:"p",href:"https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz"},"https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz"),"\n",(0,r.kt)("a",{parentName:"p",href:"https://www.deciphergenomics.org/files/downloads/population_cnv_grch37.txt.gz"},"https://www.deciphergenomics.org/files/downloads/population_cnv_grch37.txt.gz")),(0,r.kt)("h3",{id:"json-output"},"JSON output"),(0,r.kt)(o.default,{mdxType:"JSON"}))}u.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/f0d534fb.ee63e5e0.js b/assets/js/f0d534fb.ee63e5e0.js new file mode 100644 index 00000000..d3705085 --- /dev/null +++ b/assets/js/f0d534fb.ee63e5e0.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[1840],{3905:(t,e,n)=>{n.d(e,{Zo:()=>s,kt:()=>f});var r=n(7294);function a(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function o(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(t);e&&(r=r.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,r)}return n}function i(t){for(var e=1;e=0||(a[n]=t[n]);return a}(t,e);if(Object.getOwnPropertySymbols){var o=Object.getOwnPropertySymbols(t);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(a[n]=t[n])}return a}var l=r.createContext({}),p=function(t){var 
e=r.useContext(l),n=e;return t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},s=function(t){var e=p(t.components);return r.createElement(l.Provider,{value:e},t.children)},d="mdxType",m={inlineCode:"code",wrapper:function(t){var e=t.children;return r.createElement(r.Fragment,{},e)}},u=r.forwardRef((function(t,e){var n=t.components,a=t.mdxType,o=t.originalType,l=t.parentName,s=c(t,["components","mdxType","originalType","parentName"]),d=p(n),u=a,f=d["".concat(l,".").concat(u)]||d[u]||m[u]||o;return n?r.createElement(f,i(i({ref:e},s),{},{components:n})):r.createElement(f,i({ref:e},s))}));function f(t,e){var n=arguments,a=e&&e.mdxType;if("string"==typeof t||a){var o=n.length,i=new Array(o);i[0]=u;var c={};for(var l in e)hasOwnProperty.call(e,l)&&(c[l]=e[l]);c.originalType=t,c[d]="string"==typeof t?t:a,i[1]=c;for(var p=2;p{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>d,frontMatter:()=>o,metadata:()=>c,toc:()=>l});var r=n(7462),a=(n(7294),n(3905));const o={},i=void 0,c={unversionedId:"data-sources/splice-ai-json",id:"version-3.24/data-sources/splice-ai-json",title:"splice-ai-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/splice-ai-json.md",sourceDirName:"data-sources",slug:"/data-sources/splice-ai-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/splice-ai-json.md",tags:[],version:"3.24",frontMatter:{}},l=[],p={toc:l},s="wrapper";function d(t){let{components:e,...n}=t;return(0,a.kt)(s,(0,r.Z)({},p,n,{components:e,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"spliceAI":[ \n {\n "hgnc":"BLCAP",\n "acceptorGainDistance":-3,\n "acceptorGainScore":0.3,\n "donorLossDistance":7,\n "donorLossScore":0.9\n },\n { \n "hgnc":"NNAT",\n "acceptorGainDistance":-1,\n "acceptorGainScore":0.2,\n 
"donorGainDistance":-2,\n "donorGainScore":0.3\n }\n]\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,a.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,a.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"hgnc"),(0,a.kt)("td",{parentName:"tr",align:"center"},"string"),(0,a.kt)("td",{parentName:"tr",align:"left"},"HGNC gene symbol")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"acceptorGainDistance"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"acceptorGainScore"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"acceptorLossDistance"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"acceptorLossScore"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"donorGainDistance"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"donorGainScore"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 
1 decimal place")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"donorLossDistance"),(0,a.kt)("td",{parentName:"tr",align:"center"},"int"),(0,a.kt)("td",{parentName:"tr",align:"left"},"\xb1 bp from current position")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:"left"},"donorLossScore"),(0,a.kt)("td",{parentName:"tr",align:"center"},"float"),(0,a.kt)("td",{parentName:"tr",align:"left"},"range: 0 - 1.0. 1 decimal place")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/f6bbc271.22ed2407.js b/assets/js/f6bbc271.22ed2407.js new file mode 100644 index 00000000..dd9e8206 --- /dev/null +++ b/assets/js/f6bbc271.22ed2407.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[9050],{3905:(t,e,n)=>{n.d(e,{Zo:()=>m,kt:()=>g});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function o(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var p=a.createContext({}),u=function(t){var e=a.useContext(p),n=e;return t&&(n="function"==typeof t?t(e):o(o({},e),t)),n},m=function(t){var e=u(t.components);return a.createElement(p.Provider,{value:e},t.children)},s="mdxType",d={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},c=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,p=t.parentName,m=i(t,["components","mdxType","originalType","parentName"]),s=u(n),c=r,g=s["".concat(p,".").concat(c)]||s[c]||d[c]||l;return 
n?a.createElement(g,o(o({ref:e},m),{},{components:n})):a.createElement(g,o({ref:e},m))}));function g(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,o=new Array(l);o[0]=c;var i={};for(var p in e)hasOwnProperty.call(e,p)&&(i[p]=e[p]);i.originalType=t,i[s]="string"==typeof t?t:r,o[1]=i;for(var u=2;u{n.r(e),n.d(e,{contentTitle:()=>o,default:()=>s,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var a=n(7462),r=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/1000Genomes-sv-json",id:"version-3.24/data-sources/1000Genomes-sv-json",title:"1000Genomes-sv-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/1000Genomes-sv-json.md",sourceDirName:"data-sources",slug:"/data-sources/1000Genomes-sv-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-sv-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/1000Genomes-sv-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],u={toc:p},m="wrapper";function s(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},u,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"oneKg":[\n {\n "chromosome":"1",\n "begin":1595369,\n "end":1612441,\n "variantType": "copy_number_variation",\n "id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",\n "allAn": 5008,\n "allAc": 2702,\n "allAf": 0.539537,\n "afrAf": 0.6052,\n "amrAf": 0.3675,\n "eurAf": 0.5357,\n "easAf": 0.5368,\n "sasAf": 0.5797,\n "reciprocalOverlap": 0.07555\n 
}\n],\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:null},"Field"),(0,r.kt)("th",{parentName:"tr",align:null},"Type"),(0,r.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"chromosome"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"begin"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"end"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"variantType"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"id"),(0,r.kt)("td",{parentName:"tr",align:null},"string"),(0,r.kt)("td",{parentName:"tr",align:null})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAn"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele number for all populations. Non-zero integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAc"),(0,r.kt)("td",{parentName:"tr",align:null},"integer"),(0,r.kt)("td",{parentName:"tr",align:null},"allele count for all populations. Integer.")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"allAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for all populations. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"afrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the African super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"amrAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the Ad Mixed American super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"eurAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the European super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"easAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the East Asian super population. Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"sasAf"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"allele frequency for the South Asian super population. 
Range: 0 - 1.0")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:null},"reciprocalOverlap"),(0,r.kt)("td",{parentName:"tr",align:null},"floating point"),(0,r.kt)("td",{parentName:"tr",align:null},"range: 0 - 1.")))))}s.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/f735f5cc.af7951fb.js b/assets/js/f735f5cc.af7951fb.js new file mode 100644 index 00000000..e7e30237 --- /dev/null +++ b/assets/js/f735f5cc.af7951fb.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[7283],{3905:(t,e,n)=>{n.d(e,{Zo:()=>m,kt:()=>k});var a=n(7294);function r(t,e,n){return e in t?Object.defineProperty(t,e,{value:n,enumerable:!0,configurable:!0,writable:!0}):t[e]=n,t}function l(t,e){var n=Object.keys(t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(t);e&&(a=a.filter((function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable}))),n.push.apply(n,a)}return n}function i(t){for(var e=1;e=0||(r[n]=t[n]);return r}(t,e);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(t);for(a=0;a=0||Object.prototype.propertyIsEnumerable.call(t,n)&&(r[n]=t[n])}return r}var o=a.createContext({}),s=function(t){var e=a.useContext(o),n=e;return t&&(n="function"==typeof t?t(e):i(i({},e),t)),n},m=function(t){var e=s(t.components);return a.createElement(o.Provider,{value:e},t.children)},d="mdxType",c={inlineCode:"code",wrapper:function(t){var e=t.children;return a.createElement(a.Fragment,{},e)}},u=a.forwardRef((function(t,e){var n=t.components,r=t.mdxType,l=t.originalType,o=t.parentName,m=p(t,["components","mdxType","originalType","parentName"]),d=s(n),u=r,k=d["".concat(o,".").concat(u)]||d[u]||c[u]||l;return n?a.createElement(k,i(i({ref:e},m),{},{components:n})):a.createElement(k,i({ref:e},m))}));function k(t,e){var n=arguments,r=e&&e.mdxType;if("string"==typeof t||r){var l=n.length,i=new Array(l);i[0]=u;var p={};for(var o in 
e)hasOwnProperty.call(e,o)&&(p[o]=e[o]);p.originalType=t,p[d]="string"==typeof t?t:r,i[1]=p;for(var s=2;s{n.r(e),n.d(e,{contentTitle:()=>i,default:()=>d,frontMatter:()=>l,metadata:()=>p,toc:()=>o});var a=n(7462),r=(n(7294),n(3905));const l={},i=void 0,p={unversionedId:"data-sources/clinvar-json",id:"version-3.24/data-sources/clinvar-json",title:"clinvar-json",description:"small variants:",source:"@site/versioned_docs/version-3.24/data-sources/clinvar-json.md",sourceDirName:"data-sources",slug:"/data-sources/clinvar-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/clinvar-json.md",tags:[],version:"3.24",frontMatter:{}},o=[],s={toc:o},m="wrapper";function d(t){let{components:e,...n}=t;return(0,r.kt)(m,(0,a.Z)({},s,n,{components:e,mdxType:"MDXLayout"}),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"small variants:")),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clinvar":[\n {\n "id":"VCV000036581.3",\n "reviewStatus":"reviewed by expert panel",\n "significance":[\n "benign"\n ],\n "refAllele":"G",\n "altAllele":"A",\n "lastUpdatedDate":"2020-03-01",\n "isAlleleSpecific":true\n },\n {\n "id":"RCV000030258.4",\n "variationId":"VCV000036581.3",\n "reviewStatus":"reviewed by expert panel",\n "alleleOrigins":[\n "germline"\n ],\n "refAllele":"G",\n "altAllele":"A",\n "phenotypes":[\n "Lynch syndrome"\n ],\n "medGenIds":[\n "C1333990"\n ],\n "omimIds":[\n "120435"\n ],\n "significance":[\n "benign"\n ],\n "lastUpdatedDate":"2017-05-01",\n "isAlleleSpecific":true\n }\n]\n')),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"large variants:")),(0,r.kt)("pre",null,(0,r.kt)("code",{parentName:"pre",className:"language-json"},'"clinvar":[\n {\n "chromosome":"1", \n "begin":629025, \n "end":8537745, \n "variantType":"copy_number_loss", \n 
"id":"RCV000051993.4", \n "variationId":"VCV000058242.1", \n "reviewStatus":"criteria provided, single submitter", \n "alleleOrigins":[\n "not provided"\n ], \n "phenotypes":[\n "See cases"\n ], \n "significance":[\n "pathogenic"\n ], \n "lastUpdatedDate":"2022-04-21", \n "pubMedIds":[\n "21844811"\n ]\n },\n {\n "id":"VCV000058242.1",\n "reviewStatus":"criteria provided, single submitter",\n "significance":[\n "pathogenic"\n ],\n "lastUpdatedDate":"2022-04-21"\n },\n ......\n]\n')),(0,r.kt)("table",null,(0,r.kt)("thead",{parentName:"table"},(0,r.kt)("tr",{parentName:"thead"},(0,r.kt)("th",{parentName:"tr",align:"left"},"Field"),(0,r.kt)("th",{parentName:"tr",align:"center"},"Type"),(0,r.kt)("th",{parentName:"tr",align:"left"},"Notes"))),(0,r.kt)("tbody",{parentName:"table"},(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"id"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"ClinVar ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variationId"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"ClinVar VCV ID")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"variantType"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"variant type")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"reviewStatus"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see possible values below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"alleleOrigins"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see possible values 
below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"refAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"altAllele"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"phenotypes"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"})),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"medGenIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"MedGen IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"omimIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"OMIM IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"orphanetIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"Orphanet IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"significance"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"see possible values below")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"lastUpdatedDate"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string"),(0,r.kt)("td",{parentName:"tr",align:"left"},"yyyy-MM-dd")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"pubMedIds"),(0,r.kt)("td",{parentName:"tr",align:"center"},"string array"),(0,r.kt)("td",{parentName:"tr",align:"left"},"PubMed 
IDs")),(0,r.kt)("tr",{parentName:"tbody"},(0,r.kt)("td",{parentName:"tr",align:"left"},"isAlleleSpecific"),(0,r.kt)("td",{parentName:"tr",align:"center"},"bool"),(0,r.kt)("td",{parentName:"tr",align:"left"},"true when the current variant alternate allele matches the ClinVar alternate allele")))),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"reviewStatus:")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"no assertion provided"),(0,r.kt)("li",{parentName:"ul"},"no assertion criteria provided"),(0,r.kt)("li",{parentName:"ul"},"criteria provided, single submitter"),(0,r.kt)("li",{parentName:"ul"},"practice guideline"),(0,r.kt)("li",{parentName:"ul"},"classified by multiple submitters"),(0,r.kt)("li",{parentName:"ul"},"criteria provided, conflicting interpretations"),(0,r.kt)("li",{parentName:"ul"},"criteria provided, multiple submitters, no conflicts"),(0,r.kt)("li",{parentName:"ul"},"no interpretation for the single variant")),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"alleleOrigins:")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"unknown"),(0,r.kt)("li",{parentName:"ul"},"other"),(0,r.kt)("li",{parentName:"ul"},"germline"),(0,r.kt)("li",{parentName:"ul"},"somatic"),(0,r.kt)("li",{parentName:"ul"},"inherited"),(0,r.kt)("li",{parentName:"ul"},"paternal"),(0,r.kt)("li",{parentName:"ul"},"maternal"),(0,r.kt)("li",{parentName:"ul"},"de-novo"),(0,r.kt)("li",{parentName:"ul"},"biparental"),(0,r.kt)("li",{parentName:"ul"},"uniparental"),(0,r.kt)("li",{parentName:"ul"},"not-tested"),(0,r.kt)("li",{parentName:"ul"},"tested-inconclusive")),(0,r.kt)("p",null,(0,r.kt)("strong",{parentName:"p"},"significance:")),(0,r.kt)("ul",null,(0,r.kt)("li",{parentName:"ul"},"uncertain significance"),(0,r.kt)("li",{parentName:"ul"},"not provided"),(0,r.kt)("li",{parentName:"ul"},"benign"),(0,r.kt)("li",{parentName:"ul"},"likely benign"),(0,r.kt)("li",{parentName:"ul"},"likely 
pathogenic"),(0,r.kt)("li",{parentName:"ul"},"pathogenic"),(0,r.kt)("li",{parentName:"ul"},"drug response"),(0,r.kt)("li",{parentName:"ul"},"histocompatibility"),(0,r.kt)("li",{parentName:"ul"},"association"),(0,r.kt)("li",{parentName:"ul"},"risk factor"),(0,r.kt)("li",{parentName:"ul"},"protective"),(0,r.kt)("li",{parentName:"ul"},"affects"),(0,r.kt)("li",{parentName:"ul"},"conflicting data from submitters"),(0,r.kt)("li",{parentName:"ul"},"other"),(0,r.kt)("li",{parentName:"ul"},"no interpretation for the single variant"),(0,r.kt)("li",{parentName:"ul"},"conflicting interpretations of pathogenicity")))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/ff654592.322d662d.js b/assets/js/ff654592.322d662d.js new file mode 100644 index 00000000..f4c3da77 --- /dev/null +++ b/assets/js/ff654592.322d662d.js @@ -0,0 +1 @@ +"use strict";(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[4335],{3905:(e,t,n)=>{n.d(t,{Zo:()=>c,kt:()=>f});var r=n(7294);function a(e,t,n){return t in e?Object.defineProperty(e,t,{value:n,enumerable:!0,configurable:!0,writable:!0}):e[t]=n,e}function l(e,t){var n=Object.keys(e);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);t&&(r=r.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),n.push.apply(n,r)}return n}function o(e){for(var t=1;t=0||(a[n]=e[n]);return a}(e,t);if(Object.getOwnPropertySymbols){var l=Object.getOwnPropertySymbols(e);for(r=0;r=0||Object.prototype.propertyIsEnumerable.call(e,n)&&(a[n]=e[n])}return a}var p=r.createContext({}),u=function(e){var t=r.useContext(p),n=t;return e&&(n="function"==typeof e?e(t):o(o({},t),e)),n},c=function(e){var t=u(e.components);return r.createElement(p.Provider,{value:t},e.children)},d="mdxType",m={inlineCode:"code",wrapper:function(e){var t=e.children;return r.createElement(r.Fragment,{},t)}},s=r.forwardRef((function(e,t){var 
n=e.components,a=e.mdxType,l=e.originalType,p=e.parentName,c=i(e,["components","mdxType","originalType","parentName"]),d=u(n),s=a,f=d["".concat(p,".").concat(s)]||d[s]||m[s]||l;return n?r.createElement(f,o(o({ref:t},c),{},{components:n})):r.createElement(f,o({ref:t},c))}));function f(e,t){var n=arguments,a=t&&t.mdxType;if("string"==typeof e||a){var l=n.length,o=new Array(l);o[0]=s;var i={};for(var p in t)hasOwnProperty.call(t,p)&&(i[p]=t[p]);i.originalType=e,i[d]="string"==typeof e?e:a,o[1]=i;for(var u=2;u{n.r(t),n.d(t,{contentTitle:()=>o,default:()=>d,frontMatter:()=>l,metadata:()=>i,toc:()=>p});var r=n(7462),a=(n(7294),n(3905));const l={},o=void 0,i={unversionedId:"data-sources/topmed-json",id:"version-3.24/data-sources/topmed-json",title:"topmed-json",description:"| Field | Type | Notes |",source:"@site/versioned_docs/version-3.24/data-sources/topmed-json.md",sourceDirName:"data-sources",slug:"/data-sources/topmed-json",permalink:"/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed-json",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/versioned_docs/version-3.24/data-sources/topmed-json.md",tags:[],version:"3.24",frontMatter:{}},p=[],u={toc:p},c="wrapper";function d(e){let{components:t,...n}=e;return(0,a.kt)(c,(0,r.Z)({},u,n,{components:t,mdxType:"MDXLayout"}),(0,a.kt)("pre",null,(0,a.kt)("code",{parentName:"pre",className:"language-json"},'"topmed":{ \n "allAc":20,\n "allAn":125568,\n "allAf":0.000159,\n "allHc":0,\n 
"failedFilter":true\n}\n')),(0,a.kt)("table",null,(0,a.kt)("thead",{parentName:"table"},(0,a.kt)("tr",{parentName:"thead"},(0,a.kt)("th",{parentName:"tr",align:null},"Field"),(0,a.kt)("th",{parentName:"tr",align:null},"Type"),(0,a.kt)("th",{parentName:"tr",align:null},"Notes"))),(0,a.kt)("tbody",{parentName:"table"},(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allAc"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"TOPMed allele count")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allAn"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"TOPMed allele number. Non-zero integer.")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allAf"),(0,a.kt)("td",{parentName:"tr",align:null},"float"),(0,a.kt)("td",{parentName:"tr",align:null},"TOPMed allele frequency (computed by Illumina Connected Annotations)")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"allHc"),(0,a.kt)("td",{parentName:"tr",align:null},"int"),(0,a.kt)("td",{parentName:"tr",align:null},"TOPMed homozygous count")),(0,a.kt)("tr",{parentName:"tbody"},(0,a.kt)("td",{parentName:"tr",align:null},"failedFilter"),(0,a.kt)("td",{parentName:"tr",align:null},"bool"),(0,a.kt)("td",{parentName:"tr",align:null},"True if this variant failed any filters")))))}d.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/main.23b43cc6.js b/assets/js/main.23b43cc6.js deleted file mode 100644 index 2ff46368..00000000 --- a/assets/js/main.23b43cc6.js +++ /dev/null @@ -1,2 +0,0 @@ -/*! 
For license information please see main.23b43cc6.js.LICENSE.txt */ -(self.webpackChunknirvana_documentation=self.webpackChunknirvana_documentation||[]).push([[179],{830:(e,n,t)=>{"use strict";t.d(n,{W:()=>a});var o=t(7294);function a(){return o.createElement("svg",{width:"20",height:"20",className:"DocSearch-Search-Icon",viewBox:"0 0 20 20"},o.createElement("path",{d:"M14.386 14.386l4.0877 4.0877-4.0877-4.0877c-2.9418 2.9419-7.7115 2.9419-10.6533 0-2.9419-2.9418-2.9419-7.7115 0-10.6533 2.9418-2.9419 7.7115-2.9419 10.6533 0 2.9419 2.9418 2.9419 7.7115 0 10.6533z",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinecap:"round",strokeLinejoin:"round"}))}},9782:(e,n,t)=>{"use strict";t.r(n),t.d(n,{default:()=>o});const o={title:"IlluminaConnectedAnnotations",tagline:"Translational researcy-grade variant annotation",url:"https://illumina.github.io",baseUrl:"/IlluminaConnectedAnnotationsDocumentation/",onBrokenLinks:"throw",favicon:"img/favicon.ico",organizationName:"illumina",projectName:"IlluminaConnectedAnnotationsDocumentation",themeConfig:{gtag:{trackingID:"G-5KXNW9LCD7"},algolia:{apiKey:"e908c17192dca08b01d9d994b576335b",indexName:"illumina_nirvana",contextualSearch:!0,appId:"BH4D9OD16A",searchParameters:{}},colorMode:{defaultMode:"light",disableSwitch:!0,respectPrefersColorScheme:!1,switchConfig:{darkIcon:"\ud83c\udf1c",darkIconStyle:{},lightIcon:"\ud83c\udf1e",lightIconStyle:{}}},navbar:{logo:{src:"img/ICAnnotations.png"},items:[{type:"docsVersionDropdown",position:"right",dropdownActiveClassDisabled:!0,dropdownItemsAfter:[{to:"/versions",label:"All versions"}],dropdownItemsBefore:[]}],hideOnScroll:!1},footer:{style:"dark",copyright:"\xa9 2024 Illumina, Inc. 
All rights reserved.",links:[]},docs:{versionPersistence:"localStorage"},metadata:[],prism:{additionalLanguages:[]},hideableSidebar:!1,tableOfContents:{minHeadingLevel:2,maxHeadingLevel:3}},stylesheets:["https://fonts.googleapis.com/css2?family=Open+Sans&family=Raleway&family=Source+Code+Pro&display=swap"],presets:[["@docusaurus/preset-classic",{docs:{routeBasePath:"/",sidebarPath:"/Users/rroy1/IlluminaConnectedAnnotationsDocumentation/sidebars.js",editUrl:"https://github.com/Illumina/IlluminaConnectedAnnotationsDocumentation/edit/master/",lastVersion:"current",versions:{current:{label:"3.24 (unreleased)"}}},theme:{customCss:"/Users/rroy1/IlluminaConnectedAnnotationsDocumentation/src/css/custom.css"}}]],baseUrlIssueBanner:!0,i18n:{defaultLocale:"en",locales:["en"],localeConfigs:{}},onBrokenMarkdownLinks:"warn",onDuplicateRoutes:"warn",staticDirectories:["static"],customFields:{},plugins:[],themes:[],titleDelimiter:"|",noIndex:!1}},2067:(e,n,t)=>{"use strict";var o=t(7294),a=t(3935),r=t(3727),i=t(8356),s=t.n(i);function c(e){let{error:n,retry:t,pastDelay:a}=e;return n?o.createElement("div",{style:{align:"center",color:"#fff",backgroundColor:"#fa383e",borderColor:"#fa383e",borderStyle:"solid",borderRadius:"0.25rem",borderWidth:"1px",boxSizing:"border-box",display:"block",padding:"1rem",flex:"0 0 50%",marginLeft:"25%",marginRight:"25%",marginTop:"5rem",maxWidth:"50%",width:"100%"}},o.createElement("p",null,n.message),o.createElement("div",null,o.createElement("button",{type:"button",onClick:t},"Retry"))):a?o.createElement("div",{style:{display:"flex",justifyContent:"center",alignItems:"center",height:"100vh"}},o.createElement("svg",{id:"loader",style:{width:128,height:110,position:"absolute",top:"calc(100vh - 64%)"},viewBox:"0 0 45 45",xmlns:"http://www.w3.org/2000/svg",stroke:"#61dafb"},o.createElement("g",{fill:"none",fillRule:"evenodd",transform:"translate(1 
1)",strokeWidth:"2"},o.createElement("circle",{cx:"22",cy:"22",r:"6",strokeOpacity:"0"},o.createElement("animate",{attributeName:"r",begin:"1.5s",dur:"3s",values:"6;22",calcMode:"linear",repeatCount:"indefinite"}),o.createElement("animate",{attributeName:"stroke-opacity",begin:"1.5s",dur:"3s",values:"1;0",calcMode:"linear",repeatCount:"indefinite"}),o.createElement("animate",{attributeName:"stroke-width",begin:"1.5s",dur:"3s",values:"2;0",calcMode:"linear",repeatCount:"indefinite"})),o.createElement("circle",{cx:"22",cy:"22",r:"6",strokeOpacity:"0"},o.createElement("animate",{attributeName:"r",begin:"3s",dur:"3s",values:"6;22",calcMode:"linear",repeatCount:"indefinite"}),o.createElement("animate",{attributeName:"stroke-opacity",begin:"3s",dur:"3s",values:"1;0",calcMode:"linear",repeatCount:"indefinite"}),o.createElement("animate",{attributeName:"stroke-width",begin:"3s",dur:"3s",values:"2;0",calcMode:"linear",repeatCount:"indefinite"})),o.createElement("circle",{cx:"22",cy:"22",r:"8"},o.createElement("animate",{attributeName:"r",begin:"0s",dur:"1.5s",values:"6;1;2;3;4;5;6",calcMode:"linear",repeatCount:"indefinite"}))))):null}const 
l=JSON.parse('{"/IlluminaConnectedAnnotationsDocumentation/blog/archive-192":{"component":"9e4087bc","archive":"7674fa56"},"/IlluminaConnectedAnnotationsDocumentation/search-158":{"component":"45e4bd3d"},"/IlluminaConnectedAnnotationsDocumentation/versions-4b9":{"component":"18b93cb3","config":"5e9f5e1a"},"/IlluminaConnectedAnnotationsDocumentation/3.22-267":{"component":"1be78505","versionMetadata":"916efb7e"},"/IlluminaConnectedAnnotationsDocumentation/3.22/-f61":{"component":"17896441","content":"7bd03d56"},"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/canonical-transcripts-f08":{"component":"17896441","content":"70ef2029"},"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/gene-fusions-ef1":{"component":"17896441","content":"4516865f"},"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/transcript-consequence-impacts-93e":{"component":"17896441","content":"aeee51c8"},"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/variant-ids-d01":{"component":"17896441","content":"3597d407"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-a9b":{"component":"17896441","content":"c3c2a1f1"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-snv-json-66d":{"component":"17896441","content":"03fa4b14"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-sv-json-86e":{"component":"17896441","content":"fb0d881d"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-254":{"component":"17896441","content":"864a7df9"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-json-3dd":{"component":"17896441","content":"5b71e24d"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cancer-hotspots-156":{"component":"17896441","content":"bcdb388a"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-8c9":{"component":"17896441","content":"6ffe2549"},"/Illumin
aConnectedAnnotationsDocumentation/3.22/data-sources/clingen-dosage-json-3de":{"component":"17896441","content":"dfa01370"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-gene-validity-json-0d4":{"component":"17896441","content":"9737b5e1"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-json-370":{"component":"17896441","content":"46676406"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-992":{"component":"17896441","content":"ea1a2647"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-json-969":{"component":"17896441","content":"936eec31"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-d16":{"component":"17896441","content":"69943b9d"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-cancer-gene-census-da3":{"component":"17896441","content":"905a57e0"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-gene-fusion-json-2be":{"component":"17896441","content":"f42ca355"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-json-e5f":{"component":"17896441","content":"3654c673"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-e25":{"component":"17896441","content":"8ca849dc"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-json-ae5":{"component":"17896441","content":"e1c0dc4a"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-487":{"component":"17896441","content":"0d07779f"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-json-4ef":{"component":"17896441","content":"6783d6d3"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-769":{"component":"17896441","content":"63871b40"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-json-9c2":{"component":"17896441","content":"188e275c"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-194":{"component":"17896441","con
tent":"b7b6e5d7"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-json-282":{"component":"17896441","content":"97754c84"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-bb2":{"component":"17896441","content":"38aa46c2"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-json-7ee":{"component":"17896441","content":"1508cb7b"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme-504":{"component":"17896441","content":"39f1c452"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme-json-2ef":{"component":"17896441","content":"14f34967"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-906":{"component":"17896441","content":"bebe1a3a"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-lof-json-492":{"component":"17896441","content":"4016b43c"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-small-variants-json-0cd":{"component":"17896441","content":"6d7786c3"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-data_description-94d":{"component":"17896441","content":"16234d45"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-json-c06":{"component":"17896441","content":"98b05c7a"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mito-heteroplasmy-5d6":{"component":"17896441","content":"f0cfb972"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-fb6":{"component":"17896441","content":"aa90c840"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-small-variants-json-901":{"component":"17896441","content":"8ee30fb3"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-structural-variants-json-01d":{"component":"17896441","content":"add85258"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim-48d":{"component":"17896441","content":"28a086b2"},"/IlluminaConnectedAnn
otationsDocumentation/3.22/data-sources/omim-json-800":{"component":"17896441","content":"dbc89f8d"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-83d":{"component":"17896441","content":"40d835fa"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-json-e4c":{"component":"17896441","content":"05633c72"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-630":{"component":"17896441","content":"766e4ec0"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-json-7ef":{"component":"17896441","content":"5241723c"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-ad5":{"component":"17896441","content":"8ca6b7fc"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-json-6d8":{"component":"17896441","content":"b406875a"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-ac6":{"component":"17896441","content":"973fe7cb"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-json-3a8":{"component":"17896441","content":"763a725a"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-076":{"component":"17896441","content":"d4cb531b"},"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-json-e0d":{"component":"17896441","content":"b38f1ddb"},"/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/custom-annotations-4bb":{"component":"17896441","content":"49d3eb79"},"/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/illumina-annotator-json-file-format-f91":{"component":"17896441","content":"791ec758"},"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/dependencies-4b5":{"component":"17896441","content":"28133a87"},"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/getting-started-241":{"component":"17896441","content":"88ae8085"},"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/parsing-json-fcb":{"component":"17896441","content":"cbe769db"},
"/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/jasix-63f":{"component":"17896441","content":"1b498237"},"/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/sautils-7d9":{"component":"17896441","content":"b97c1ec5"},"/IlluminaConnectedAnnotationsDocumentation/3.23-256":{"component":"1be78505","versionMetadata":"1eb2ff76"},"/IlluminaConnectedAnnotationsDocumentation/3.23/-879":{"component":"17896441","content":"482fe60d"},"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/canonical-transcripts-6d3":{"component":"17896441","content":"766044f3"},"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/gene-fusions-c81":{"component":"17896441","content":"bf53e43c"},"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/transcript-consequence-impacts-5bc":{"component":"17896441","content":"70c1bead"},"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/variant-ids-aa7":{"component":"17896441","content":"8b37abaa"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-b8e":{"component":"17896441","content":"187a5e61"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-snv-json-270":{"component":"17896441","content":"37610296"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-sv-json-68e":{"component":"17896441","content":"3ea0d8a7"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-b73":{"component":"17896441","content":"39b3af82"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-json-af0":{"component":"17896441","content":"4820c9bb"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cancer-hotspots-24e":{"component":"17896441","content":"427b5cb0"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-891":{"component":"17896441","content":"591ce630"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-dosage-json-
958":{"component":"17896441","content":"4a6a71fe"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-gene-validity-json-adb":{"component":"17896441","content":"7df63534"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-json-e6c":{"component":"17896441","content":"5c115048"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-023":{"component":"17896441","content":"a792da87"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-json-098":{"component":"17896441","content":"dd91fa1e"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-3c5":{"component":"17896441","content":"343b0cd4"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-cancer-gene-census-3b7":{"component":"17896441","content":"4c9d989c"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-gene-fusion-json-b72":{"component":"17896441","content":"693f6a53"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-json-122":{"component":"17896441","content":"8a76d82e"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-f51":{"component":"17896441","content":"774a9e44"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-json-c27":{"component":"17896441","content":"204d540e"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-b1e":{"component":"17896441","content":"c5a697ac"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-json-356":{"component":"17896441","content":"6cf00a18"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-b67":{"component":"17896441","content":"2b34f5ab"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-json-0fe":{"component":"17896441","content":"3f688870"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher-79a":{"component":"17896441","content":"7a38426a"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-s
ources/fusioncatcher-json-bba":{"component":"17896441","content":"6edb4115"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-883":{"component":"17896441","content":"2423b99c"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-json-f03":{"component":"17896441","content":"374d2bd3"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-293":{"component":"17896441","content":"6c2bb9f5"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-json-d55":{"component":"17896441","content":"294e718e"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-6fb":{"component":"17896441","content":"a049baa7"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-lof-json-270":{"component":"17896441","content":"a665ff54"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-small-variants-json-5f8":{"component":"17896441","content":"fcc450d8"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-data_description-14b":{"component":"17896441","content":"7593e9f9"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-json-587":{"component":"17896441","content":"6b5aeb61"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mito-heteroplasmy-ac9":{"component":"17896441","content":"5a0956a8"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-525":{"component":"17896441","content":"3e93a0dc"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-small-variants-json-b38":{"component":"17896441","content":"ca4cc287"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-structural-variants-json-035":{"component":"17896441","content":"500e4eca"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim-4b9":{"component":"17896441","content":"4c6cded2"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim-json-7dd":{"component":"1789
6441","content":"c95142d3"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop-b6b":{"component":"17896441","content":"807bb480"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop-json-62c":{"component":"17896441","content":"82834dbd"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopprimate-json-2a9":{"component":"17896441","content":"86133f4f"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-a6a":{"component":"17896441","content":"160992a6"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-json-cfa":{"component":"17896441","content":"0a1081a6"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-14c":{"component":"17896441","content":"60fb7ae7"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-json-705":{"component":"17896441","content":"93bfed11"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-f95":{"component":"17896441","content":"2fef8196"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-json-c2f":{"component":"17896441","content":"54974e0f"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-f47":{"component":"17896441","content":"39f86048"},"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-json-d42":{"component":"17896441","content":"83b88aab"},"/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/custom-annotations-a03":{"component":"17896441","content":"e2baf76c"},"/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/illumina-annotator-json-file-format-115":{"component":"17896441","content":"407d1113"},"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/dependencies-cd8":{"component":"17896441","content":"a41b3c06"},"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/getting-started-01c":{"component":"17896441","content":"1875b224"},"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/licen
sedContent-60c":{"component":"17896441","content":"5ddfc72f"},"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/parsing-json-8d6":{"component":"17896441","content":"946595d9"},"/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/jasix-bc9":{"component":"17896441","content":"8f87ea60"},"/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/sautils-062":{"component":"17896441","content":"d757867a"},"/IlluminaConnectedAnnotationsDocumentation/-2e7":{"component":"1be78505","versionMetadata":"935f2afb"},"/IlluminaConnectedAnnotationsDocumentation/-0a5":{"component":"17896441","content":"ef4059aa"},"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts-0f3":{"component":"17896441","content":"463e69e4"},"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions-9a5":{"component":"17896441","content":"e95cadfe"},"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving-494":{"component":"17896441","content":"4b69b274"},"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts-572":{"component":"17896441","content":"75a3a2eb"},"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids-8d0":{"component":"17896441","content":"a5e136a1"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-6a1":{"component":"17896441","content":"a9ecceb6"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-snv-json-c39":{"component":"17896441","content":"9620026c"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-sv-json-4fd":{"component":"17896441","content":"440d17b3"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation-d35":{"component":"17896441","content":"7b3bfa5e"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation-json-8bf":{"component":"17896441","content":"a8504dcf"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-
hotspots-b95":{"component":"17896441","content":"9a946f68"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-fa4":{"component":"17896441","content":"771fd362"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-dosage-json-58e":{"component":"17896441","content":"abda0f14"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-gene-validity-json-547":{"component":"17896441","content":"82e726f2"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-json-286":{"component":"17896441","content":"b4210c11"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-bed":{"component":"17896441","content":"cd35fae7"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-json-d9e":{"component":"17896441","content":"7bc16216"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-611":{"component":"17896441","content":"d247ca0b"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-json-b47":{"component":"17896441","content":"cd820d6d"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-0af":{"component":"17896441","content":"08a089c6"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-cancer-gene-census-bb8":{"component":"17896441","content":"666ea911"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-gene-fusion-json-094":{"component":"17896441","content":"4397ec05"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-json-cbc":{"component":"17896441","content":"b6dcd8b7"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-a22":{"component":"17896441","content":"988d0ae8"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-json-5bd":{"component":"17896441","content":"57cffed1"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-bd6":{"component":"17896441","content":"18946b76"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-json-f1d":{"component":"17896441","conten
t":"a8da062f"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-569":{"component":"17896441","content":"5dd9300a"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-json-f45":{"component":"17896441","content":"0be5de6c"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-4e3":{"component":"17896441","content":"cd0802b4"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-json-774":{"component":"17896441","content":"601929e3"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-03f":{"component":"17896441","content":"f262a5f6"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-json-c31":{"component":"17896441","content":"539175fb"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-95e":{"component":"17896441","content":"07bac56e"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-json-45b":{"component":"17896441","content":"f98a4229"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-6b3":{"component":"17896441","content":"98bbf06c"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-lof-json-249":{"component":"17896441","content":"833bd66e"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-small-variants-json-3d5":{"component":"17896441","content":"8ae16000"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-data_description-1e7":{"component":"17896441","content":"85047af6"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-json-57f":{"component":"17896441","content":"e39dd739"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-lof-json-d8a":{"component":"17896441","content":"9a14a8bf"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-small-variants-json-f88":{"component":"17896441","content":"4fc9223b"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad40-structural-variants-json-29b":{"componen
t":"17896441","content":"63aab588"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmy-068":{"component":"17896441","content":"5d1e2784"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-540":{"component":"17896441","content":"5d851e34"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-small-variants-json-3d1":{"component":"17896441","content":"0bd2af6a"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-structural-variants-json-00d":{"component":"17896441","content":"494b7fcc"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-7c3":{"component":"17896441","content":"5b7bb28d"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-json-83f":{"component":"17896441","content":"644aa76c"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-3ef":{"component":"17896441","content":"a26ba82d"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-json-98b":{"component":"17896441","content":"a2ab8500"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopprimate-json-bc2":{"component":"17896441","content":"6a83c684"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai-fc4":{"component":"17896441","content":"915fca76"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai-json-3e2":{"component":"17896441","content":"34e55124"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-172":{"component":"17896441","content":"b51ccab7"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-json-997":{"component":"17896441","content":"42c73b29"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai-7de":{"component":"17896441","content":"ba2982bf"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai-json-1d4":{"component":"17896441","content":"191d3c1c"},"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed-ea7":{"component":"17896441","content":"51ec9460"},"/IlluminaConnectedA
nnotationsDocumentation/data-sources/topmed-json-014":{"component":"17896441","content":"cd8220b1"},"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations-4b4":{"component":"17896441","content":"e286457f"},"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-format-76b":{"component":"17896441","content":"b2e466e8"},"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-format-40f":{"component":"17896441","content":"fdf7d659"},"/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-update-743":{"component":"17896441","content":"8c5c59ee"},"/IlluminaConnectedAnnotationsDocumentation/introduction/dependencies-1c5":{"component":"17896441","content":"f7e8c160"},"/IlluminaConnectedAnnotationsDocumentation/introduction/getting-started-ab0":{"component":"17896441","content":"f048ed9e"},"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent-af3":{"component":"17896441","content":"8b7fbb05"},"/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-json-289":{"component":"17896441","content":"e1e7c361"},"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix-b46":{"component":"17896441","content":"eef24e02"},"/IlluminaConnectedAnnotationsDocumentation/utilities/sautils-b75":{"component":"17896441","content":"2973af85"}}'),u={"03fa4b14":[()=>t.e(9220).then(t.bind(t,4474)),"@site/versioned_docs/version-3.22/data-sources/1000Genomes-snv-json.md",4474],"05633c72":[()=>t.e(5230).then(t.bind(t,258)),"@site/versioned_docs/version-3.22/data-sources/phylop-json.md",258],"07bac56e":[()=>t.e(1342).then(t.bind(t,2812)),"@site/docs/data-sources/gme.mdx",2812],"08a089c6":[()=>t.e(3957).then(t.bind(t,1335)),"@site/docs/data-sources/cosmic.mdx",1335],"0a1081a6":[()=>t.e(9570).then(t.bind(t,9909)),"@site/versioned_docs/version-3.23/data-sources/primate-ai-json.md",9909],"0bd2af6a":[()=>t.e(5160).then(t.bind(t,8181)),"@site/docs/data-sources/mitomap-
small-variants-json.md",8181],"0be5de6c":[()=>t.e(1912).then(t.bind(t,4072)),"@site/docs/data-sources/decipher-json.md",4072],"0d07779f":[()=>t.e(2571).then(t.bind(t,7256)),"@site/versioned_docs/version-3.22/data-sources/dbsnp.mdx",7256],"14f34967":[()=>t.e(3834).then(t.bind(t,3074)),"@site/versioned_docs/version-3.22/data-sources/gme-json.md",3074],"1508cb7b":[()=>t.e(9831).then(t.bind(t,1410)),"@site/versioned_docs/version-3.22/data-sources/gerp-json.md",1410],"160992a6":[()=>t.e(7051).then(t.bind(t,6571)),"@site/versioned_docs/version-3.23/data-sources/primate-ai.mdx",6571],"16234d45":[()=>t.e(7184).then(t.bind(t,211)),"@site/versioned_docs/version-3.22/data-sources/gnomad-structural-variants-data_description.md",211],17896441:[()=>Promise.all([t.e(532),t.e(7918)]).then(t.bind(t,2319)),"@theme/DocItem",2319],"1875b224":[()=>t.e(1323).then(t.bind(t,9035)),"@site/versioned_docs/version-3.23/introduction/getting-started.md",9035],"187a5e61":[()=>t.e(1872).then(t.bind(t,6214)),"@site/versioned_docs/version-3.23/data-sources/1000Genomes.mdx",6214],"188e275c":[()=>t.e(8534).then(t.bind(t,6413)),"@site/versioned_docs/version-3.22/data-sources/decipher-json.md",6413],"18946b76":[()=>t.e(3305).then(t.bind(t,4266)),"@site/docs/data-sources/dbsnp.mdx",4266],"18b93cb3":[()=>t.e(3042).then(t.bind(t,351)),"@site/src/pages/versions.js",351],"191d3c1c":[()=>t.e(4899).then(t.bind(t,9838)),"@site/docs/data-sources/splice-ai-json.md",9838],"1b498237":[()=>t.e(5386).then(t.bind(t,6627)),"@site/versioned_docs/version-3.22/utilities/jasix.mdx",6627],"1be78505":[()=>Promise.all([t.e(532),t.e(9514)]).then(t.bind(t,3042)),"@theme/DocPage",3042],"1eb2ff76":[()=>t.e(379).then(t.t.bind(t,9151,19)),"~docs/default/version-3-23-metadata-prop-de7.json",9151],"204d540e":[()=>t.e(9850).then(t.bind(t,8798)),"@site/versioned_docs/version-3.23/data-sources/dann-json.md",8798],"2423b99c":[()=>t.e(5112).then(t.bind(t,9204)),"@site/versioned_docs/version-3.23/data-sources/gerp.mdx",9204],"28133a87":[()
=>t.e(4890).then(t.bind(t,2844)),"@site/versioned_docs/version-3.22/introduction/dependencies.md",2844],"28a086b2":[()=>t.e(3431).then(t.bind(t,1218)),"@site/versioned_docs/version-3.22/data-sources/omim.mdx",1218],"294e718e":[()=>t.e(1133).then(t.bind(t,7979)),"@site/versioned_docs/version-3.23/data-sources/gme-json.md",7979],"2973af85":[()=>t.e(5111).then(t.bind(t,224)),"@site/docs/utilities/sautils.mdx",224],"2b34f5ab":[()=>t.e(7635).then(t.bind(t,7786)),"@site/versioned_docs/version-3.23/data-sources/decipher.mdx",7786],"2fef8196":[()=>t.e(8255).then(t.bind(t,2656)),"@site/versioned_docs/version-3.23/data-sources/splice-ai.mdx",2656],"343b0cd4":[()=>t.e(699).then(t.bind(t,3021)),"@site/versioned_docs/version-3.23/data-sources/cosmic.mdx",3021],"34e55124":[()=>t.e(7942).then(t.bind(t,737)),"@site/docs/data-sources/primate-ai-json.md",737],"3597d407":[()=>t.e(9252).then(t.bind(t,502)),"@site/versioned_docs/version-3.22/core-functionality/variant-ids.md",502],"3654c673":[()=>t.e(8183).then(t.bind(t,4586)),"@site/versioned_docs/version-3.22/data-sources/cosmic-json.md",4586],"374d2bd3":[()=>t.e(423).then(t.bind(t,123)),"@site/versioned_docs/version-3.23/data-sources/gerp-json.md",123],37610296:[()=>t.e(3804).then(t.bind(t,134)),"@site/versioned_docs/version-3.23/data-sources/1000Genomes-snv-json.md",134],"38aa46c2":[()=>t.e(607).then(t.bind(t,4734)),"@site/versioned_docs/version-3.22/data-sources/gerp.mdx",4734],"39b3af82":[()=>t.e(5861).then(t.bind(t,6906)),"@site/versioned_docs/version-3.23/data-sources/amino-acid-conservation.mdx",6906],"39f1c452":[()=>t.e(8286).then(t.bind(t,1722)),"@site/versioned_docs/version-3.22/data-sources/gme.mdx",1722],"39f86048":[()=>t.e(5468).then(t.bind(t,7504)),"@site/versioned_docs/version-3.23/data-sources/topmed.mdx",7504],"3e93a0dc":[()=>t.e(6137).then(t.bind(t,7021)),"@site/versioned_docs/version-3.23/data-sources/mitomap.mdx",7021],"3ea0d8a7":[()=>t.e(864).then(t.bind(t,1529)),"@site/versioned_docs/version-3.23/data-sources/100
0Genomes-sv-json.md",1529],"3f688870":[()=>t.e(7440).then(t.bind(t,8115)),"@site/versioned_docs/version-3.23/data-sources/decipher-json.md",8115],"4016b43c":[()=>t.e(1660).then(t.bind(t,977)),"@site/versioned_docs/version-3.22/data-sources/gnomad-lof-json.md",977],"407d1113":[()=>t.e(6611).then(t.bind(t,6161)),"@site/versioned_docs/version-3.23/file-formats/illumina-annotator-json-file-format.mdx",6161],"40d835fa":[()=>t.e(8327).then(t.bind(t,7940)),"@site/versioned_docs/version-3.22/data-sources/phylop.mdx",7940],"427b5cb0":[()=>t.e(676).then(t.bind(t,4727)),"@site/versioned_docs/version-3.23/data-sources/cancer-hotspots.mdx",4727],"42c73b29":[()=>t.e(2508).then(t.bind(t,591)),"@site/docs/data-sources/revel-json.md",591],"4397ec05":[()=>t.e(5360).then(t.bind(t,7997)),"@site/docs/data-sources/cosmic-gene-fusion-json.md",7997],"440d17b3":[()=>t.e(4648).then(t.bind(t,2590)),"@site/docs/data-sources/1000Genomes-sv-json.md",2590],"4516865f":[()=>t.e(3556).then(t.bind(t,9122)),"@site/versioned_docs/version-3.22/core-functionality/gene-fusions.md",9122],"45e4bd3d":[()=>Promise.all([t.e(532),t.e(1722)]).then(t.bind(t,9172)),"/Users/rroy1/IlluminaConnectedAnnotationsDocumentation/node_modules/@docusaurus/theme-search-algolia/lib/theme/SearchPage/index.js",9172],"463e69e4":[()=>t.e(7278).then(t.bind(t,1027)),"@site/docs/core-functionality/canonical-transcripts.md",1027],46676406:[()=>t.e(8679).then(t.bind(t,770)),"@site/versioned_docs/version-3.22/data-sources/clingen-json.md",770],"4820c9bb":[()=>t.e(5849).then(t.bind(t,6521)),"@site/versioned_docs/version-3.23/data-sources/amino-acid-conservation-json.md",6521],"482fe60d":[()=>t.e(4169).then(t.bind(t,1798)),"@site/versioned_docs/version-3.23/introduction/introduction.mdx",1798],"494b7fcc":[()=>t.e(8462).then(t.bind(t,8898)),"@site/docs/data-sources/mitomap-structural-variants-json.md",8898],"49d3eb79":[()=>t.e(5979).then(t.bind(t,1908)),"@site/versioned_docs/version-3.22/file-formats/custom-annotations.md",1908],"4a6a71fe"
:[()=>t.e(486).then(t.bind(t,7598)),"@site/versioned_docs/version-3.23/data-sources/clingen-dosage-json.md",7598],"4b69b274":[()=>t.e(7106).then(t.bind(t,2691)),"@site/docs/core-functionality/junction-preserving.md",2691],"4c6cded2":[()=>t.e(2190).then(t.bind(t,8348)),"@site/versioned_docs/version-3.23/data-sources/omim.mdx",8348],"4c9d989c":[()=>t.e(8334).then(t.bind(t,9617)),"@site/versioned_docs/version-3.23/data-sources/cosmic-cancer-gene-census.md",9617],"4fc9223b":[()=>t.e(1160).then(t.bind(t,1200)),"@site/docs/data-sources/gnomad4.0-small-variants-json.md",1200],"500e4eca":[()=>t.e(3907).then(t.bind(t,3105)),"@site/versioned_docs/version-3.23/data-sources/mitomap-structural-variants-json.md",3105],"51ec9460":[()=>t.e(5697).then(t.bind(t,6891)),"@site/docs/data-sources/topmed.mdx",6891],"5241723c":[()=>t.e(3308).then(t.bind(t,1966)),"@site/versioned_docs/version-3.22/data-sources/primate-ai-json.md",1966],"539175fb":[()=>t.e(5702).then(t.bind(t,5538)),"@site/docs/data-sources/gerp-json.md",5538],"54974e0f":[()=>t.e(6882).then(t.bind(t,9722)),"@site/versioned_docs/version-3.23/data-sources/splice-ai-json.md",9722],"57cffed1":[()=>t.e(6192).then(t.bind(t,540)),"@site/docs/data-sources/dann-json.md",540],"591ce630":[()=>t.e(9411).then(t.bind(t,1092)),"@site/versioned_docs/version-3.23/data-sources/clingen.mdx",1092],"5a0956a8":[()=>t.e(6518).then(t.bind(t,4864)),"@site/versioned_docs/version-3.23/data-sources/mito-heteroplasmy.md",4864],"5b71e24d":[()=>t.e(1575).then(t.bind(t,6809)),"@site/versioned_docs/version-3.22/data-sources/amino-acid-conservation-json.md",6809],"5b7bb28d":[()=>t.e(8943).then(t.bind(t,1927)),"@site/docs/data-sources/omim.mdx",1927],"5c115048":[()=>t.e(434).then(t.bind(t,6850)),"@site/versioned_docs/version-3.23/data-sources/clingen-json.md",6850],"5d1e2784":[()=>t.e(1311).then(t.bind(t,6762)),"@site/docs/data-sources/mito-heteroplasmy.md",6762],"5d851e34":[()=>t.e(7795).then(t.bind(t,7763)),"@site/docs/data-sources/mitomap.mdx",7763],"5dd93
00a":[()=>t.e(8907).then(t.bind(t,1389)),"@site/docs/data-sources/decipher.mdx",1389],"5ddfc72f":[()=>t.e(1495).then(t.bind(t,384)),"@site/versioned_docs/version-3.23/introduction/licensedContent.mdx",384],"5e9f5e1a":[()=>Promise.resolve().then(t.bind(t,9782)),"@generated/docusaurus.config",9782],"601929e3":[()=>t.e(1266).then(t.bind(t,8202)),"@site/docs/data-sources/fusioncatcher-json.md",8202],"60fb7ae7":[()=>t.e(3873).then(t.bind(t,7661)),"@site/versioned_docs/version-3.23/data-sources/revel.mdx",7661],"63871b40":[()=>t.e(2092).then(t.bind(t,3917)),"@site/versioned_docs/version-3.22/data-sources/decipher.mdx",3917],"63aab588":[()=>t.e(595).then(t.bind(t,992)),"@site/docs/data-sources/gnomad40-structural-variants-json.md",992],"644aa76c":[()=>t.e(216).then(t.bind(t,8010)),"@site/docs/data-sources/omim-json.md",8010],"666ea911":[()=>t.e(6635).then(t.bind(t,1273)),"@site/docs/data-sources/cosmic-cancer-gene-census.md",1273],"6783d6d3":[()=>t.e(7836).then(t.bind(t,5737)),"@site/versioned_docs/version-3.22/data-sources/dbsnp-json.md",5737],"693f6a53":[()=>t.e(2590).then(t.bind(t,2646)),"@site/versioned_docs/version-3.23/data-sources/cosmic-gene-fusion-json.md",2646],"69943b9d":[()=>t.e(7826).then(t.bind(t,8069)),"@site/versioned_docs/version-3.22/data-sources/cosmic.mdx",8069],"6a83c684":[()=>t.e(8034).then(t.bind(t,7054)),"@site/docs/data-sources/phylopprimate-json.md",7054],"6b5aeb61":[()=>t.e(9299).then(t.bind(t,9337)),"@site/versioned_docs/version-3.23/data-sources/gnomad-structural-variants-json.md",9337],"6c2bb9f5":[()=>t.e(1877).then(t.bind(t,7656)),"@site/versioned_docs/version-3.23/data-sources/gme.mdx",7656],"6cf00a18":[()=>t.e(7647).then(t.bind(t,7271)),"@site/versioned_docs/version-3.23/data-sources/dbsnp-json.md",7271],"6d7786c3":[()=>t.e(2099).then(t.bind(t,3969)),"@site/versioned_docs/version-3.22/data-sources/gnomad-small-variants-json.md",3969],"6edb4115":[()=>t.e(6170).then(t.bind(t,2713)),"@site/versioned_docs/version-3.23/data-sources/fusioncatcher
-json.md",2713],"6ffe2549":[()=>t.e(7413).then(t.bind(t,4401)),"@site/versioned_docs/version-3.22/data-sources/clingen.mdx",4401],"70c1bead":[()=>t.e(5284).then(t.bind(t,7898)),"@site/versioned_docs/version-3.23/core-functionality/transcript-consequence-impacts.md",7898],"70ef2029":[()=>t.e(8553).then(t.bind(t,7177)),"@site/versioned_docs/version-3.22/core-functionality/canonical-transcripts.md",7177],"7593e9f9":[()=>t.e(2219).then(t.bind(t,2600)),"@site/versioned_docs/version-3.23/data-sources/gnomad-structural-variants-data_description.md",2600],"75a3a2eb":[()=>t.e(9767).then(t.bind(t,1062)),"@site/docs/core-functionality/transcript-consequence-impacts.md",1062],"763a725a":[()=>t.e(3491).then(t.bind(t,5371)),"@site/versioned_docs/version-3.22/data-sources/splice-ai-json.md",5371],"766044f3":[()=>t.e(8103).then(t.bind(t,3277)),"@site/versioned_docs/version-3.23/core-functionality/canonical-transcripts.md",3277],"766e4ec0":[()=>t.e(1053).then(t.bind(t,6522)),"@site/versioned_docs/version-3.22/data-sources/primate-ai.mdx",6522],"7674fa56":[()=>t.e(975).then(t.t.bind(t,3982,19)),"~blog/default/illumina-connected-annotations-documentation-blog-archive-009.json",3982],"771fd362":[()=>t.e(7850).then(t.bind(t,599)),"@site/docs/data-sources/clingen.mdx",599],"774a9e44":[()=>t.e(5335).then(t.bind(t,7774)),"@site/versioned_docs/version-3.23/data-sources/dann.mdx",7774],"791ec758":[()=>t.e(3563).then(t.bind(t,5522)),"@site/versioned_docs/version-3.22/file-formats/illumina-annotator-json-file-format.mdx",5522],"7a38426a":[()=>t.e(6820).then(t.bind(t,1632)),"@site/versioned_docs/version-3.23/data-sources/fusioncatcher.mdx",1632],"7b3bfa5e":[()=>t.e(3389).then(t.bind(t,1877)),"@site/docs/data-sources/amino-acid-conservation.mdx",1877],"7bc16216":[()=>t.e(3232).then(t.bind(t,212)),"@site/docs/data-sources/clinvar-json.md",212],"7bd03d56":[()=>t.e(3437).then(t.bind(t,8634)),"@site/versioned_docs/version-3.22/introduction/introduction.mdx",8634],"7df63534":[()=>t.e(4043).then(t.bin
d(t,6968)),"@site/versioned_docs/version-3.23/data-sources/clingen-gene-validity-json.md",6968],"807bb480":[()=>t.e(4797).then(t.bind(t,1130)),"@site/versioned_docs/version-3.23/data-sources/phylop.mdx",1130],"82834dbd":[()=>t.e(8740).then(t.bind(t,718)),"@site/versioned_docs/version-3.23/data-sources/phylop-json.md",718],"82e726f2":[()=>t.e(12).then(t.bind(t,949)),"@site/docs/data-sources/clingen-gene-validity-json.md",949],"833bd66e":[()=>t.e(9082).then(t.bind(t,4859)),"@site/docs/data-sources/gnomad-lof-json.md",4859],"83b88aab":[()=>t.e(9009).then(t.bind(t,8042)),"@site/versioned_docs/version-3.23/data-sources/topmed-json.md",8042],"85047af6":[()=>t.e(7860).then(t.bind(t,6335)),"@site/docs/data-sources/gnomad-structural-variants-data_description.md",6335],"86133f4f":[()=>t.e(2386).then(t.bind(t,6796)),"@site/versioned_docs/version-3.23/data-sources/phylopprimate-json.md",6796],"864a7df9":[()=>t.e(7696).then(t.bind(t,590)),"@site/versioned_docs/version-3.22/data-sources/amino-acid-conservation.mdx",590],"88ae8085":[()=>t.e(4775).then(t.bind(t,8366)),"@site/versioned_docs/version-3.22/introduction/getting-started.md",8366],"8a76d82e":[()=>t.e(6288).then(t.bind(t,2153)),"@site/versioned_docs/version-3.23/data-sources/cosmic-json.md",2153],"8ae16000":[()=>t.e(4105).then(t.bind(t,3827)),"@site/docs/data-sources/gnomad-small-variants-json.md",3827],"8b37abaa":[()=>t.e(6746).then(t.bind(t,153)),"@site/versioned_docs/version-3.23/core-functionality/variant-ids.md",153],"8b7fbb05":[()=>t.e(5713).then(t.bind(t,5186)),"@site/docs/introduction/licensedContent.mdx",5186],"8c5c59ee":[()=>t.e(9962).then(t.bind(t,8157)),"@site/docs/frequently-asked-questions/Annotator-vs-data-update.md",8157],"8ca6b7fc":[()=>t.e(9070).then(t.bind(t,7391)),"@site/versioned_docs/version-3.22/data-sources/revel.mdx",7391],"8ca849dc":[()=>t.e(6787).then(t.bind(t,3552)),"@site/versioned_docs/version-3.22/data-sources/dann.mdx",3552],"8ee30fb3":[()=>t.e(4117).then(t.bind(t,7847)),"@site/versioned_doc
s/version-3.22/data-sources/mitomap-small-variants-json.md",7847],"8f87ea60":[()=>t.e(5651).then(t.bind(t,7549)),"@site/versioned_docs/version-3.23/utilities/jasix.mdx",7549],"905a57e0":[()=>t.e(4453).then(t.bind(t,1110)),"@site/versioned_docs/version-3.22/data-sources/cosmic-cancer-gene-census.md",1110],"915fca76":[()=>t.e(9639).then(t.bind(t,3556)),"@site/docs/data-sources/primate-ai.mdx",3556],"916efb7e":[()=>t.e(9368).then(t.t.bind(t,8951,19)),"~docs/default/version-3-22-metadata-prop-77e.json",8951],"935f2afb":[()=>t.e(53).then(t.t.bind(t,1109,19)),"~docs/default/version-current-metadata-prop-751.json",1109],"936eec31":[()=>t.e(9608).then(t.bind(t,4835)),"@site/versioned_docs/version-3.22/data-sources/clinvar-json.md",4835],"93bfed11":[()=>t.e(3832).then(t.bind(t,2349)),"@site/versioned_docs/version-3.23/data-sources/revel-json.md",2349],"946595d9":[()=>t.e(6474).then(t.bind(t,3018)),"@site/versioned_docs/version-3.23/introduction/parsing-json.md",3018],"9620026c":[()=>t.e(6602).then(t.bind(t,1888)),"@site/docs/data-sources/1000Genomes-snv-json.md",1888],"9737b5e1":[()=>t.e(5970).then(t.bind(t,1484)),"@site/versioned_docs/version-3.22/data-sources/clingen-gene-validity-json.md",1484],"973fe7cb":[()=>t.e(4008).then(t.bind(t,1332)),"@site/versioned_docs/version-3.22/data-sources/splice-ai.mdx",1332],"97754c84":[()=>t.e(4857).then(t.bind(t,2162)),"@site/versioned_docs/version-3.22/data-sources/fusioncatcher-json.md",2162],"988d0ae8":[()=>t.e(472).then(t.bind(t,5771)),"@site/docs/data-sources/dann.mdx",5771],"98b05c7a":[()=>t.e(1914).then(t.bind(t,4455)),"@site/versioned_docs/version-3.22/data-sources/gnomad-structural-variants-json.md",4455],"98bbf06c":[()=>t.e(4858).then(t.bind(t,1106)),"@site/docs/data-sources/gnomad.mdx",1106],"9a14a8bf":[()=>t.e(4926).then(t.bind(t,3781)),"@site/docs/data-sources/gnomad4.0-lof-json.md",3781],"9a946f68":[()=>t.e(5198).then(t.bind(t,6959)),"@site/docs/data-sources/cancer-hotspots.mdx",6959],"9e4087bc":[()=>t.e(3608).then(t.bind(
t,3012)),"@theme/BlogArchivePage",3012],a049baa7:[()=>t.e(6983).then(t.bind(t,9181)),"@site/versioned_docs/version-3.23/data-sources/gnomad.mdx",9181],a26ba82d:[()=>t.e(7706).then(t.bind(t,1702)),"@site/docs/data-sources/phylop.mdx",1702],a2ab8500:[()=>t.e(2865).then(t.bind(t,4133)),"@site/docs/data-sources/phylop-json.md",4133],a41b3c06:[()=>t.e(7435).then(t.bind(t,9439)),"@site/versioned_docs/version-3.23/introduction/dependencies.md",9439],a5e136a1:[()=>t.e(8111).then(t.bind(t,3814)),"@site/docs/core-functionality/variant-ids.md",3814],a665ff54:[()=>t.e(3911).then(t.bind(t,3342)),"@site/versioned_docs/version-3.23/data-sources/gnomad-lof-json.md",3342],a792da87:[()=>t.e(4863).then(t.bind(t,2628)),"@site/versioned_docs/version-3.23/data-sources/clinvar.mdx",2628],a8504dcf:[()=>t.e(1633).then(t.bind(t,9729)),"@site/docs/data-sources/amino-acid-conservation-json.md",9729],a8da062f:[()=>t.e(2630).then(t.bind(t,9156)),"@site/docs/data-sources/dbsnp-json.md",9156],a9ecceb6:[()=>t.e(4203).then(t.bind(t,7234)),"@site/docs/data-sources/1000Genomes.mdx",7234],aa90c840:[()=>t.e(6299).then(t.bind(t,2531)),"@site/versioned_docs/version-3.22/data-sources/mitomap.mdx",2531],abda0f14:[()=>t.e(829).then(t.bind(t,7356)),"@site/docs/data-sources/clingen-dosage-json.md",7356],add85258:[()=>t.e(8416).then(t.bind(t,9810)),"@site/versioned_docs/version-3.22/data-sources/mitomap-structural-variants-json.md",9810],aeee51c8:[()=>t.e(639).then(t.bind(t,8247)),"@site/versioned_docs/version-3.22/core-functionality/transcript-consequence-impacts.md",8247],b2e466e8:[()=>t.e(8577).then(t.bind(t,120)),"@site/docs/file-formats/illumina-annotator-json-file-format.mdx",120],b38f1ddb:[()=>t.e(2510).then(t.bind(t,1337)),"@site/versioned_docs/version-3.22/data-sources/topmed-json.md",1337],b406875a:[()=>t.e(9308).then(t.bind(t,3081)),"@site/versioned_docs/version-3.22/data-sources/revel-json.md",3081],b4210c11:[()=>t.e(7870).then(t.bind(t,4674)),"@site/docs/data-sources/clingen-json.md",4674],b51ccab7
:[()=>t.e(611).then(t.bind(t,1562)),"@site/docs/data-sources/revel.mdx",1562],b6dcd8b7:[()=>t.e(6458).then(t.bind(t,525)),"@site/docs/data-sources/cosmic-json.md",525],b7b6e5d7:[()=>t.e(1355).then(t.bind(t,5681)),"@site/versioned_docs/version-3.22/data-sources/fusioncatcher.mdx",5681],b97c1ec5:[()=>t.e(1494).then(t.bind(t,592)),"@site/versioned_docs/version-3.22/utilities/sautils.mdx",592],ba2982bf:[()=>t.e(2038).then(t.bind(t,8295)),"@site/docs/data-sources/splice-ai.mdx",8295],bcdb388a:[()=>t.e(7816).then(t.bind(t,8808)),"@site/versioned_docs/version-3.22/data-sources/cancer-hotspots.mdx",8808],bebe1a3a:[()=>t.e(6729).then(t.bind(t,3966)),"@site/versioned_docs/version-3.22/data-sources/gnomad.mdx",3966],bf53e43c:[()=>t.e(9973).then(t.bind(t,8563)),"@site/versioned_docs/version-3.23/core-functionality/gene-fusions.md",8563],c3c2a1f1:[()=>t.e(8792).then(t.bind(t,9224)),"@site/versioned_docs/version-3.22/data-sources/1000Genomes.mdx",9224],c5a697ac:[()=>t.e(8418).then(t.bind(t,2351)),"@site/versioned_docs/version-3.23/data-sources/dbsnp.mdx",2351],c95142d3:[()=>t.e(892).then(t.bind(t,7482)),"@site/versioned_docs/version-3.23/data-sources/omim-json.md",7482],ca4cc287:[()=>t.e(709).then(t.bind(t,6886)),"@site/versioned_docs/version-3.23/data-sources/mitomap-small-variants-json.md",6886],cbe769db:[()=>t.e(3655).then(t.bind(t,2589)),"@site/versioned_docs/version-3.22/introduction/parsing-json.md",2589],cd0802b4:[()=>t.e(1144).then(t.bind(t,3468)),"@site/docs/data-sources/fusioncatcher.mdx",3468],cd35fae7:[()=>t.e(5490).then(t.bind(t,1396)),"@site/docs/data-sources/clinvar.mdx",1396],cd820d6d:[()=>t.e(2837).then(t.bind(t,221)),"@site/docs/data-sources/clinvar-preview-json.md",221],cd8220b1:[()=>t.e(4246).then(t.bind(t,9819)),"@site/docs/data-sources/topmed-json.md",9819],d247ca0b:[()=>t.e(6871).then(t.bind(t,8846)),"@site/docs/data-sources/clinvar-preview.mdx",8846],d4cb531b:[()=>t.e(2032).then(t.bind(t,246)),"@site/versioned_docs/version-3.22/data-sources/topmed.mdx",246
],d757867a:[()=>t.e(150).then(t.bind(t,4540)),"@site/versioned_docs/version-3.23/utilities/sautils.mdx",4540],dbc89f8d:[()=>t.e(711).then(t.bind(t,5282)),"@site/versioned_docs/version-3.22/data-sources/omim-json.md",5282],dd91fa1e:[()=>t.e(1283).then(t.bind(t,8748)),"@site/versioned_docs/version-3.23/data-sources/clinvar-json.md",8748],dfa01370:[()=>t.e(4460).then(t.bind(t,1978)),"@site/versioned_docs/version-3.22/data-sources/clingen-dosage-json.md",1978],e1c0dc4a:[()=>t.e(5153).then(t.bind(t,1122)),"@site/versioned_docs/version-3.22/data-sources/dann-json.md",1122],e1e7c361:[()=>t.e(1443).then(t.bind(t,2791)),"@site/docs/introduction/parsing-json.md",2791],e286457f:[()=>t.e(4773).then(t.bind(t,19)),"@site/docs/file-formats/custom-annotations.md",19],e2baf76c:[()=>t.e(9792).then(t.bind(t,6682)),"@site/versioned_docs/version-3.23/file-formats/custom-annotations.md",6682],e39dd739:[()=>t.e(3805).then(t.bind(t,818)),"@site/docs/data-sources/gnomad-structural-variants-json.md",818],e95cadfe:[()=>t.e(5277).then(t.bind(t,1533)),"@site/docs/core-functionality/gene-fusions.md",1533],ea1a2647:[()=>t.e(7792).then(t.bind(t,4338)),"@site/versioned_docs/version-3.22/data-sources/clinvar.mdx",4338],eef24e02:[()=>t.e(4974).then(t.bind(t,6220)),"@site/docs/utilities/jasix.mdx",6220],ef4059aa:[()=>t.e(3790).then(t.bind(t,668)),"@site/docs/introduction/introduction.mdx",668],f048ed9e:[()=>t.e(2696).then(t.bind(t,5675)),"@site/docs/introduction/getting-started.md",5675],f0cfb972:[()=>t.e(1048).then(t.bind(t,2637)),"@site/versioned_docs/version-3.22/data-sources/mito-heteroplasmy.md",2637],f262a5f6:[()=>t.e(6969).then(t.bind(t,1969)),"@site/docs/data-sources/gerp.mdx",1969],f42ca355:[()=>t.e(7342).then(t.bind(t,1706)),"@site/versioned_docs/version-3.22/data-sources/cosmic-gene-fusion-json.md",1706],f7e8c160:[()=>t.e(700).then(t.bind(t,1043)),"@site/docs/introduction/dependencies.md",1043],f98a4229:[()=>t.e(8633).then(t.bind(t,8036)),"@site/docs/data-sources/gme-json.md",8036],fb0d881d
:[()=>t.e(1900).then(t.bind(t,3304)),"@site/versioned_docs/version-3.22/data-sources/1000Genomes-sv-json.md",3304],fcc450d8:[()=>t.e(4491).then(t.bind(t,5327)),"@site/versioned_docs/version-3.23/data-sources/gnomad-small-variants-json.md",5327],fdf7d659:[()=>t.e(8655).then(t.bind(t,9060)),"@site/docs/file-formats/illumina-annotator-vcf-file-format.mdx",9060]};const d=function(e){const n={};return function e(t,o){Object.keys(t).forEach((a=>{const r=t[a],i=o?`${o}.${a}`:a;var s;"object"==typeof(s=r)&&s&&Object.keys(s).length>0?e(r,i):n[i]=r}))}(e),n};const m=function(e,n){if("*"===e)return s()({loading:c,loader:()=>t.e(4608).then(t.bind(t,4608))});const a=l[`${e}-${n}`],r=[],i=[],m={},p=d(a);return Object.keys(p).forEach((e=>{const n=u[p[e]];n&&(m[e]=n[0],r.push(n[1]),i.push(n[2]))})),s().Map({loading:c,loader:m,modules:r,webpack:()=>i,render:(e,n)=>{const t=JSON.parse(JSON.stringify(a));Object.keys(e).forEach((n=>{let o=t;const a=n.split(".");for(let e=0;e<a.length-1;e+=1)o=o[a[e]];o[a[a.length-1]]=e[n].default;const r=Object.keys(e[n]).filter((e=>"default"!==e));r&&r.length&&r.forEach((t=>{o[a[a.length-1]][t]=e[n][t]}))}));const r=t.component;return delete 
t.component,o.createElement(r,{...t,...n})}})},p=[{path:"/IlluminaConnectedAnnotationsDocumentation/blog/archive",component:m("/IlluminaConnectedAnnotationsDocumentation/blog/archive","192"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/search",component:m("/IlluminaConnectedAnnotationsDocumentation/search","158"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/versions",component:m("/IlluminaConnectedAnnotationsDocumentation/versions","4b9"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22","267"),routes:[{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/","f61"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/canonical-transcripts",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/canonical-transcripts","f08"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/gene-fusions",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/gene-fusions","ef1"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/transcript-consequence-impacts",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/transcript-consequence-impacts","93e"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/variant-ids",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/variant-ids","d01"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes","a9b"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-snv-json",component:m("/IlluminaConnectedAnn
otationsDocumentation/3.22/data-sources/1000Genomes-snv-json","66d"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-sv-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-sv-json","86e"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation","254"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-json","3dd"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cancer-hotspots",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cancer-hotspots","156"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen","8c9"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-dosage-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-dosage-json","3de"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-gene-validity-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-gene-validity-json","0d4"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-json","370"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar","992"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-json",
component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-json","969"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic","d16"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-cancer-gene-census",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-cancer-gene-census","da3"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-gene-fusion-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-gene-fusion-json","2be"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-json","e5f"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann","e25"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-json","ae5"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp","487"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-json","4ef"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher","769"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-
json","9c2"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher","194"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-json","282"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp","bb2"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-json","7ee"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme","504"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme-json","2ef"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad","906"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-lof-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-lof-json","492"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-small-variants-json","0cd"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-data_description",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-data_description","94d"),exact:!
0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-json","c06"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mito-heteroplasmy",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mito-heteroplasmy","5d6"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap","fb6"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-small-variants-json","901"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-structural-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-structural-variants-json","01d"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim","48d"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim-json","800"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop","83d"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-json","e4c"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai","630"),exact:!0,
sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-json","7ef"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel","ad5"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-json","6d8"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai","ac6"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-json","3a8"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed","076"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-json","e0d"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/custom-annotations",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/custom-annotations","4bb"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/illumina-annotator-json-file-format",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/illumina-annotator-json-file-format","f91"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/dependencies",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/dependencies","4b5"),exact:!0,sidebar:"d
ocs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/getting-started",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/getting-started","241"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/parsing-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/parsing-json","fcb"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/jasix",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/jasix","63f"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/sautils",component:m("/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/sautils","7d9"),exact:!0,sidebar:"docs"}]},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23","256"),routes:[{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/","879"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/canonical-transcripts",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/canonical-transcripts","6d3"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/gene-fusions",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/gene-fusions","c81"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/transcript-consequence-impacts",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/transcript-consequence-impacts","5bc"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/variant-ids",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/variant-ids","aa7"),exact:!0,sidebar:"docs"},{path:"/IlluminaConne
ctedAnnotationsDocumentation/3.23/data-sources/1000Genomes",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes","b8e"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-snv-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-snv-json","270"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-sv-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-sv-json","68e"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation","b73"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-json","af0"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cancer-hotspots",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cancer-hotspots","24e"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen","891"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-dosage-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-dosage-json","958"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-gene-validity-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-gene-validity-json","adb"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cl
ingen-json","e6c"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar","023"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-json","098"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic","3c5"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-cancer-gene-census",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-cancer-gene-census","3b7"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-gene-fusion-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-gene-fusion-json","b72"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-json","122"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann","f51"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-json","c27"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp","b1e"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-json","356"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/da
ta-sources/decipher",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher","b67"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-json","0fe"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher","79a"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher-json","bba"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp","883"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-json","f03"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme","293"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-json","d55"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad","6fb"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-lof-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-lof-json","270"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/d
ata-sources/gnomad-small-variants-json","5f8"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-data_description",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-data_description","14b"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-json","587"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mito-heteroplasmy",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mito-heteroplasmy","ac9"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap","525"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-small-variants-json","b38"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-structural-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-structural-variants-json","035"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim","4b9"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim-json","7dd"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop","b6b"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sour
ces/phylop-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop-json","62c"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopprimate-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopprimate-json","2a9"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai","a6a"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-json","cfa"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel","14c"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-json","705"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai","f95"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-json","c2f"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed","f47"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-json","d42"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/custom-annotations",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/file-fo
rmats/custom-annotations","a03"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/illumina-annotator-json-file-format",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/illumina-annotator-json-file-format","115"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/dependencies",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/dependencies","cd8"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/getting-started",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/getting-started","01c"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/licensedContent",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/licensedContent","60c"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/parsing-json",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/parsing-json","8d6"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/jasix",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/jasix","bc9"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/sautils",component:m("/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/sautils","062"),exact:!0,sidebar:"docs"}]},{path:"/IlluminaConnectedAnnotationsDocumentation/",component:m("/IlluminaConnectedAnnotationsDocumentation/","2e7"),routes:[{path:"/IlluminaConnectedAnnotationsDocumentation/",component:m("/IlluminaConnectedAnnotationsDocumentation/","0a5"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts",component:m("/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts","0f3"),exact:!0,sidebar:"docs"},{path:"/Illumin
aConnectedAnnotationsDocumentation/core-functionality/gene-fusions",component:m("/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions","9a5"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving",component:m("/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving","494"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts",component:m("/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts","572"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids",component:m("/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids","8d0"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes","6a1"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-snv-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-snv-json","c39"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-sv-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-sv-json","4fd"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation","d35"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation-json","8bf"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-hotspots",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/cance
r-hotspots","b95"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen","fa4"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-dosage-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-dosage-json","58e"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-gene-validity-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-gene-validity-json","547"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-json","286"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar","bed"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-json","d9e"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview","611"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-json","b47"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic","0af"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-cancer-gene-census",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-cancer-gene-census","bb8"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmi
c-gene-fusion-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-gene-fusion-json","094"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-json","cbc"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/dann","a22"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-json","5bd"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp","bd6"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-json","f1d"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher","569"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-json","f45"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher","4e3"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-json","774"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp","03f"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-json",
component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-json","c31"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gme","95e"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-json","45b"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad","6b3"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-lof-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-lof-json","249"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-small-variants-json","3d5"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-data_description",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-data_description","1e7"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-json","57f"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-lof-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-lof-json","d8a"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-small-variants-json","f88"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad40-structural-variants-json",component:m("/IlluminaConnectedAnnotatio
nsDocumentation/data-sources/gnomad40-structural-variants-json","29b"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmy",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmy","068"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap","540"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-small-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-small-variants-json","3d1"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-structural-variants-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-structural-variants-json","00d"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/omim","7c3"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-json","83f"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop","3ef"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-json","98b"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopprimate-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopprimate-json","bc2"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai","fc4"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotat
ionsDocumentation/data-sources/primate-ai-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai-json","3e2"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/revel","172"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-json","997"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai","7de"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai-json","1d4"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed","ea7"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed-json",component:m("/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed-json","014"),exact:!0},{path:"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations",component:m("/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations","4b4"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-format",component:m("/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-format","76b"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-format",component:m("/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-format","40f"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-upd
ate",component:m("/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-update","743"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/introduction/dependencies",component:m("/IlluminaConnectedAnnotationsDocumentation/introduction/dependencies","1c5"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/introduction/getting-started",component:m("/IlluminaConnectedAnnotationsDocumentation/introduction/getting-started","ab0"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent",component:m("/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent","af3"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-json",component:m("/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-json","289"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix",component:m("/IlluminaConnectedAnnotationsDocumentation/utilities/jasix","b46"),exact:!0,sidebar:"docs"},{path:"/IlluminaConnectedAnnotationsDocumentation/utilities/sautils",component:m("/IlluminaConnectedAnnotationsDocumentation/utilities/sautils","b75"),exact:!0,sidebar:"docs"}]},{path:"*",component:m("*")}];var f=t(412),h=t(6291),g=t(9913),v=t(7041),b=t(6550),y=t(4865),w=t.n(y);const C=[t(2497),t(2448),t(6743),t(2295)];function A(e){for(var n=arguments.length,t=new Array(n>1?n-1:0),o=1;o{var o,a;const r=null!==(a=null===(o=null==n?void 0:n.default)||void 0===o?void 0:o[e])&&void 0!==a?a:n[e];r&&r(...t)}))}const k={onRouteUpdate(){for(var e=arguments.length,n=new Array(e),t=0;t{const{component:n}=e.route;if(n&&n.preload)return n.preload()})))}const I={};const S=function(e){if(I[e.pathname])return{...e,pathname:I[e.pathname]};let n=e.pathname||"/";return n=n.trim().replace(/\/index\.html$/,""),""===n&&(n="/"),I[e.pathname]=n,{...e,pathname:n}};w().configure({showSpinner:!1});class x extends 
o.Component{constructor(e){super(e),this.previousLocation=null,this.progressBarTimeout=null,this.state={nextRouteHasLoaded:!0}}shouldComponentUpdate(e,n){const t=e.location!==this.props.location,{routes:o,delay:a}=this.props;if(t){const n=S(e.location);return this.startProgressBar(a),this.previousLocation=S(this.props.location),this.setState({nextRouteHasLoaded:!1}),E(o,n.pathname).then((()=>{k.onRouteUpdate({previousLocation:this.previousLocation,location:n}),this.previousLocation=null,this.setState({nextRouteHasLoaded:!0},this.stopProgressBar);const{hash:e}=n;if(e){const n=decodeURIComponent(e.substring(1)),t=document.getElementById(n);t&&t.scrollIntoView()}else window.scrollTo(0,0)})).catch((e=>console.warn(e))),!1}return!!n.nextRouteHasLoaded}clearProgressBarTimeout(){this.progressBarTimeout&&(clearTimeout(this.progressBarTimeout),this.progressBarTimeout=null)}startProgressBar(e){this.clearProgressBarTimeout(),this.progressBarTimeout=setTimeout((()=>{k.onRouteUpdateDelayed({location:S(this.props.location)}),w().start()}),e)}stopProgressBar(){this.clearProgressBarTimeout(),w().done()}render(){const{children:e,location:n}=this.props;return o.createElement(b.AW,{location:S(n),render:()=>e})}}const _=(0,b.EN)(x);var j=t(2859),T=t(2263);const O="docusaurus-base-url-issue-banner-container",P="docusaurus-base-url-issue-banner",L="docusaurus-base-url-issue-banner-suggestion-container",R="__DOCUSAURUS_INSERT_BASEURL_BANNER";function N(e){return`\nwindow['${R}'] = true;\n\ndocument.addEventListener('DOMContentLoaded', maybeInsertBanner);\n\nfunction maybeInsertBanner() {\n var shouldInsert = window['${R}'];\n shouldInsert && insertBanner();\n}\n\nfunction insertBanner() {\n var bannerContainer = document.getElementById('${O}');\n if (!bannerContainer) {\n return;\n }\n var bannerHtml = ${JSON.stringify(function(e){return`\n
\n

Your Docusaurus site did not load properly.

\n

A very common reason is a wrong site baseUrl configuration.

\n

Current configured baseUrl = ${e} ${"/"===e?" (default value)":""}

\n

We suggest trying baseUrl =

\n
\n`}(e)).replace(/{window[R]=!1}),[]),o.createElement(o.Fragment,null,!f.Z.canUseDOM&&o.createElement(j.Z,null,o.createElement("script",null,N(e))),o.createElement("div",{id:O}))}function F(){const{siteConfig:{baseUrl:e,baseUrlIssueBanner:n}}=(0,T.Z)(),{pathname:t}=(0,b.TH)();return n&&t===e?o.createElement(M,null):null}const B=function(e){let{children:n}=e;return n};var U=t(780),z=t(4953);const $=function(){return o.createElement(U.Z,{fallback:z.Z},o.createElement(v.M,null,o.createElement(g.t,null,o.createElement(B,null,o.createElement(F,null),o.createElement(_,{routes:p,delay:1e3},(0,h.Z)(p))))))};const G=function(e){if("undefined"==typeof document)return!1;const n=document.createElement("link");try{if(n.relList&&"function"==typeof n.relList.supports)return n.relList.supports(e)}catch(t){return!1}return!1}("prefetch")?function(e){return new Promise(((n,t)=>{if("undefined"==typeof document)return void t();const o=document.createElement("link");o.setAttribute("rel","prefetch"),o.setAttribute("href",e),o.onload=n,o.onerror=t;(document.getElementsByTagName("head")[0]||document.getElementsByName("script")[0].parentNode).appendChild(o)}))}:function(e){return new Promise(((n,t)=>{const o=new XMLHttpRequest;o.open("GET",e,!0),o.withCredentials=!0,o.onload=()=>{200===o.status?n():t()},o.send(null)}))},q={};const Z=function(e){return new Promise((n=>{q[e]?n():G(e).then((()=>{n(),q[e]=!0})).catch((()=>{}))}))},H={},V={},W=()=>{var e,n;return(null===(e=navigator.connection)||void 0===e?void 0:e.effectiveType.includes("2g"))&&(null===(n=navigator.connection)||void 0===n?void 0:n.saveData)},K={prefetch:e=>{if(!(e=>!W()&&!V[e]&&!H[e])(e))return!1;H[e]=!0;return(0,D.f)(p,e).flatMap((e=>{return n=e.route.path,Object.entries(l).filter((e=>{let[t]=e;return t.replace(/(-[^-]+)$/,"")===n})).flatMap((e=>{let[,n]=e;return Object.values(d(n))}));var n})).forEach((e=>{const 
n=t.gca(e);n&&!/undefined/.test(n)&&Z(n)})),!0},preload:e=>!!(e=>!W()&&!V[e])(e)&&(V[e]=!0,E(p,e),!0)};if(f.Z.canUseDOM){window.docusaurus=K;const e=a.hydrate;E(p,window.location.pathname).then((()=>{e(o.createElement(r.VK,null,o.createElement($,null)),document.getElementById("__docusaurus"))}))}},780:(e,n,t)=>{"use strict";t.d(n,{Z:()=>s});var o=t(7294),a=t(412),r=t(4953);class i extends o.Component{constructor(e){super(e),this.state={error:null}}componentDidCatch(e){a.Z.canUseDOM&&this.setState({error:e})}render(){var e;const{children:n}=this.props,{error:t}=this.state;if(t){return(null!==(e=this.props.fallback)&&void 0!==e?e:r.Z)({error:t,tryAgain:()=>this.setState({error:null})})}return n}}const s=i},412:(e,n,t)=>{"use strict";t.d(n,{Z:()=>a});const o=!("undefined"==typeof window||!window.document||!window.document.createElement),a={canUseDOM:o,canUseEventListeners:o&&!(!window.addEventListener&&!window.attachEvent),canUseIntersectionObserver:o&&"IntersectionObserver"in window,canUseViewport:o&&!!window.screen}},2859:(e,n,t)=>{"use strict";t.d(n,{Z:()=>fe});var o,a,r,i,s=t(7294),c=t(5697),l=t.n(c),u=t(3524),d=t.n(u),m=t(9590),p=t.n(m),f=t(7418),h=t.n(f),g="bodyAttributes",v="htmlAttributes",b="titleAttributes",y={BASE:"base",BODY:"body",HEAD:"head",HTML:"html",LINK:"link",META:"meta",NOSCRIPT:"noscript",SCRIPT:"script",STYLE:"style",TITLE:"title"},w=(Object.keys(y).map((function(e){return y[e]})),"charset"),C="cssText",A="href",k="http-equiv",D="innerHTML",E="itemprop",I="name",S="property",x="rel",_="src",j="target",T={accesskey:"accessKey",charset:"charSet",class:"className",contenteditable:"contentEditable",contextmenu:"contextMenu","http-equiv":"httpEquiv",itemprop:"itemProp",tabindex:"tabIndex"},O="defaultTitle",P="defer",L="encodeSpecialCharacters",R="onChangeClientState",N="titleTemplate",M=Object.keys(T).reduce((function(e,n){return e[T[n]]=n,e}),{}),F=[y.NOSCRIPT,y.SCRIPT,y.STYLE],B="data-react-helmet",U="function"==typeof Symbol&&"symbol"==typeof 
Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},z=function(){function e(e,n){for(var t=0;t=0||Object.prototype.hasOwnProperty.call(e,o)&&(t[o]=e[o]);return t},q=function(e){return!1===(!(arguments.length>1&&void 0!==arguments[1])||arguments[1])?String(e):String(e).replace(/&/g,"&").replace(//g,">").replace(/"/g,""").replace(/'/g,"'")},Z=function(e){var n=Y(e,y.TITLE),t=Y(e,N);if(t&&n)return t.replace(/%s/g,(function(){return Array.isArray(n)?n.join(""):n}));var o=Y(e,O);return n||o||void 0},H=function(e){return Y(e,R)||function(){}},V=function(e,n){return n.filter((function(n){return void 0!==n[e]})).map((function(n){return n[e]})).reduce((function(e,n){return $({},e,n)}),{})},W=function(e,n){return n.filter((function(e){return void 0!==e[y.BASE]})).map((function(e){return e[y.BASE]})).reverse().reduce((function(n,t){if(!n.length)for(var o=Object.keys(t),a=0;a=0;t--){var o=e[t];if(o.hasOwnProperty(n))return o[n]}return null},Q=(o=Date.now(),function(e){var n=Date.now();n-o>16?(o=n,e(n)):setTimeout((function(){Q(e)}),0)}),X=function(e){return clearTimeout(e)},J="undefined"!=typeof window?window.requestAnimationFrame&&window.requestAnimationFrame.bind(window)||window.webkitRequestAnimationFrame||window.mozRequestAnimationFrame||Q:t.g.requestAnimationFrame||Q,ee="undefined"!=typeof window?window.cancelAnimationFrame||window.webkitCancelAnimationFrame||window.mozCancelAnimationFrame||X:t.g.cancelAnimationFrame||X,ne=function(e){return console&&"function"==typeof console.warn&&console.warn(e)},te=null,oe=function(e,n){var t=e.baseTag,o=e.bodyAttributes,a=e.htmlAttributes,r=e.linkTags,i=e.metaTags,s=e.noscriptTags,c=e.onChangeClientState,l=e.scriptTags,u=e.styleTags,d=e.title,m=e.titleAttributes;ie(y.BODY,o),ie(y.HTML,a),re(d,m);var 
p={baseTag:se(y.BASE,t),linkTags:se(y.LINK,r),metaTags:se(y.META,i),noscriptTags:se(y.NOSCRIPT,s),scriptTags:se(y.SCRIPT,l),styleTags:se(y.STYLE,u)},f={},h={};Object.keys(p).forEach((function(e){var n=p[e],t=n.newTags,o=n.oldTags;t.length&&(f[e]=t),o.length&&(h[e]=p[e].oldTags)})),n&&n(),c(e,f,h)},ae=function(e){return Array.isArray(e)?e.join(""):e},re=function(e,n){void 0!==e&&document.title!==e&&(document.title=ae(e)),ie(y.TITLE,n)},ie=function(e,n){var t=document.getElementsByTagName(e)[0];if(t){for(var o=t.getAttribute(B),a=o?o.split(","):[],r=[].concat(a),i=Object.keys(n),s=0;s=0;d--)t.removeAttribute(r[d]);a.length===r.length?t.removeAttribute(B):t.getAttribute(B)!==i.join(",")&&t.setAttribute(B,i.join(","))}},se=function(e,n){var t=document.head||document.querySelector(y.HEAD),o=t.querySelectorAll(e+"["+B+"]"),a=Array.prototype.slice.call(o),r=[],i=void 0;return n&&n.length&&n.forEach((function(n){var t=document.createElement(e);for(var o in n)if(n.hasOwnProperty(o))if(o===D)t.innerHTML=n.innerHTML;else if(o===C)t.styleSheet?t.styleSheet.cssText=n.cssText:t.appendChild(document.createTextNode(n.cssText));else{var s=void 0===n[o]?"":n[o];t.setAttribute(o,s)}t.setAttribute(B,"true"),a.some((function(e,n){return i=n,t.isEqualNode(e)}))?a.splice(i,1):r.push(t)})),a.forEach((function(e){return e.parentNode.removeChild(e)})),r.forEach((function(e){return t.appendChild(e)})),{oldTags:a,newTags:r}},ce=function(e){return Object.keys(e).reduce((function(n,t){var o=void 0!==e[t]?t+'="'+e[t]+'"':""+t;return n?n+" "+o:o}),"")},le=function(e){var n=arguments.length>1&&void 0!==arguments[1]?arguments[1]:{};return Object.keys(e).reduce((function(n,t){return n[T[t]||t]=e[t],n}),n)},ue=function(e,n,t){switch(e){case y.TITLE:return{toComponent:function(){return e=n.title,t=n.titleAttributes,(o={key:e})[B]=!0,a=le(t,o),[s.createElement(y.TITLE,a,e)];var e,t,o,a},toString:function(){return function(e,n,t,o){var a=ce(t),r=ae(n);return a?"<"+e+" "+B+'="true" 
'+a+">"+q(r,o)+"":"<"+e+" "+B+'="true">'+q(r,o)+""}(e,n.title,n.titleAttributes,t)}};case g:case v:return{toComponent:function(){return le(n)},toString:function(){return ce(n)}};default:return{toComponent:function(){return function(e,n){return n.map((function(n,t){var o,a=((o={key:t})[B]=!0,o);return Object.keys(n).forEach((function(e){var t=T[e]||e;if(t===D||t===C){var o=n.innerHTML||n.cssText;a.dangerouslySetInnerHTML={__html:o}}else a[t]=n[e]})),s.createElement(e,a)}))}(e,n)},toString:function(){return function(e,n,t){return n.reduce((function(n,o){var a=Object.keys(o).filter((function(e){return!(e===D||e===C)})).reduce((function(e,n){var a=void 0===o[n]?n:n+'="'+q(o[n],t)+'"';return e?e+" "+a:a}),""),r=o.innerHTML||o.cssText||"",i=-1===F.indexOf(e);return n+"<"+e+" "+B+'="true" '+a+(i?"/>":">"+r+"")}),"")}(e,n,t)}}}},de=function(e){var n=e.baseTag,t=e.bodyAttributes,o=e.encode,a=e.htmlAttributes,r=e.linkTags,i=e.metaTags,s=e.noscriptTags,c=e.scriptTags,l=e.styleTags,u=e.title,d=void 0===u?"":u,m=e.titleAttributes;return{base:ue(y.BASE,n,o),bodyAttributes:ue(g,t,o),htmlAttributes:ue(v,a,o),link:ue(y.LINK,r,o),meta:ue(y.META,i,o),noscript:ue(y.NOSCRIPT,s,o),script:ue(y.SCRIPT,c,o),style:ue(y.STYLE,l,o),title:ue(y.TITLE,{title:d,titleAttributes:m},o)}},me=d()((function(e){return{baseTag:W([A,j],e),bodyAttributes:V(g,e),defer:Y(e,P),encode:Y(e,L),htmlAttributes:V(v,e),linkTags:K(y.LINK,[x,A],e),metaTags:K(y.META,[I,w,k,S,E],e),noscriptTags:K(y.NOSCRIPT,[D],e),onChangeClientState:H(e),scriptTags:K(y.SCRIPT,[_,D],e),styleTags:K(y.STYLE,[C],e),title:Z(e),titleAttributes:V(b,e)}}),(function(e){te&&ee(te),e.defer?te=J((function(){oe(e,(function(){te=null}))})):(oe(e),te=null)}),de)((function(){return null})),pe=(a=me,i=r=function(e){function n(){return function(e,n){if(!(e instanceof n))throw new TypeError("Cannot call a class as a function")}(this,n),function(e,n){if(!e)throw new ReferenceError("this hasn't been initialised - super() hasn't been 
called");return!n||"object"!=typeof n&&"function"!=typeof n?e:n}(this,e.apply(this,arguments))}return function(e,n){if("function"!=typeof n&&null!==n)throw new TypeError("Super expression must either be null or a function, not "+typeof n);e.prototype=Object.create(n&&n.prototype,{constructor:{value:e,enumerable:!1,writable:!0,configurable:!0}}),n&&(Object.setPrototypeOf?Object.setPrototypeOf(e,n):e.__proto__=n)}(n,e),n.prototype.shouldComponentUpdate=function(e){return!p()(this.props,e)},n.prototype.mapNestedChildrenToProps=function(e,n){if(!n)return null;switch(e.type){case y.SCRIPT:case y.NOSCRIPT:return{innerHTML:n};case y.STYLE:return{cssText:n}}throw new Error("<"+e.type+" /> elements are self-closing and can not contain children. Refer to our API for more information.")},n.prototype.flattenArrayTypeChildren=function(e){var n,t=e.child,o=e.arrayTypeChildren,a=e.newChildProps,r=e.nestedChildren;return $({},o,((n={})[t.type]=[].concat(o[t.type]||[],[$({},a,this.mapNestedChildrenToProps(t,r))]),n))},n.prototype.mapObjectTypeChildren=function(e){var n,t,o=e.child,a=e.newProps,r=e.newChildProps,i=e.nestedChildren;switch(o.type){case y.TITLE:return $({},a,((n={})[o.type]=i,n.titleAttributes=$({},r),n));case y.BODY:return $({},a,{bodyAttributes:$({},r)});case y.HTML:return $({},a,{htmlAttributes:$({},r)})}return $({},a,((t={})[o.type]=$({},r),t))},n.prototype.mapArrayTypeChildrenToProps=function(e,n){var t=$({},n);return Object.keys(e).forEach((function(n){var o;t=$({},t,((o={})[n]=e[n],o))})),t},n.prototype.warnOnInvalidChildren=function(e,n){return!0},n.prototype.mapChildrenToProps=function(e,n){var t=this,o={};return s.Children.forEach(e,(function(e){if(e&&e.props){var a=e.props,r=a.children,i=function(e){var n=arguments.length>1&&void 0!==arguments[1]?arguments[1]:{};return Object.keys(e).reduce((function(n,t){return n[M[t]||t]=e[t],n}),n)}(G(a,["children"]));switch(t.warnOnInvalidChildren(e,r),e.type){case y.LINK:case y.META:case y.NOSCRIPT:case y.SCRIPT:case 
y.STYLE:o=t.flattenArrayTypeChildren({child:e,arrayTypeChildren:o,newChildProps:i,nestedChildren:r});break;default:n=t.mapObjectTypeChildren({child:e,newProps:n,newChildProps:i,nestedChildren:r})}}})),n=this.mapArrayTypeChildrenToProps(o,n)},n.prototype.render=function(){var e=this.props,n=e.children,t=G(e,["children"]),o=$({},t);return n&&(o=this.mapChildrenToProps(n,o)),s.createElement(a,o)},z(n,null,[{key:"canUseDOM",set:function(e){a.canUseDOM=e}}]),n}(s.Component),r.propTypes={base:l().object,bodyAttributes:l().object,children:l().oneOfType([l().arrayOf(l().node),l().node]),defaultTitle:l().string,defer:l().bool,encodeSpecialCharacters:l().bool,htmlAttributes:l().object,link:l().arrayOf(l().object),meta:l().arrayOf(l().object),noscript:l().arrayOf(l().object),onChangeClientState:l().func,script:l().arrayOf(l().object),style:l().arrayOf(l().object),title:l().string,titleAttributes:l().object,titleTemplate:l().string},r.defaultProps={defer:!0,encodeSpecialCharacters:!0},r.peek=a.peek,r.rewind=function(){var e=a.rewind();return e||(e=de({baseTag:[],bodyAttributes:{},encodeSpecialCharacters:!0,htmlAttributes:{},linkTags:[],metaTags:[],noscriptTags:[],scriptTags:[],styleTags:[],title:"",titleAttributes:{}})),e},i);pe.renderStatic=pe.rewind;const fe=function(e){return s.createElement(pe,{...e})}},9960:(e,n,t)=>{"use strict";t.d(n,{Z:()=>d});var o=t(7294),a=t(3727),r=t(2263),i=t(3919),s=t(412);const c=(0,o.createContext)({collectLink:()=>{}});var l=t(4996),u=t(8780);const d=function(e){let{isNavLink:n,to:t,href:d,activeClassName:m,isActive:p,"data-noBrokenLinkCheck":f,autoAddBaseUrl:h=!0,...g}=e;var v;const{siteConfig:{trailingSlash:b,baseUrl:y}}=(0,r.Z)(),{withBaseUrl:w}=(0,l.C)(),C=(0,o.useContext)(c),A=t||d,k=(0,i.Z)(A),D=null==A?void 0:A.replace("pathname://","");let E=void 0!==D?(I=D,h&&(e=>e.startsWith("/"))(I)?w(I):I):void 0;var I;E&&k&&(E=(0,u.applyTrailingSlash)(E,{trailingSlash:b,baseUrl:y}));const 
S=(0,o.useRef)(!1),x=n?a.OL:a.rU,_=s.Z.canUseIntersectionObserver,j=(0,o.useRef)();(0,o.useEffect)((()=>(!_&&k&&null!=E&&window.docusaurus.prefetch(E),()=>{_&&j.current&&j.current.disconnect()})),[j,E,_,k]);const T=null!==(v=null==E?void 0:E.startsWith("#"))&&void 0!==v&&v,O=!E||!k||T;return E&&k&&!T&&!f&&C.collectLink(E),O?o.createElement("a",{href:E,...A&&!k&&{target:"_blank",rel:"noopener noreferrer"},...g}):o.createElement(x,{...g,onMouseEnter:()=>{S.current||null==E||(window.docusaurus.preload(E),S.current=!0)},innerRef:e=>{var n,t;_&&e&&k&&(n=e,t=()=>{null!=E&&window.docusaurus.prefetch(E)},j.current=new window.IntersectionObserver((e=>{e.forEach((e=>{n===e.target&&(e.isIntersecting||e.intersectionRatio>0)&&(j.current.unobserve(n),j.current.disconnect(),t())}))})),j.current.observe(n))},to:E||"",...n&&{isActive:p,activeClassName:m}})}},5999:(e,n,t)=>{"use strict";t.d(n,{Z:()=>u,I:()=>l});var o=t(7294);const a=/{\w+}/g,r="{}";function i(e,n){const t=[],i=e.replace(a,(e=>{const a=e.substring(1,e.length-1),i=null==n?void 0:n[a];if(void 0!==i){const e=o.isValidElement(i)?i:String(i);return t.push(e),r}return e}));return 0===t.length?e:t.every((e=>"string"==typeof e))?i.split(r).reduce(((e,n,o)=>{var a;return e.concat(n).concat(null!==(a=t[o])&&void 0!==a?a:"")}),""):i.split(r).reduce(((e,n,a)=>[...e,o.createElement(o.Fragment,{key:a},n,t[a])]),[])}var s=t(7529);function c(e){let{id:n,message:t}=e;var o,a;if(void 0===n&&void 0===t)throw new Error("Docusaurus translation declarations must have at least a translation id or a default translation message");return null!==(a=null!==(o=s[null!=n?n:t])&&void 0!==o?o:t)&&void 0!==a?a:n}function l(e,n){let{message:t,id:o}=e;return i(c({message:t,id:o}),n)}function u(e){let{children:n,id:t,values:o}=e;if(n&&"string"!=typeof n)throw console.warn("Illegal children",n),new Error("The Docusaurus component only accept simple string values");return i(c({message:n,id:t}),o)}},9913:(e,n,t)=>{"use strict";t.d(n,{_:()=>a,t:()=>r});var 
o=t(7294);const a=o.createContext(!1);function r(e){let{children:n}=e;const[t,r]=(0,o.useState)(!1);return(0,o.useEffect)((()=>{r(!0)}),[]),o.createElement(a.Provider,{value:t},n)}},9935:(e,n,t)=>{"use strict";t.d(n,{m:()=>o});const o="default"},7041:(e,n,t)=>{"use strict";t.d(n,{_:()=>u,M:()=>d});var o=t(7294),a=t(9782);const r=JSON.parse('{"docusaurus-plugin-content-docs":{"default":{"path":"/IlluminaConnectedAnnotationsDocumentation/","versions":[{"name":"current","label":"3.24 (unreleased)","isLast":true,"path":"/IlluminaConnectedAnnotationsDocumentation/","mainDocId":"introduction/introduction","docs":[{"id":"core-functionality/canonical-transcripts","path":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcripts","sidebar":"docs"},{"id":"core-functionality/gene-fusions","path":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusions","sidebar":"docs"},{"id":"core-functionality/junction-preserving","path":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preserving","sidebar":"docs"},{"id":"core-functionality/transcript-consequence-impacts","path":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impacts","sidebar":"docs"},{"id":"core-functionality/variant-ids","path":"/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-ids","sidebar":"docs"},{"id":"data-sources/1000Genomes","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes","sidebar":"docs"},{"id":"data-sources/1000Genomes-snv-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-snv-json"},{"id":"data-sources/1000Genomes-sv-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-sv-json"},{"id":"data-sources/amino-acid-conservation","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation","sidebar":"docs"},{"id":"data-sources/amino-acid-conservation-json","path":"/IlluminaConnec
tedAnnotationsDocumentation/data-sources/amino-acid-conservation-json"},{"id":"data-sources/cancer-hotspots","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-hotspots","sidebar":"docs"},{"id":"data-sources/clingen","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen","sidebar":"docs"},{"id":"data-sources/clingen-dosage-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-dosage-json"},{"id":"data-sources/clingen-gene-validity-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-gene-validity-json"},{"id":"data-sources/clingen-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-json"},{"id":"data-sources/clinvar","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar","sidebar":"docs"},{"id":"data-sources/clinvar-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-json"},{"id":"data-sources/clinvar-preview","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview","sidebar":"docs"},{"id":"data-sources/clinvar-preview-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-json"},{"id":"data-sources/cosmic","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic","sidebar":"docs"},{"id":"data-sources/cosmic-cancer-gene-census","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-cancer-gene-census"},{"id":"data-sources/cosmic-gene-fusion-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-gene-fusion-json"},{"id":"data-sources/cosmic-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-json"},{"id":"data-sources/dann","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann","sidebar":"docs"},{"id":"data-sources/dann-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-json"},{"id":"data-sources/dbsnp","path":"/IlluminaConnectedAnnotationsDocum
entation/data-sources/dbsnp","sidebar":"docs"},{"id":"data-sources/dbsnp-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-json"},{"id":"data-sources/decipher","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher","sidebar":"docs"},{"id":"data-sources/decipher-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-json"},{"id":"data-sources/fusioncatcher","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher","sidebar":"docs"},{"id":"data-sources/fusioncatcher-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-json"},{"id":"data-sources/gerp","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp","sidebar":"docs"},{"id":"data-sources/gerp-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-json"},{"id":"data-sources/gme","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme","sidebar":"docs"},{"id":"data-sources/gme-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-json"},{"id":"data-sources/gnomad","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad","sidebar":"docs"},{"id":"data-sources/gnomad-lof-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-lof-json"},{"id":"data-sources/gnomad-small-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-small-variants-json"},{"id":"data-sources/gnomad-structural-variants-data_description","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-data_description"},{"id":"data-sources/gnomad-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-json"},{"id":"data-sources/gnomad4.0-lof-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-lof-json"},{"id":"data-sources/gnomad4.0-small-variants-json","path":"/IlluminaConnectedAnnota
tionsDocumentation/data-sources/gnomad4.0-small-variants-json"},{"id":"data-sources/gnomad40-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad40-structural-variants-json"},{"id":"data-sources/mito-heteroplasmy","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmy","sidebar":"docs"},{"id":"data-sources/mitomap","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap","sidebar":"docs"},{"id":"data-sources/mitomap-small-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-small-variants-json"},{"id":"data-sources/mitomap-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-structural-variants-json"},{"id":"data-sources/omim","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim","sidebar":"docs"},{"id":"data-sources/omim-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-json"},{"id":"data-sources/phylop","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop","sidebar":"docs"},{"id":"data-sources/phylop-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-json"},{"id":"data-sources/phylopprimate-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopprimate-json"},{"id":"data-sources/primate-ai","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai","sidebar":"docs"},{"id":"data-sources/primate-ai-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai-json"},{"id":"data-sources/revel","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel","sidebar":"docs"},{"id":"data-sources/revel-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-json"},{"id":"data-sources/splice-ai","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai","sidebar":"docs"},{"id":"data-sources/splice-ai-json","path":"/IlluminaCon
nectedAnnotationsDocumentation/data-sources/splice-ai-json"},{"id":"data-sources/topmed","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed","sidebar":"docs"},{"id":"data-sources/topmed-json","path":"/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed-json"},{"id":"file-formats/custom-annotations","path":"/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotations","sidebar":"docs"},{"id":"file-formats/illumina-annotator-json-file-format","path":"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-format","sidebar":"docs"},{"id":"file-formats/illumina-annotator-vcf-file-format","path":"/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-format","sidebar":"docs"},{"id":"frequently-asked-questions/Annotator-vs-data-update","path":"/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-update","sidebar":"docs"},{"id":"introduction/dependencies","path":"/IlluminaConnectedAnnotationsDocumentation/introduction/dependencies","sidebar":"docs"},{"id":"introduction/getting-started","path":"/IlluminaConnectedAnnotationsDocumentation/introduction/getting-started","sidebar":"docs"},{"id":"introduction/introduction","path":"/IlluminaConnectedAnnotationsDocumentation/","sidebar":"docs"},{"id":"introduction/licensedContent","path":"/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContent","sidebar":"docs"},{"id":"introduction/parsing-json","path":"/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-json","sidebar":"docs"},{"id":"utilities/jasix","path":"/IlluminaConnectedAnnotationsDocumentation/utilities/jasix","sidebar":"docs"},{"id":"utilities/sautils","path":"/IlluminaConnectedAnnotationsDocumentation/utilities/sautils","sidebar":"docs"}]},{"name":"3.23","label":"3.23","isLast":false,"path":"/IlluminaConnectedAnnotationsDocumentation/3.23","mainDocId":"introduction/introduction","docs":[{"id":"core-functionali
ty/canonical-transcripts","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/canonical-transcripts","sidebar":"docs"},{"id":"core-functionality/gene-fusions","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/gene-fusions","sidebar":"docs"},{"id":"core-functionality/transcript-consequence-impacts","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/transcript-consequence-impacts","sidebar":"docs"},{"id":"core-functionality/variant-ids","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/variant-ids","sidebar":"docs"},{"id":"data-sources/1000Genomes","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes","sidebar":"docs"},{"id":"data-sources/1000Genomes-snv-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-snv-json"},{"id":"data-sources/1000Genomes-sv-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-sv-json"},{"id":"data-sources/amino-acid-conservation","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation","sidebar":"docs"},{"id":"data-sources/amino-acid-conservation-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-json"},{"id":"data-sources/cancer-hotspots","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cancer-hotspots","sidebar":"docs"},{"id":"data-sources/clingen","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen","sidebar":"docs"},{"id":"data-sources/clingen-dosage-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-dosage-json"},{"id":"data-sources/clingen-gene-validity-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-gene-validity-json"},{"id":"data-sources/clingen-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-json"},{"id":"data-sou
rces/clinvar","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar","sidebar":"docs"},{"id":"data-sources/clinvar-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-json"},{"id":"data-sources/cosmic","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic","sidebar":"docs"},{"id":"data-sources/cosmic-cancer-gene-census","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-cancer-gene-census"},{"id":"data-sources/cosmic-gene-fusion-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-gene-fusion-json"},{"id":"data-sources/cosmic-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-json"},{"id":"data-sources/dann","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann","sidebar":"docs"},{"id":"data-sources/dann-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-json"},{"id":"data-sources/dbsnp","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp","sidebar":"docs"},{"id":"data-sources/dbsnp-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-json"},{"id":"data-sources/decipher","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher","sidebar":"docs"},{"id":"data-sources/decipher-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-json"},{"id":"data-sources/fusioncatcher","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher","sidebar":"docs"},{"id":"data-sources/fusioncatcher-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher-json"},{"id":"data-sources/gerp","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp","sidebar":"docs"},{"id":"data-sources/gerp-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-json"},{"id":"data-sources/gme","path":"/Illu
minaConnectedAnnotationsDocumentation/3.23/data-sources/gme","sidebar":"docs"},{"id":"data-sources/gme-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-json"},{"id":"data-sources/gnomad","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad","sidebar":"docs"},{"id":"data-sources/gnomad-lof-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-lof-json"},{"id":"data-sources/gnomad-small-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-small-variants-json"},{"id":"data-sources/gnomad-structural-variants-data_description","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-data_description"},{"id":"data-sources/gnomad-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-json"},{"id":"data-sources/mito-heteroplasmy","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mito-heteroplasmy","sidebar":"docs"},{"id":"data-sources/mitomap","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap","sidebar":"docs"},{"id":"data-sources/mitomap-small-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-small-variants-json"},{"id":"data-sources/mitomap-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-structural-variants-json"},{"id":"data-sources/omim","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim","sidebar":"docs"},{"id":"data-sources/omim-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim-json"},{"id":"data-sources/phylop","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop","sidebar":"docs"},{"id":"data-sources/phylop-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop-json"},{"id":"data-sources/phylopprimate-js
on","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopprimate-json"},{"id":"data-sources/primate-ai","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai","sidebar":"docs"},{"id":"data-sources/primate-ai-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-json"},{"id":"data-sources/revel","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel","sidebar":"docs"},{"id":"data-sources/revel-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-json"},{"id":"data-sources/splice-ai","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai","sidebar":"docs"},{"id":"data-sources/splice-ai-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-json"},{"id":"data-sources/topmed","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed","sidebar":"docs"},{"id":"data-sources/topmed-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-json"},{"id":"file-formats/custom-annotations","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/custom-annotations","sidebar":"docs"},{"id":"file-formats/illumina-annotator-json-file-format","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/illumina-annotator-json-file-format","sidebar":"docs"},{"id":"introduction/dependencies","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/dependencies","sidebar":"docs"},{"id":"introduction/getting-started","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/getting-started","sidebar":"docs"},{"id":"introduction/introduction","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/","sidebar":"docs"},{"id":"introduction/licensedContent","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/licensedContent","sidebar":"docs"},{"id":"introduction/parsing-json","path":"/IlluminaConnectedAnnotations
Documentation/3.23/introduction/parsing-json","sidebar":"docs"},{"id":"utilities/jasix","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/jasix","sidebar":"docs"},{"id":"utilities/sautils","path":"/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/sautils","sidebar":"docs"}]},{"name":"3.22","label":"3.22","isLast":false,"path":"/IlluminaConnectedAnnotationsDocumentation/3.22","mainDocId":"introduction/introduction","docs":[{"id":"core-functionality/canonical-transcripts","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/canonical-transcripts","sidebar":"docs"},{"id":"core-functionality/gene-fusions","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/gene-fusions","sidebar":"docs"},{"id":"core-functionality/transcript-consequence-impacts","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/transcript-consequence-impacts","sidebar":"docs"},{"id":"core-functionality/variant-ids","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/variant-ids","sidebar":"docs"},{"id":"data-sources/1000Genomes","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes","sidebar":"docs"},{"id":"data-sources/1000Genomes-snv-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-snv-json"},{"id":"data-sources/1000Genomes-sv-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-sv-json"},{"id":"data-sources/amino-acid-conservation","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation","sidebar":"docs"},{"id":"data-sources/amino-acid-conservation-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-json"},{"id":"data-sources/cancer-hotspots","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cancer-hotspots","sidebar":"docs"},{"id":"data-sources/clingen","path":"/IlluminaConnectedAnnotati
onsDocumentation/3.22/data-sources/clingen","sidebar":"docs"},{"id":"data-sources/clingen-dosage-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-dosage-json"},{"id":"data-sources/clingen-gene-validity-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-gene-validity-json"},{"id":"data-sources/clingen-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-json"},{"id":"data-sources/clinvar","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar","sidebar":"docs"},{"id":"data-sources/clinvar-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-json"},{"id":"data-sources/cosmic","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic","sidebar":"docs"},{"id":"data-sources/cosmic-cancer-gene-census","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-cancer-gene-census"},{"id":"data-sources/cosmic-gene-fusion-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-gene-fusion-json"},{"id":"data-sources/cosmic-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-json"},{"id":"data-sources/dann","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann","sidebar":"docs"},{"id":"data-sources/dann-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-json"},{"id":"data-sources/dbsnp","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp","sidebar":"docs"},{"id":"data-sources/dbsnp-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-json"},{"id":"data-sources/decipher","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher","sidebar":"docs"},{"id":"data-sources/decipher-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-json"},{"id":"data-sources/fusioncatcher","path":"/IlluminaConnectedAn
notationsDocumentation/3.22/data-sources/fusioncatcher","sidebar":"docs"},{"id":"data-sources/fusioncatcher-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-json"},{"id":"data-sources/gerp","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp","sidebar":"docs"},{"id":"data-sources/gerp-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-json"},{"id":"data-sources/gme","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme","sidebar":"docs"},{"id":"data-sources/gme-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme-json"},{"id":"data-sources/gnomad","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad","sidebar":"docs"},{"id":"data-sources/gnomad-lof-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-lof-json"},{"id":"data-sources/gnomad-small-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-small-variants-json"},{"id":"data-sources/gnomad-structural-variants-data_description","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-data_description"},{"id":"data-sources/gnomad-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-json"},{"id":"data-sources/mito-heteroplasmy","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mito-heteroplasmy","sidebar":"docs"},{"id":"data-sources/mitomap","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap","sidebar":"docs"},{"id":"data-sources/mitomap-small-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-small-variants-json"},{"id":"data-sources/mitomap-structural-variants-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-structural-variants-json"},{"id":"data-sources/omim","path":
"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim","sidebar":"docs"},{"id":"data-sources/omim-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim-json"},{"id":"data-sources/phylop","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop","sidebar":"docs"},{"id":"data-sources/phylop-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-json"},{"id":"data-sources/primate-ai","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai","sidebar":"docs"},{"id":"data-sources/primate-ai-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-json"},{"id":"data-sources/revel","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel","sidebar":"docs"},{"id":"data-sources/revel-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-json"},{"id":"data-sources/splice-ai","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai","sidebar":"docs"},{"id":"data-sources/splice-ai-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-json"},{"id":"data-sources/topmed","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed","sidebar":"docs"},{"id":"data-sources/topmed-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-json"},{"id":"file-formats/custom-annotations","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/custom-annotations","sidebar":"docs"},{"id":"file-formats/illumina-annotator-json-file-format","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/illumina-annotator-json-file-format","sidebar":"docs"},{"id":"introduction/dependencies","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/dependencies","sidebar":"docs"},{"id":"introduction/getting-started","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/getting-starte
d","sidebar":"docs"},{"id":"introduction/introduction","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/","sidebar":"docs"},{"id":"introduction/parsing-json","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/parsing-json","sidebar":"docs"},{"id":"utilities/jasix","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/jasix","sidebar":"docs"},{"id":"utilities/sautils","path":"/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/sautils","sidebar":"docs"}]}]}}}'),i=JSON.parse('{"defaultLocale":"en","locales":["en"],"currentLocale":"en","localeConfigs":{"en":{"label":"English","direction":"ltr"}}}');var s=t(7529);const c=JSON.parse('{"docusaurusVersion":"2.0.0-beta.13","siteVersion":"0.0.0","pluginVersions":{"docusaurus-plugin-content-docs":{"type":"package","name":"@docusaurus/plugin-content-docs","version":"2.0.0-beta.13"},"docusaurus-plugin-content-blog":{"type":"package","name":"@docusaurus/plugin-content-blog","version":"2.0.0-beta.13"},"docusaurus-plugin-content-pages":{"type":"package","name":"@docusaurus/plugin-content-pages","version":"2.0.0-beta.13"},"docusaurus-plugin-sitemap":{"type":"package","name":"@docusaurus/plugin-sitemap","version":"2.0.0-beta.13"},"docusaurus-theme-classic":{"type":"package","name":"@docusaurus/theme-classic","version":"2.0.0-beta.13"},"docusaurus-theme-search-algolia":{"type":"package","name":"@docusaurus/theme-search-algolia","version":"2.0.0-beta.13"}}}'),l={siteConfig:a.default,siteMetadata:c,globalData:r,i18n:i,codeTranslations:s},u=o.createContext(l);function d(e){let{children:n}=e;return o.createElement(u.Provider,{value:l},n)}},3919:(e,n,t)=>{"use strict";function o(e){return!0===/^(\w*:|\/\/)/.test(e)}function a(e){return void 0!==e&&!o(e)}t.d(n,{Z:()=>a,b:()=>o})},6291:(e,n,t)=>{"use strict";t.d(n,{Z:()=>o});const o=t(8790).H},8143:(e,n,t)=>{"use 
strict";t.r(n),t.d(n,{BrowserRouter:()=>o.VK,HashRouter:()=>o.UT,Link:()=>o.rU,MemoryRouter:()=>o.VA,NavLink:()=>o.OL,Prompt:()=>o.NL,Redirect:()=>o.l_,Route:()=>o.AW,Router:()=>o.F0,StaticRouter:()=>o.gx,Switch:()=>o.rs,generatePath:()=>o.Gn,matchPath:()=>o.LX,useHistory:()=>o.k6,useLocation:()=>o.TH,useParams:()=>o.UO,useRouteMatch:()=>o.$B,withRouter:()=>o.EN});var o=t(3727)},4996:(e,n,t)=>{"use strict";t.d(n,{C:()=>r,Z:()=>i});var o=t(2263),a=t(3919);function r(){const{siteConfig:{baseUrl:e="/",url:n}={}}=(0,o.Z)();return{withBaseUrl:(t,o)=>function(e,n,t,o){let{forcePrependBaseUrl:r=!1,absolute:i=!1}=void 0===o?{}:o;if(!t)return t;if(t.startsWith("#"))return t;if((0,a.b)(t))return t;if(r)return n+t;const s=t.startsWith(n)?t:n+t.replace(/^\//,"");return i?e+s:s}(n,e,t,o)}}function i(e,n){void 0===n&&(n={});const{withBaseUrl:t}=r();return t(e,n)}},2263:(e,n,t)=>{"use strict";t.d(n,{Z:()=>r});var o=t(7294),a=t(7041);const r=function(){return(0,o.useContext)(a._)}},8084:(e,n,t)=>{"use strict";t.r(n),t.d(n,{default:()=>r,useAllPluginInstancesData:()=>i,usePluginData:()=>s});var o=t(2263),a=t(9935);function r(){const{globalData:e}=(0,o.Z)();if(!e)throw new Error("Docusaurus global data not found.");return e}function i(e){const n=r()[e];if(!n)throw new Error(`Docusaurus plugin global data not found for "${e}" plugin.`);return n}function s(e,n){void 0===n&&(n=a.m);const t=i(e)[n];if(!t)throw new Error(`Docusaurus plugin global data not found for "${e}" plugin with id "${n}".`);return t}},2389:(e,n,t)=>{"use strict";t.d(n,{Z:()=>r});var o=t(7294),a=t(9913);function r(){return(0,o.useContext)(a._)}},4953:(e,n,t)=>{"use strict";t.d(n,{Z:()=>s});var o=t(7294),a=t(8882),r=t(780);function i(e){let{error:n,tryAgain:t}=e;return o.createElement("div",{style:{display:"flex",flexDirection:"column",justifyContent:"center",alignItems:"center",height:"50vh",width:"100%",fontSize:"20px"}},o.createElement("h1",null,"This page 
crashed."),o.createElement("p",null,n.message),o.createElement("button",{type:"button",onClick:t},"Try again"))}const s=function(e){let{error:n,tryAgain:t}=e;return o.createElement(r.Z,{fallback:()=>o.createElement(i,{error:n,tryAgain:t})},o.createElement(a.Z,{title:"Page Error"},o.createElement(i,{error:n,tryAgain:t})))}},8408:(e,n,t)=>{"use strict";Object.defineProperty(n,"__esModule",{value:!0}),n.getDocVersionSuggestions=n.getActiveDocContext=n.getActiveVersion=n.getLatestVersion=n.getActivePlugin=void 0;const o=t(8143);n.getActivePlugin=function(e,n,t){void 0===t&&(t={});const a=Object.entries(e).find((e=>{let[t,a]=e;return!!(0,o.matchPath)(n,{path:a.path,exact:!1,strict:!1})})),r=a?{pluginId:a[0],pluginData:a[1]}:void 0;if(!r&&t.failfast)throw new Error(`Can't find active docs plugin for "${n}" pathname, while it was expected to be found. Maybe you tried to use a docs feature that can only be used on a docs-related page? Existing docs plugin paths are: ${Object.values(e).map((e=>e.path)).join(", ")}`);return r};n.getLatestVersion=e=>e.versions.find((e=>e.isLast));n.getActiveVersion=(e,t)=>{const a=(0,n.getLatestVersion)(e);return[...e.versions.filter((e=>e!==a)),a].find((e=>!!(0,o.matchPath)(t,{path:e.path,exact:!1,strict:!1})))};n.getActiveDocContext=(e,t)=>{const a=(0,n.getActiveVersion)(e,t),r=null==a?void 0:a.docs.find((e=>!!(0,o.matchPath)(t,{path:e.path,exact:!0,strict:!1})));return{activeVersion:a,activeDoc:r,alternateDocVersions:r?function(n){const t={};return e.versions.forEach((e=>{e.docs.forEach((o=>{o.id===n&&(t[e.name]=o)}))})),t}(r.id):{}}};n.getDocVersionSuggestions=(e,t)=>{const o=(0,n.getLatestVersion)(e),a=(0,n.getActiveDocContext)(e,t);return{latestDocSuggestion:null==a?void 0:a.alternateDocVersions[o.name],latestVersionSuggestion:o}}},6730:(e,n,t)=>{"use strict";n.Jo=n.Iw=n.zu=n.yW=n.gB=n.WS=n.gA=n.zh=n._r=void 0;const o=t(7582),a=t(8143),r=(0,o.__importStar)(t(8084)),i=t(8408),s={};n._r=()=>{var e;return 
null!==(e=(0,r.default)()["docusaurus-plugin-content-docs"])&&void 0!==e?e:s};n.zh=e=>(0,r.usePluginData)("docusaurus-plugin-content-docs",e);n.gA=function(e){void 0===e&&(e={});const t=(0,n._r)(),{pathname:o}=(0,a.useLocation)();return(0,i.getActivePlugin)(t,o,e)};n.WS=function(e){void 0===e&&(e={});const t=(0,n.gA)(e),{pathname:o}=(0,a.useLocation)();if(t){return{activePlugin:t,activeVersion:(0,i.getActiveVersion)(t.pluginData,o)}}};n.gB=e=>(0,n.zh)(e).versions;n.yW=e=>{const t=(0,n.zh)(e);return(0,i.getLatestVersion)(t)};n.zu=e=>{const t=(0,n.zh)(e),{pathname:o}=(0,a.useLocation)();return(0,i.getActiveVersion)(t,o)};n.Iw=e=>{const t=(0,n.zh)(e),{pathname:o}=(0,a.useLocation)();return(0,i.getActiveDocContext)(t,o)};n.Jo=e=>{const t=(0,n.zh)(e),{pathname:o}=(0,a.useLocation)();return(0,i.getDocVersionSuggestions)(t,o)}},541:(e,n,t)=>{"use strict";t.d(n,{Z:()=>r});var o=t(7294);const a="iconExternalLink_wgqa";const r=function(e){let{width:n=13.5,height:t=13.5}=e;return o.createElement("svg",{width:n,height:t,"aria-hidden":"true",viewBox:"0 0 24 24",className:a},o.createElement("path",{fill:"currentColor",d:"M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"}))}},8882:(e,n,t)=>{"use strict";t.d(n,{Z:()=>Ee});var o=t(7294),a=t(6010),r=t(780),i=t(6550),s=t(5999),c=t(3810);const l="skipToContent_OuoZ";function u(e){e.setAttribute("tabindex","-1"),e.focus(),e.removeAttribute("tabindex")}const d=function(){const e=(0,o.useRef)(null),{action:n}=(0,i.k6)();return(0,c.SL)((t=>{let{location:o}=t;e.current&&!o.hash&&"PUSH"===n&&u(e.current)})),o.createElement("div",{ref:e},o.createElement("a",{href:"#",className:l,onClick:e=>{e.preventDefault();const n=document.querySelector("main:first-of-type")||document.querySelector(".main-wrapper");n&&u(n)}},o.createElement(s.Z,{id:"theme.common.skipToMainContent",description:"The skip to content label used for accessibility, allowing to rapidly navigate to main content with 
keyboard tab/enter navigation"},"Skip to main content")))};var m=t(7462);function p(e){let{width:n=21,height:t=21,color:a="currentColor",strokeWidth:r=1.2,className:i,...s}=e;return o.createElement("svg",(0,m.Z)({viewBox:"0 0 15 15",width:n,height:t},s),o.createElement("g",{stroke:a,strokeWidth:r},o.createElement("path",{d:"M.75.75l13.5 13.5M14.25.75L.75 14.25"})))}const f="announcementBar_axC9",h="announcementBarPlaceholder_xYHE",g="announcementBarClose_A3A1",v="announcementBarContent_6uhP";const b=function(){const{isActive:e,close:n}=(0,c.nT)(),{announcementBar:t}=(0,c.LU)();if(!e)return null;const{content:r,backgroundColor:i,textColor:l,isCloseable:u}=t;return o.createElement("div",{className:f,style:{backgroundColor:i,color:l},role:"banner"},u&&o.createElement("div",{className:h}),o.createElement("div",{className:v,dangerouslySetInnerHTML:{__html:r}}),u?o.createElement("button",{type:"button",className:(0,a.Z)("clean-btn close",g),onClick:n,"aria-label":(0,s.I)({id:"theme.AnnouncementBar.closeButtonAriaLabel",message:"Close",description:"The ARIA label for close button of announcement bar"})},o.createElement(p,{width:14,height:14,strokeWidth:3.1})):null)};var y=t(9166),w=t(2389);const C="toggle_iYfV",A="toggleScreenReader_h9qa",k="toggleDisabled_xj38",D="toggleTrack_t-f2",E="toggleTrackCheck_mk7D",I="toggleChecked_a04y",S="toggleTrackX_dm8H",x="toggleTrackThumb_W6To",_="toggleFocused_pRSw",j="toggleIcon_pHJ9",T=(0,o.memo)((e=>{let{className:n,switchConfig:t,checked:r,disabled:i,onChange:s}=e;const{darkIcon:c,darkIconStyle:l,lightIcon:u,lightIconStyle:d}=t,[m,p]=(0,o.useState)(r),[f,h]=(0,o.useState)(!1),g=(0,o.useRef)(null);return 
o.createElement("div",{className:(0,a.Z)(C,n,{[I]:m,[_]:f,[k]:i})},o.createElement("div",{className:D,role:"button",tabIndex:-1,onClick:()=>g.current?.click()},o.createElement("div",{className:E},o.createElement("span",{className:j,style:l},c)),o.createElement("div",{className:S},o.createElement("span",{className:j,style:d},u)),o.createElement("div",{className:x})),o.createElement("input",{ref:g,checked:m,type:"checkbox",className:A,"aria-label":"Switch between dark and light mode",onChange:s,onClick:()=>p(!m),onFocus:()=>h(!0),onBlur:()=>h(!1),onKeyDown:e=>{"Enter"===e.key&&g.current?.click()}}))}));function O(e){const{colorMode:{switchConfig:n}}=(0,c.LU)(),t=(0,w.Z)();return o.createElement(T,(0,m.Z)({switchConfig:n,disabled:!t},e))}var P=t(5350);const L=e=>{const[n,t]=(0,o.useState)(e),a=(0,o.useRef)(!1),r=(0,o.useRef)(0),i=(0,o.useCallback)((e=>{null!==e&&(r.current=e.getBoundingClientRect().height)}),[]);return(0,c.RF)(((n,o)=>{if(!e)return;const i=n.scrollY;if(i=s?t(!1):i+l{if(e)return n.location.hash?(a.current=!0,void t(!1)):void t(!0)})),{navbarRef:i,isNavbarVisible:n}};const R=function(e){void 0===e&&(e=!0),(0,o.useEffect)((()=>(document.body.style.overflow=e?"hidden":"visible",()=>{document.body.style.overflow="visible"})),[e])};var N=t(3783),M=t(907),F=t(2207),B=t(5537);const U=function(e){let{width:n=30,height:t=30,className:a,...r}=e;return o.createElement("svg",(0,m.Z)({className:a,width:n,height:t,viewBox:"0 0 30 30","aria-hidden":"true"},r),o.createElement("path",{stroke:"currentColor",strokeLinecap:"round",strokeMiterlimit:"10",strokeWidth:"2",d:"M4 7h22M4 15h22M4 23h22"}))},z={toggle:"toggle_2i4l",navbarHideable:"navbarHideable_RReh",navbarHidden:"navbarHidden_FBwS",navbarSidebarToggle:"navbarSidebarToggle_AVbO"},$="right";function G(){return(0,c.LU)().navbar.items}function 
q(){const{colorMode:{disableSwitch:e}}=(0,c.LU)(),{isDarkTheme:n,setLightTheme:t,setDarkTheme:a}=(0,P.Z)();return{isDarkTheme:n,toggle:(0,o.useCallback)((e=>e.target.checked?a():t()),[t,a]),disabled:e}}function Z(e){let{sidebarShown:n,toggleSidebar:t}=e;R(n);const r=G(),i=q(),l=function(e){let{sidebarShown:n,toggleSidebar:t}=e;const a=(0,c.g8)()?.({toggleSidebar:t}),r=(0,c.D9)(a),[i,s]=(0,o.useState)((()=>!1));(0,o.useEffect)((()=>{a&&!r&&s(!0)}),[a,r]);const l=!!a;return(0,o.useEffect)((()=>{l?n||s(!0):s(!1)}),[n,l]),{shown:i,hide:(0,o.useCallback)((()=>{s(!1)}),[]),content:a}}({sidebarShown:n,toggleSidebar:t});return o.createElement("div",{className:"navbar-sidebar"},o.createElement("div",{className:"navbar-sidebar__brand"},o.createElement(B.Z,{className:"navbar__brand",imageClassName:"navbar__logo",titleClassName:"navbar__title"}),!i.disabled&&o.createElement(O,{className:z.navbarSidebarToggle,checked:i.isDarkTheme,onChange:i.toggle}),o.createElement("button",{type:"button",className:"clean-btn navbar-sidebar__close",onClick:t},o.createElement(p,{color:"var(--ifm-color-emphasis-600)",className:z.navbarSidebarCloseSvg}))),o.createElement("div",{className:(0,a.Z)("navbar-sidebar__items",{"navbar-sidebar__items--show-secondary":l.shown})},o.createElement("div",{className:"navbar-sidebar__item menu"},o.createElement("ul",{className:"menu__list"},r.map(((e,n)=>o.createElement(F.Z,(0,m.Z)({mobile:!0},e,{onClick:t,key:n})))))),o.createElement("div",{className:"navbar-sidebar__item menu"},r.length>0&&o.createElement("button",{type:"button",className:"clean-btn navbar-sidebar__back",onClick:l.hide},o.createElement(s.Z,{id:"theme.navbar.mobileSidebarSecondaryMenu.backButtonLabel",description:"The label of the back button to return to main menu, inside the mobile navbar sidebar secondary menu (notably used to display the docs sidebar)"},"\u2190 Back to main menu")),l.content)))}const H=function(){const{navbar:{hideOnScroll:e,style:n}}=(0,c.LU)(),t=function(){const 
e=(0,N.Z)(),n="mobile"===e,[t,a]=(0,o.useState)(!1);(0,c.Rb)((()=>{if(t)return a(!1),!1}));const r=(0,o.useCallback)((()=>{a((e=>!e))}),[]);return(0,o.useEffect)((()=>{"desktop"===e&&a(!1)}),[e]),{shouldRender:n,toggle:r,shown:t}}(),r=q(),i=(0,M.gA)(),{navbarRef:s,isNavbarVisible:l}=L(e),u=G(),d=u.some((e=>"search"===e.type)),{leftItems:p,rightItems:f}=function(e){return{leftItems:e.filter((e=>"left"===(e.position??$))),rightItems:e.filter((e=>"right"===(e.position??$)))}}(u);return o.createElement("nav",{ref:s,className:(0,a.Z)("navbar","navbar--fixed-top",{"navbar--dark":"dark"===n,"navbar--primary":"primary"===n,"navbar-sidebar--show":t.shown,[z.navbarHideable]:e,[z.navbarHidden]:e&&!l})},o.createElement("div",{className:"navbar__inner"},o.createElement("div",{className:"navbar__items"},(u?.length>0||i)&&o.createElement("button",{"aria-label":"Navigation bar toggle",className:"navbar__toggle clean-btn",type:"button",tabIndex:0,onClick:t.toggle,onKeyDown:t.toggle},o.createElement(U,null)),o.createElement(B.Z,{className:"navbar__brand",imageClassName:"navbar__logo",titleClassName:"navbar__title"}),p.map(((e,n)=>o.createElement(F.Z,(0,m.Z)({},e,{key:n}))))),o.createElement("div",{className:"navbar__items navbar__items--right"},f.map(((e,n)=>o.createElement(F.Z,(0,m.Z)({},e,{key:n})))),!r.disabled&&o.createElement(O,{className:z.toggle,checked:r.isDarkTheme,onChange:r.toggle}),!d&&o.createElement(y.Z,null))),o.createElement("div",{role:"presentation",className:"navbar-sidebar__backdrop",onClick:t.toggle}),t.shouldRender&&o.createElement(Z,{sidebarShown:t.shown,toggleSidebar:t.toggle}))};var V=t(9960),W=t(4996),K=t(3919);const Y="footerLogoLink_SRtH";var Q=t(9750),X=t(541);function J(e){let{to:n,href:t,label:a,prependBaseUrlToHref:r,...i}=e;const s=(0,W.Z)(n),c=(0,W.Z)(t,{forcePrependBaseUrl:!0});return 
o.createElement(V.Z,(0,m.Z)({className:"footer__link-item"},t?{href:r?c:t}:{to:s},i),t&&!(0,K.Z)(t)?o.createElement("span",null,a,o.createElement(X.Z,null)):a)}function ee(e){let{sources:n,alt:t,width:a,height:r}=e;return o.createElement(Q.Z,{className:"footer__logo",alt:t,sources:n,width:a,height:r})}const ne=function(){const{footer:e}=(0,c.LU)(),{copyright:n,links:t=[],logo:r={}}=e||{},i={light:(0,W.Z)(r.src),dark:(0,W.Z)(r.srcDark||r.src)};return e?o.createElement("footer",{className:(0,a.Z)("footer",{"footer--dark":"dark"===e.style})},o.createElement("div",{className:"container"},t&&t.length>0&&o.createElement("div",{className:"row footer__links"},t.map(((e,n)=>o.createElement("div",{key:n,className:"col footer__col"},null!=e.title?o.createElement("div",{className:"footer__title"},e.title):null,null!=e.items&&Array.isArray(e.items)&&e.items.length>0?o.createElement("ul",{className:"footer__items"},e.items.map(((e,n)=>e.html?o.createElement("li",{key:n,className:"footer__item",dangerouslySetInnerHTML:{__html:e.html}}):o.createElement("li",{key:e.href||e.to,className:"footer__item"},o.createElement(J,e))))):null)))),(r||n)&&o.createElement("div",{className:"footer__bottom text--center"},r&&(r.src||r.srcDark)&&o.createElement("div",{className:"margin-bottom--sm"},r.href?o.createElement(V.Z,{href:r.href,className:Y},o.createElement(ee,{alt:r.alt,sources:i,width:r.width,height:r.height})):o.createElement(ee,{alt:r.alt,sources:i})),n?o.createElement("div",{className:"footer__copyright",dangerouslySetInnerHTML:{__html:n}}):null))):null};var te=t(412);const 
oe=(0,c.WA)("theme"),ae="light",re="dark",ie=e=>e===re?re:ae,se=e=>{(0,c.WA)("theme").set(ie(e))},ce=()=>{const{colorMode:{defaultMode:e,disableSwitch:n,respectPrefersColorScheme:t}}=(0,c.LU)(),[a,r]=(0,o.useState)((e=>te.Z.canUseDOM?ie(document.documentElement.getAttribute("data-theme")):ie(e))(e)),i=(0,o.useCallback)((()=>{r(ae),se(ae)}),[]),s=(0,o.useCallback)((()=>{r(re),se(re)}),[]);return(0,o.useEffect)((()=>{document.documentElement.setAttribute("data-theme",ie(a))}),[a]),(0,o.useEffect)((()=>{if(!n)try{const e=oe.get();null!==e&&r(ie(e))}catch(e){console.error(e)}}),[n,r]),(0,o.useEffect)((()=>{n&&!t||window.matchMedia("(prefers-color-scheme: dark)").addListener((e=>{let{matches:n}=e;r(n?re:ae)}))}),[n,t]),{isDarkTheme:a===re,setLightTheme:i,setDarkTheme:s}};var le=t(2924);const ue=function(e){const{isDarkTheme:n,setLightTheme:t,setDarkTheme:a}=ce(),r=(0,o.useMemo)((()=>({isDarkTheme:n,setLightTheme:t,setDarkTheme:a})),[n,t,a]);return o.createElement(le.Z.Provider,{value:r},e.children)},de="docusaurus.tab.",me=()=>{const[e,n]=(0,o.useState)({}),t=(0,o.useCallback)(((e,n)=>{(0,c.WA)(`${de}${e}`).set(n)}),[]);return(0,o.useEffect)((()=>{try{const e={};(0,c._f)().forEach((n=>{if(n.startsWith(de)){const t=n.substring(15);e[t]=(0,c.WA)(n).get()}})),n(e)}catch(e){console.error(e)}}),[]),{tabGroupChoices:e,setTabGroupChoices:(e,o)=>{n((n=>({...n,[e]:o}))),t(e,o)}}},pe=(0,o.createContext)(void 0);const fe=function(e){const{tabGroupChoices:n,setTabGroupChoices:t}=me(),a=(0,o.useMemo)((()=>({tabGroupChoices:n,setTabGroupChoices:t})),[n,t]);return o.createElement(pe.Provider,{value:a},e.children)};function he(e){let{children:n}=e;return o.createElement(ue,null,o.createElement(c.pl,null,o.createElement(fe,null,o.createElement(c.OC,null,o.createElement(c.L5,null,o.createElement(c.Cn,null,n))))))}var ge=t(2859),ve=t(2263);const be=function(e){let{locale:n,version:t,tag:a}=e;const r=n;return 
o.createElement(ge.Z,null,r&&o.createElement("meta",{name:"docsearch:language",content:r}),t&&o.createElement("meta",{name:"docsearch:version",content:t}),a&&o.createElement("meta",{name:"docsearch:docusaurus_tag",content:a}))};var ye=t(1217);function we(){const{i18n:{defaultLocale:e,locales:n}}=(0,ve.Z)(),t=(0,c.l5)();return o.createElement(ge.Z,null,n.map((e=>o.createElement("link",{key:e,rel:"alternate",href:t.createUrl({locale:e,fullyQualified:!0}),hrefLang:e}))),o.createElement("link",{rel:"alternate",href:t.createUrl({locale:e,fullyQualified:!0}),hrefLang:"x-default"}))}function Ce(e){let{permalink:n}=e;const{siteConfig:{url:t}}=(0,ve.Z)(),a=function(){const{siteConfig:{url:e}}=(0,ve.Z)(),{pathname:n}=(0,i.TH)();return e+(0,W.Z)(n)}(),r=n?`${t}${n}`:a;return o.createElement(ge.Z,null,o.createElement("meta",{property:"og:url",content:r}),o.createElement("link",{rel:"canonical",href:r}))}function Ae(e){const{siteConfig:{favicon:n},i18n:{currentLocale:t,localeConfigs:a}}=(0,ve.Z)(),{metadata:r,image:i}=(0,c.LU)(),{title:s,description:l,image:u,keywords:d,searchMetadata:p}=e,f=(0,W.Z)(n),h=(0,c.pe)(s),g=t,v=a[t].direction;return o.createElement(o.Fragment,null,o.createElement(ge.Z,null,o.createElement("html",{lang:g,dir:v}),n&&o.createElement("link",{rel:"icon",href:f}),o.createElement("title",null,h),o.createElement("meta",{property:"og:title",content:h}),o.createElement("meta",{name:"twitter:card",content:"summary_large_image"})),i&&o.createElement(ye.Z,{image:i}),u&&o.createElement(ye.Z,{image:u}),o.createElement(ye.Z,{description:l,keywords:d}),o.createElement(Ce,null),o.createElement(we,null),o.createElement(be,(0,m.Z)({tag:c.HX,locale:t},p)),o.createElement(ge.Z,null,r.map(((e,n)=>o.createElement("meta",(0,m.Z)({key:`metadata_${n}`},e))))))}const ke=function(){(0,o.useEffect)((()=>{const e="navigation-with-keyboard";function n(n){"keydown"===n.type&&"Tab"===n.key&&document.body.classList.add(e),"mousedown"===n.type&&document.body.classList.remove(e)}return 
document.addEventListener("keydown",n),document.addEventListener("mousedown",n),()=>{document.body.classList.remove(e),document.removeEventListener("keydown",n),document.removeEventListener("mousedown",n)}}),[])};function De(e){let{error:n,tryAgain:t}=e;return o.createElement("main",{className:"container margin-vert--xl"},o.createElement("div",{className:"row"},o.createElement("div",{className:"col col--6 col--offset-3"},o.createElement("h1",{className:"hero__title"},o.createElement(s.Z,{id:"theme.ErrorPageContent.title",description:"The title of the fallback page when the page crashed"},"This page crashed.")),o.createElement("p",null,n.message),o.createElement("div",null,o.createElement("button",{type:"button",onClick:t},o.createElement(s.Z,{id:"theme.ErrorPageContent.tryAgain",description:"The label of the button to try again when the page crashed"},"Try again"))))))}const Ee=function(e){const{children:n,noFooter:t,wrapperClassName:i,pageClassName:s}=e;return ke(),o.createElement(he,null,o.createElement(Ae,e),o.createElement(d,null),o.createElement(b,null),o.createElement(H,null),o.createElement("div",{className:(0,a.Z)(c.kM.wrapper.main,i,s)},o.createElement(r.Z,{fallback:De},n)),!t&&o.createElement(ne,null))}},5537:(e,n,t)=>{"use strict";t.d(n,{Z:()=>u});var o=t(7462),a=t(7294),r=t(9960),i=t(9750),s=t(4996),c=t(2263),l=t(3810);const u=function(e){const{siteConfig:{title:n}}=(0,c.Z)(),{navbar:{title:t,logo:u={src:""}}}=(0,l.LU)(),{imageClassName:d,titleClassName:m,...p}=e,f=(0,s.Z)(u.href||"/"),h={light:(0,s.Z)(u.src),dark:(0,s.Z)(u.srcDark||u.src)},g=a.createElement(i.Z,{sources:h,height:u.height,width:u.width,alt:u.alt||t||n});return a.createElement(r.Z,(0,o.Z)({to:f},p,u.target&&{target:u.target}),u.src&&(d?a.createElement("div",{className:d},g):g),null!=t&&a.createElement("b",{className:m},t))}},5525:(e,n,t)=>{"use strict";t.d(n,{O:()=>p,Z:()=>g});var o=t(7462),a=t(7294),r=t(6010),i=t(9960),s=t(4996),c=t(541),l=t(3919),u=t(3810),d=t(2207);const 
m="dropdown__link--active";function p(e){let{activeBasePath:n,activeBaseRegex:t,to:r,href:d,label:p,activeClassName:f="",prependBaseUrlToHref:h,...g}=e;const v=(0,s.Z)(r),b=(0,s.Z)(n),y=(0,s.Z)(d,{forcePrependBaseUrl:!0}),w=p&&d&&!(0,l.Z)(d),C=f===m;return a.createElement(i.Z,(0,o.Z)({},d?{href:h?y:d}:{isNavLink:!0,activeClassName:g.className?.includes(f)?"":f,to:v,...n||t?{isActive:(e,n)=>t?(0,u.Fx)(t,n.pathname):n.pathname.startsWith(b)}:null},g),w?a.createElement("span",null,p,a.createElement(c.Z,C&&{width:12,height:12})):p)}function f(e){let{className:n,isDropdownItem:t=!1,...i}=e;const s=a.createElement(p,(0,o.Z)({className:(0,r.Z)(t?"dropdown__link":"navbar__item navbar__link",n)},i));return t?a.createElement("li",null,s):s}function h(e){let{className:n,isDropdownItem:t,...i}=e;return a.createElement("li",{className:"menu__list-item"},a.createElement(p,(0,o.Z)({className:(0,r.Z)("menu__link",n)},i)))}const g=function(e){let{mobile:n=!1,position:t,...r}=e;const i=n?h:f;return a.createElement(i,(0,o.Z)({},r,{activeClassName:r.activeClassName??(0,d.E)(n)}))}},6400:(e,n,t)=>{"use strict";t.d(n,{Z:()=>u});var o=t(7462),a=t(7294),r=t(5525),i=t(907),s=t(6010),c=t(2207),l=t(3810);function u(e){let{docId:n,label:t,docsPluginId:u,...d}=e;const{activeVersion:m,activeDoc:p}=(0,i.Iw)(u),{preferredVersion:f}=(0,l.J)(u),h=(0,i.yW)(u),g=function(e,n){const t=e.flatMap((e=>e.docs)),o=t.find((e=>e.id===n));if(!o){const o=t.map((e=>e.id)).join("\n- ");throw new Error(`DocNavbarItem: couldn't find any doc with id "${n}" in version${e.length?"s":""} ${e.map((e=>e.name)).join(", ")}".\nAvailable doc ids are:\n- ${o}`)}return o}((0,l.jj)([m,f,h].filter(Boolean)),n),v=(0,c.E)(d.mobile);return a.createElement(r.Z,(0,o.Z)({exact:!0},d,{className:(0,s.Z)(d.className,{[v]:p?.sidebar&&p.sidebar===g.sidebar}),activeClassName:v,label:t??g.id,to:g.path}))}},9308:(e,n,t)=>{"use strict";t.d(n,{Z:()=>d});var o=t(7462),a=t(7294),r=t(5525),i=t(3154),s=t(907),c=t(3810),l=t(5999);const 
u=e=>e.docs.find((n=>n.id===e.mainDocId));function d(e){let{mobile:n,docsPluginId:t,dropdownActiveClassDisabled:d,dropdownItemsBefore:m,dropdownItemsAfter:p,...f}=e;const h=(0,s.Iw)(t),g=(0,s.gB)(t),v=(0,s.yW)(t),{preferredVersion:b,savePreferredVersionName:y}=(0,c.J)(t);const w=function(){const e=g.map((e=>{const n=h?.alternateDocVersions[e.name]||u(e);return{isNavLink:!0,label:e.label,to:n.path,isActive:()=>e===h?.activeVersion,onClick:()=>{y(e.name)}}}));return[...m,...e,...p]}(),C=h.activeVersion??b??v,A=n&&w?(0,l.I)({id:"theme.navbar.mobileVersionsDropdown.label",message:"Versions",description:"The label for the navbar versions dropdown on mobile view"}):C.label,k=n&&w?void 0:u(C).path;return w.length<=1?a.createElement(r.Z,(0,o.Z)({},f,{mobile:n,label:A,to:k,isActive:d?()=>!1:void 0})):a.createElement(i.Z,(0,o.Z)({},f,{mobile:n,label:A,to:k,items:w,isActive:d?()=>!1:void 0}))}},7250:(e,n,t)=>{"use strict";t.d(n,{Z:()=>l});var o=t(7462),a=t(7294),r=t(5525),i=t(907),s=t(3810);const c=e=>e.docs.find((n=>n.id===e.mainDocId));function l(e){let{label:n,to:t,docsPluginId:l,...u}=e;const d=(0,i.zu)(l),{preferredVersion:m}=(0,s.J)(l),p=(0,i.yW)(l),f=d??m??p,h=n??f.label,g=t??c(f).path;return a.createElement(r.Z,(0,o.Z)({},u,{label:h,to:g}))}},3154:(e,n,t)=>{"use strict";t.d(n,{Z:()=>p});var o=t(7462),a=t(7294),r=t(6010),i=t(3810),s=t(5525),c=t(2207);const l="dropdown__link--active";function u(e,n){return e.some((e=>function(e,n){return!!(0,i.Mg)(e.to,n)||!!(0,i.Fx)(e.activeBaseRegex,n)||!(!e.activeBasePath||!n.startsWith(e.activeBasePath))}(e,n)))}function d(e){let{items:n,position:t,className:i,...u}=e;const d=(0,a.useRef)(null),m=(0,a.useRef)(null),[p,f]=(0,a.useState)(!1);return(0,a.useEffect)((()=>{const e=e=>{d.current&&!d.current.contains(e.target)&&f(!1)};return 
document.addEventListener("mousedown",e),document.addEventListener("touchstart",e),()=>{document.removeEventListener("mousedown",e),document.removeEventListener("touchstart",e)}}),[d]),a.createElement("div",{ref:d,className:(0,r.Z)("navbar__item","dropdown","dropdown--hoverable",{"dropdown--right":"right"===t,"dropdown--show":p})},a.createElement(s.O,(0,o.Z)({href:u.to?void 0:"#",className:(0,r.Z)("navbar__link",i)},u,{onClick:u.to?void 0:e=>e.preventDefault(),onKeyDown:e=>{"Enter"===e.key&&(e.preventDefault(),f(!p))}}),u.children??u.label),a.createElement("ul",{ref:m,className:"dropdown__menu"},n.map(((e,t)=>a.createElement(c.Z,(0,o.Z)({isDropdownItem:!0,onKeyDown:e=>{if(t===n.length-1&&"Tab"===e.key){e.preventDefault(),f(!1);const n=d.current.nextElementSibling;n&&n.focus()}},activeClassName:l},e,{key:t}))))))}function m(e){let{items:n,className:t,position:l,...d}=e;const m=(0,i.be)(),p=u(n,m),{collapsed:f,toggleCollapsed:h,setCollapsed:g}=(0,i.uR)({initialState:()=>!p});return(0,a.useEffect)((()=>{p&&g(!p)}),[m,p,g]),a.createElement("li",{className:(0,r.Z)("menu__list-item",{"menu__list-item--collapsed":f})},a.createElement(s.O,(0,o.Z)({role:"button",className:(0,r.Z)("menu__link menu__link--sublist",t)},d,{onClick:e=>{e.preventDefault(),h()}}),d.children??d.label),a.createElement(i.zF,{lazy:!0,as:"ul",className:"menu__list",collapsed:f},n.map(((e,n)=>a.createElement(c.Z,(0,o.Z)({mobile:!0,isDropdownItem:!0,onClick:d.onClick,activeClassName:"menu__link--active"},e,{key:n}))))))}const p=function(e){let{mobile:n=!1,...t}=e;const o=n?m:d;return a.createElement(o,t)}},2207:(e,n,t)=>{"use strict";t.d(n,{Z:()=>v,E:()=>g});var o=t(7294),a=t(5525),r=t(3154),i=t(7462);const s=function(e){let{width:n=20,height:t=20,...a}=e;return o.createElement("svg",(0,i.Z)({viewBox:"0 0 20 20",width:n,height:t,"aria-hidden":"true"},a),o.createElement("path",{fill:"currentColor",d:"M19.753 10.909c-.624-1.707-2.366-2.726-4.661-2.726-.09 0-.176.002-.262.006l-.016-2.063 
3.525-.607c.115-.019.133-.119.109-.231-.023-.111-.167-.883-.188-.976-.027-.131-.102-.127-.207-.109-.104.018-3.25.461-3.25.461l-.013-2.078c-.001-.125-.069-.158-.194-.156l-1.025.016c-.105.002-.164.049-.162.148l.033 2.307s-3.061.527-3.144.543c-.084.014-.17.053-.151.143.019.09.19 1.094.208 1.172.018.08.072.129.188.107l2.924-.504.035 2.018c-1.077.281-1.801.824-2.256 1.303-.768.807-1.207 1.887-1.207 2.963 0 1.586.971 2.529 2.328 2.695 3.162.387 5.119-3.06 5.769-4.715 1.097 1.506.256 4.354-2.094 5.98-.043.029-.098.129-.033.207l.619.756c.08.096.206.059.256.023 2.51-1.73 3.661-4.515 2.869-6.683zm-7.386 3.188c-.966-.121-.944-.914-.944-1.453 0-.773.327-1.58.876-2.156a3.21 3.21 0 011.229-.799l.082 4.277a2.773 2.773 0 01-1.243.131zm2.427-.553l.046-4.109c.084-.004.166-.01.252-.01.773 0 1.494.145 1.885.361.391.217-1.023 2.713-2.183 3.758zm-8.95-7.668a.196.196 0 00-.196-.145h-1.95a.194.194 0 00-.194.144L.008 16.916c-.017.051-.011.076.062.076h1.733c.075 0 .099-.023.114-.072l1.008-3.318h3.496l1.008 3.318c.016.049.039.072.113.072h1.734c.072 0 .078-.025.062-.076-.014-.05-3.083-9.741-3.494-11.04zm-2.618 6.318l1.447-5.25 1.447 5.25H3.226z"}))};var c=t(2263),l=t(3810);const u="iconLanguage_EbrZ";function d(e){let{mobile:n,dropdownItemsBefore:t,dropdownItemsAfter:a,...d}=e;const{i18n:{currentLocale:m,locales:p,localeConfigs:f}}=(0,c.Z)(),h=(0,l.l5)();function g(e){return f[e].label}const v=[...t,...p.map((e=>{const n=`pathname://${h.createUrl({locale:e,fullyQualified:!1})}`;return{isNavLink:!0,label:g(e),to:n,target:"_self",autoAddBaseUrl:!1,className:e===m?"dropdown__link--active":""}})),...a],b=n?"Languages":g(m);return o.createElement(r.Z,(0,i.Z)({},d,{mobile:n,label:o.createElement("span",null,o.createElement(s,{className:u}),o.createElement("span",null,b)),items:v}))}var m=t(9166);function p(e){let{mobile:n}=e;return n?null:o.createElement(m.Z,null)}const 
f={default:()=>a.Z,localeDropdown:()=>d,search:()=>p,dropdown:()=>r.Z,docsVersion:()=>t(7250).Z,docsVersionDropdown:()=>t(9308).Z,doc:()=>t(6400).Z},h=e=>{const n=f[e];if(!n)throw new Error(`No NavbarItem component found for type "${e}".`);return n()};const g=e=>e?"menu__link--active":"navbar__link--active";function v(e){let{type:n,...t}=e;const a=function(e,n){return e&&"default"!==e?e:n?"dropdown":"default"}(n,void 0!==t.items),r=h(a);return o.createElement(r,t)}},1217:(e,n,t)=>{"use strict";t.d(n,{Z:()=>s});var o=t(7294),a=t(2859),r=t(3810),i=t(4996);function s(e){let{title:n,description:t,keywords:s,image:c,children:l}=e;const u=(0,r.pe)(n),{withBaseUrl:d}=(0,i.C)(),m=c?d(c,{absolute:!0}):void 0;return o.createElement(a.Z,null,n&&o.createElement("title",null,u),n&&o.createElement("meta",{property:"og:title",content:u}),t&&o.createElement("meta",{name:"description",content:t}),t&&o.createElement("meta",{property:"og:description",content:t}),s&&o.createElement("meta",{name:"keywords",content:Array.isArray(s)?s.join(","):s}),m&&o.createElement("meta",{property:"og:image",content:m}),m&&o.createElement("meta",{name:"twitter:image",content:m}),l)}},2924:(e,n,t)=>{"use strict";t.d(n,{Z:()=>o});const o=t(7294).createContext(void 0)},9750:(e,n,t)=>{"use strict";t.d(n,{Z:()=>l});var o=t(7462),a=t(7294),r=t(6010),i=t(2389),s=t(5350);const c={themedImage:"themedImage_TMUO","themedImage--light":"themedImage--light_4Vu1","themedImage--dark":"themedImage--dark_uzRr"};const l=function(e){const n=(0,i.Z)(),{isDarkTheme:t}=(0,s.Z)(),{sources:l,className:u,alt:d="",...m}=e,p=n?t?["dark"]:["light"]:["light","dark"];return a.createElement(a.Fragment,null,p.map((e=>a.createElement("img",(0,o.Z)({key:e,src:l[e],alt:d,className:(0,r.Z)(c.themedImage,c[`themedImage--${e}`],u)},m)))))}},907:(e,n,t)=>{"use strict";t.d(n,{Iw:()=>o.Iw,Jo:()=>o.Jo,WS:()=>o.WS,_r:()=>o._r,gA:()=>o.gA,gB:()=>o.gB,yW:()=>o.yW,zh:()=>o.zh,zu:()=>o.zu});var o=t(6730)},5350:(e,n,t)=>{"use 
strict";t.d(n,{Z:()=>r});var o=t(7294),a=t(2924);const r=function(){const e=(0,o.useContext)(a.Z);if(null==e)throw new Error('"useThemeContext" is used outside of "Layout" component. Please see https://docusaurus.io/docs/api/themes/configuration#usethemecontext.');return e}},3783:(e,n,t)=>{"use strict";t.d(n,{Z:()=>c});var o=t(7294),a=t(412);const r={desktop:"desktop",mobile:"mobile",ssr:"ssr"},i=996;function s(){return a.Z.canUseDOM?window.innerWidth>i?r.desktop:r.mobile:r.ssr}const c=function(){const[e,n]=(0,o.useState)((()=>s()));return(0,o.useEffect)((()=>{function e(){n(s())}return window.addEventListener("resize",e),()=>{window.removeEventListener("resize",e),clearTimeout(undefined)}}),[]),e}},467:(e,n,t)=>{"use strict";t.r(n),t.d(n,{default:()=>r});var o=t(412),a=t(9782);const r=e=>{if(o.Z.canUseDOM){const{themeConfig:{prism:n={}}}=a.default,{additionalLanguages:o=[]}=n;window.Prism=e,o.forEach((e=>{t(6726)(`./prism-${e}`)})),delete window.Prism}}},2448:(e,n,t)=>{"use strict";var o=a(t(7410));function a(e){return e&&e.__esModule?e:{default:e}}(0,a(t(467)).default)(o.default)},3810:(e,n,t)=>{"use strict";t.d(n,{pl:()=>Ue,zF:()=>be,HX:()=>M,PO:()=>Ie,L5:()=>T,bT:()=>A,qu:()=>y,Cv:()=>Te,Cn:()=>xe,OC:()=>Xe,kM:()=>Pe,WA:()=>l,os:()=>F,Wl:()=>D,_F:()=>E,Fx:()=>tn,Mg:()=>h,_f:()=>u,bc:()=>K,Vo:()=>Y,nZ:()=>Q,jj:()=>Oe,l5:()=>m,nT:()=>ze,uR:()=>ue,_q:()=>B,J:()=>R,Vq:()=>k,E6:()=>w,ed:()=>re,Rb:()=>Ge,be:()=>$e,SL:()=>se,g8:()=>je,c2:()=>oe,D9:()=>ie,RF:()=>nn,DA:()=>Ke,Si:()=>Ve,LU:()=>a,pe:()=>X});var o=t(2263);function a(){return(0,o.Z)().siteConfig.themeConfig}const r="localStorage";function i(e){if(void 0===e&&(e=r),"undefined"==typeof window)throw new Error("Browser storage is not available on Node.js/Docusaurus SSR process.");if("none"===e)return null;try{return window[e]}catch(t){return n=t,s||(console.warn("Docusaurus browser storage is not available.\nPossible reasons: running Docusaurus in an iframe, in an incognito browser session, or using too strict 
browser privacy settings.",n),s=!0),null}var n}let s=!1;const c={get:()=>null,set:()=>{},del:()=>{}};const l=(e,n)=>{if("undefined"==typeof window)return function(e){function n(){throw new Error(`Illegal storage API usage for storage key "${e}".\nDocusaurus storage APIs are not supposed to be called on the server-rendering process.\nPlease only call storage APIs in effects and event handlers.`)}return{get:n,set:n,del:n}}(e);const t=i(null==n?void 0:n.persistence);return null===t?c:{get:()=>{try{return t.getItem(e)}catch(n){return console.error(`Docusaurus storage error, can't get key=${e}`,n),null}},set:n=>{try{t.setItem(e,n)}catch(o){console.error(`Docusaurus storage error, can't set ${e}=${n}`,o)}},del:()=>{try{t.removeItem(e)}catch(n){console.error(`Docusaurus storage error, can't delete key=${e}`,n)}}}};function u(e){void 0===e&&(e=r);const n=i(e);if(!n)return[];const t=[];for(let o=0;o{const t=e=>!e||(null==e?void 0:e.endsWith("/"))?e:`${e}/`;return t(e)===t(n)},g=!!p._r,v=Symbol("EmptyContext"),b=(0,f.createContext)(v);function y(e){let{children:n,version:t}=e;return f.createElement(b.Provider,{value:t},n)}function w(){const e=(0,f.useContext)(b);if(e===v)throw new Error("This hook requires usage of ");return e}const C=(0,f.createContext)(v);function A(e){let{children:n,sidebar:t}=e;return f.createElement(C.Provider,{value:t},n)}function k(){const e=(0,f.useContext)(C);if(e===v)throw new Error("This hook requires usage of ");return e}function D(e){if(e.href)return e.href;for(const n of e.items){if("link"===n.type)return n.href;if("category"!==n.type)throw new Error(`Unexpected category item type for ${JSON.stringify(n)}`);{const e=D(n);if(e)return e}}}function E(e,n){const t=e=>void 0!==e&&h(e,n);return"link"===e.type?t(e.href):"category"===e.type&&(t(e.href)||function(e,n){return e.some((e=>E(e,n)))}(e.items,n))}const 
I=e=>`docs-preferred-version-${e}`,S={save:(e,n,t)=>{l(I(e),{persistence:n}).set(t)},read:(e,n)=>l(I(e),{persistence:n}).get(),clear:(e,n)=>{l(I(e),{persistence:n}).del()}};function x(e){let{pluginIds:n,versionPersistence:t,allDocsData:o}=e;const a={};return n.forEach((e=>{a[e]=function(e){const n=S.read(e,t);return o[e].versions.some((e=>e.name===n))?{preferredVersionName:n}:(S.clear(e,t),{preferredVersionName:null})}(e)})),a}function _(){const e=(0,p._r)(),n=a().docs.versionPersistence,t=(0,f.useMemo)((()=>Object.keys(e)),[e]),[o,r]=(0,f.useState)((()=>function(e){const n={};return e.forEach((e=>{n[e]={preferredVersionName:null}})),n}(t)));(0,f.useEffect)((()=>{r(x({allDocsData:e,versionPersistence:n,pluginIds:t}))}),[e,n,t]);return[o,(0,f.useMemo)((()=>({savePreferredVersion:function(e,t){S.save(e,n,t),r((n=>({...n,[e]:{preferredVersionName:t}})))}})),[n])]}const j=(0,f.createContext)(null);function T(e){let{children:n}=e;return g?f.createElement(O,null,n):n}function O(e){let{children:n}=e;const t=_();return f.createElement(j.Provider,{value:t},n)}function P(){const e=(0,f.useContext)(j);if(!e)throw new Error('Can\'t find docs preferred context, maybe you forgot to use the "DocsPreferredVersionContextProvider"?');return e}var L=t(9935);function R(e){void 0===e&&(e=L.m);const n=(0,p.zh)(e),[t,o]=P(),{preferredVersionName:a}=t[e];return{preferredVersion:a?n.versions.find((e=>e.name===a)):null,savePreferredVersionName:(0,f.useCallback)((n=>{o.savePreferredVersion(e,n)}),[o,e])}}function N(){const e=(0,p._r)(),[n]=P();const t=Object.keys(e),o={};return t.forEach((t=>{o[t]=function(t){const o=e[t],{preferredVersionName:a}=n[t];return a?o.versions.find((e=>e.name===a)):null}(t)})),o}const M="default";function F(e,n){return`docs-${e}-${n}`}function B(){const{i18n:e}=(0,o.Z)(),n=(0,p._r)(),t=(0,p.WS)(),a=N();const r=[M,...Object.keys(n).map((function(e){var o,r;const i=(null===(o=null==t?void 0:t.activePlugin)||void 0===o?void 0:o.pluginId)===e?t.activeVersion:void 
0,s=a[e],c=n[e].versions.find((e=>e.isLast));return F(e,(null!==(r=null!=i?i:s)&&void 0!==r?r:c).name)}))];return{locale:e.currentLocale,tags:r}}var U=t(7594),z=t.n(U);const $=/title=(["'])(.*?)\1/,G=/{([\d,-]+)}/,q=["js","jsBlock","jsx","python","html"],Z={js:{start:"\\/\\/",end:""},jsBlock:{start:"\\/\\*",end:"\\*\\/"},jsx:{start:"\\{\\s*\\/\\*",end:"\\*\\/\\s*\\}"},python:{start:"#",end:""},html:{start:"\x3c!--",end:"--\x3e"}},H=["highlight-next-line","highlight-start","highlight-end"],V=function(e){void 0===e&&(e=q);const n=e.map((e=>{const{start:n,end:t}=Z[e];return`(?:${n}\\s*(${H.join("|")})\\s*${t})`})).join("|");return new RegExp(`^\\s*(?:${n})\\s*$`)},W=e=>{switch(e){case"js":case"javascript":case"ts":case"typescript":return V(["js","jsBlock"]);case"jsx":case"tsx":return V(["js","jsBlock","jsx"]);case"html":return V(["js","jsBlock","html"]);case"python":case"py":return V(["python"]);default:return V()}};function K(e){var n,t;return null!==(t=null===(n=null==e?void 0:e.match($))||void 0===n?void 0:n[2])&&void 0!==t?t:""}function Y(e){const n=null==e?void 0:e.split(" ").find((e=>e.startsWith("language-")));return null==n?void 0:n.replace(/language-/,"")}function Q(e,n,t){let o=e.replace(/\n$/,"");if(n&&G.test(n)){const e=n.match(G)[1];return{highlightLines:z()(e).filter((e=>e>0)).map((e=>e-1)),code:o}}if(void 0===t)return{highlightLines:[],code:o};const a=W(t),r=o.split("\n");let i,s="";for(let l=0;lvoid 0!==e))){case"highlight-next-line":s+=`${l},`;break;case"highlight-start":i=l;break;case"highlight-end":s+=`${i}-${l-1},`}r.splice(l,1)}else l+=1}const c=z()(s);return o=r.join("\n"),{highlightLines:c,code:o}}const X=e=>{const{siteConfig:n}=(0,o.Z)(),{title:t,titleDelimiter:a}=n;return e&&e.trim().length?`${e.trim()} ${a} ${t}`:t},J=["zero","one","two","few","many","other"];function ee(e){return J.filter((n=>e.includes(n)))}const ne={locale:"en",pluralForms:ee(["one","other"]),select:e=>1===e?"one":"other"};function 
te(){const{i18n:{currentLocale:e}}=(0,o.Z)();return(0,f.useMemo)((()=>{if(!Intl.PluralRules)return console.error("Intl.PluralRules not available!\nDocusaurus will fallback to a default/fallback (English) Intl.PluralRules implementation.\n "),ne;try{return function(e){const n=new Intl.PluralRules(e);return{locale:e,pluralForms:ee(n.resolvedOptions().pluralCategories),select:e=>n.select(e)}}(e)}catch(n){return console.error(`Failed to use Intl.PluralRules for locale "${e}".\nDocusaurus will fallback to a default/fallback (English) Intl.PluralRules implementation.\n`),ne}}),[e])}function oe(){const e=te();return{selectMessage:(n,t)=>function(e,n,t){const o=e.split("|");if(1===o.length)return o[0];{o.length>t.pluralForms.length&&console.error(`For locale=${t.locale}, a maximum of ${t.pluralForms.length} plural forms are expected (${t.pluralForms}), but the message contains ${o.length} plural forms: ${e} `);const a=t.select(n),r=t.pluralForms.indexOf(a);return o[Math.min(r,o.length-1)]}}(t,n,e)}}const ae="undefined"!=typeof window?f.useLayoutEffect:f.useEffect;function re(e){const n=(0,f.useRef)(e);return ae((()=>{n.current=e}),[e]),(0,f.useCallback)((function(){return n.current(...arguments)}),[])}function ie(e){const n=(0,f.useRef)();return ae((()=>{n.current=e})),n.current}function se(e){const n=(0,d.TH)(),t=ie(n),o=re(e);(0,f.useEffect)((()=>{n!==t&&o({location:n,previousLocation:t})}),[o,n,t])}var ce=t(412);const le="ease-in-out";function ue(e){let{initialState:n}=e;const[t,o]=(0,f.useState)(null!=n&&n),a=(0,f.useCallback)((()=>{o((e=>!e))}),[]);return{collapsed:t,setCollapsed:o,toggleCollapsed:a}}const de={display:"none",overflow:"hidden",height:"0px"},me={display:"block",overflow:"visible",height:"auto"};function pe(e,n){const t=n?de:me;e.style.display=t.display,e.style.overflow=t.overflow,e.style.height=t.height}function fe(e){let{collapsibleRef:n,collapsed:t,animation:o}=e;const a=(0,f.useRef)(!1);(0,f.useEffect)((()=>{const e=n.current;function r(){var 
n,t;const a=e.scrollHeight,r=null!==(n=null==o?void 0:o.duration)&&void 0!==n?n:function(e){const n=e/36;return Math.round(10*(4+15*n**.25+n/5))}(a);return{transition:`height ${r}ms ${null!==(t=null==o?void 0:o.easing)&&void 0!==t?t:le}`,height:`${a}px`}}function i(){const n=r();e.style.transition=n.transition,e.style.height=n.height}if(!a.current)return pe(e,t),void(a.current=!0);return e.style.willChange="height",function(){const n=requestAnimationFrame((()=>{t?(i(),requestAnimationFrame((()=>{e.style.height=de.height,e.style.overflow=de.overflow}))):(e.style.display="block",requestAnimationFrame((()=>{i()})))}));return()=>cancelAnimationFrame(n)}()}),[n,t,o])}function he(e){if(!ce.Z.canUseDOM)return e?de:me}function ge(e){let{as:n="div",collapsed:t,children:o,animation:a,onCollapseTransitionEnd:r,className:i,disableSSRStyle:s}=e;const c=(0,f.useRef)(null);return fe({collapsibleRef:c,collapsed:t,animation:a}),f.createElement(n,{ref:c,style:s?void 0:he(t),onTransitionEnd:e=>{"height"===e.propertyName&&(pe(c.current,t),null==r||r(t))},className:i},o)}function ve(e){let{collapsed:n,...t}=e;const[o,a]=(0,f.useState)(!n);(0,f.useLayoutEffect)((()=>{n||a(!0)}),[n]);const[r,i]=(0,f.useState)(n);return(0,f.useLayoutEffect)((()=>{o&&i(n)}),[o,n]),o?f.createElement(ge,{...t,collapsed:r}):null}function be(e){let{lazy:n,...t}=e;const o=n?ve:ge;return f.createElement(o,{...t})}var ye=t(2389),we=t(6010);const Ce="details_Q743",Ae="isBrowser_rWTL",ke="collapsibleContent_K5uX";function De(e){return!!e&&("SUMMARY"===e.tagName||De(e.parentElement))}function Ee(e,n){return!!e&&(e===n||Ee(e.parentElement,n))}const Ie=function(e){let{summary:n,children:t,...o}=e;const a=(0,ye.Z)(),r=(0,f.useRef)(null),{collapsed:i,setCollapsed:s}=ue({initialState:!o.open}),[c,l]=(0,f.useState)(o.open);return 
f.createElement("details",{...o,ref:r,open:c,"data-collapsed":i,className:(0,we.Z)(Ce,{[Ae]:a},o.className),onMouseDown:e=>{De(e.target)&&e.detail>1&&e.preventDefault()},onClick:e=>{e.stopPropagation();const n=e.target;De(n)&&Ee(n,r.current)&&(e.preventDefault(),i?(s(!1),l(!0)):s(!0))}},n,f.createElement(be,{lazy:!1,collapsed:i,disableSSRStyle:!0,onCollapseTransitionEnd:e=>{s(e),l(!e)}},f.createElement("div",{className:ke},t)))};const Se=(0,f.createContext)(null);function xe(e){let{children:n}=e;return f.createElement(Se.Provider,{value:(0,f.useState)(null)},n)}function _e(){const e=(0,f.useContext)(Se);if(null===e)throw new Error("MobileSecondaryMenuProvider was not used correctly, context value is null");return e}function je(){const[e]=_e();if(e){const n=e.component;return function(t){return f.createElement(n,{...e.props,...t})}}return()=>{}}function Te(e){let{component:n,props:t}=e;const[,o]=_e(),a=(r=t,(0,f.useMemo)((()=>r),[...Object.keys(r),...Object.values(r)]));var r;return(0,f.useEffect)((()=>{o({component:n,props:a})}),[o,n,a]),(0,f.useEffect)((()=>()=>o(null)),[o]),null}function Oe(e){return Array.from(new Set(e))}const 
Pe={page:{blogListPage:"blog-list-page",blogPostPage:"blog-post-page",blogTagsListPage:"blog-tags-list-page",blogTagPostListPage:"blog-tags-post-list-page",docsDocPage:"docs-doc-page",docsTagsListPage:"docs-tags-list-page",docsTagDocListPage:"docs-tags-doc-list-page",mdxPage:"mdx-page"},wrapper:{main:"main-wrapper",blogPages:"blog-wrapper",docsPages:"docs-wrapper",mdxPages:"mdx-wrapper"},common:{editThisPage:"theme-edit-this-page",lastUpdated:"theme-last-updated",backToTopButton:"theme-back-to-top-button"},layout:{},docs:{docVersionBanner:"theme-doc-version-banner",docVersionBadge:"theme-doc-version-badge",docMarkdown:"theme-doc-markdown",docTocMobile:"theme-doc-toc-mobile",docTocDesktop:"theme-doc-toc-desktop",docFooter:"theme-doc-footer",docFooterTagsRow:"theme-doc-footer-tags-row",docFooterEditMetaRow:"theme-doc-footer-edit-meta-row",docSidebarMenu:"theme-doc-sidebar-menu",docSidebarItemCategory:"theme-doc-sidebar-item-category",docSidebarItemLink:"theme-doc-sidebar-item-link",docSidebarItemCategoryLevel:e=>`theme-doc-sidebar-item-category-level-${e}`,docSidebarItemLinkLevel:e=>`theme-doc-sidebar-item-link-level-${e}`},blog:{}},Le=l("docusaurus.announcement.dismiss"),Re=l("docusaurus.announcement.id"),Ne=()=>"true"===Le.get(),Me=e=>Le.set(String(e)),Fe=()=>{const{announcementBar:e}=a(),n=(0,ye.Z)(),[t,o]=(0,f.useState)((()=>!!n&&Ne()));(0,f.useEffect)((()=>{o(Ne())}),[]);const r=(0,f.useCallback)((()=>{Me(!0),o(!0)}),[]);return(0,f.useEffect)((()=>{if(!e)return;const{id:n}=e;let t=Re.get();"annoucement-bar"===t&&(t="announcement-bar");const a=n!==t;Re.set(n),a&&Me(!1),!a&&Ne()||o(!1)}),[e]),(0,f.useMemo)((()=>({isActive:!!e&&!t,close:r})),[e,t,r])},Be=(0,f.createContext)(null);function Ue(e){let{children:n}=e;const t=Fe();return f.createElement(Be.Provider,{value:t},n)}const ze=()=>{const e=(0,f.useContext)(Be);if(!e)throw new Error("useAnnouncementBar(): AnnouncementBar not found in React context: make sure to use the AnnouncementBarProvider on top of the 
tree");return e};function $e(){const{siteConfig:{baseUrl:e}}=(0,o.Z)(),{pathname:n}=(0,d.TH)();return n.replace(e,"/")}t(5999);function Ge(e){!function(e){const{block:n}=(0,d.k6)(),t=(0,f.useRef)(e);(0,f.useEffect)((()=>{t.current=e}),[e]),(0,f.useEffect)((()=>n(((e,n)=>t.current(e,n)))),[n,t])}(((n,t)=>{if("POP"===t)return e(n,t)}))}function qe(e){const n=e.getBoundingClientRect();return n.top===n.bottom?qe(e.parentNode):n}function Ze(e,n){let{anchorTopOffset:t}=n;var o;const a=e.find((e=>qe(e).top>=t));if(a){return function(e){return e.top>0&&e.bottom{e.current=n?0:document.querySelector(".navbar").clientHeight}),[n]),e}const Ve=function(e){const n=(0,f.useRef)(void 0),t=He();(0,f.useEffect)((()=>{if(!e)return()=>{};const{linkClassName:o,linkActiveClassName:a,minHeadingLevel:r,maxHeadingLevel:i}=e;function s(){const e=function(e){return Array.from(document.getElementsByClassName(e))}(o),s=function(e){let{minHeadingLevel:n,maxHeadingLevel:t}=e;const o=[];for(let a=n;a<=t;a+=1)o.push(`h${a}.anchor`);return Array.from(document.querySelectorAll(o.join()))}({minHeadingLevel:r,maxHeadingLevel:i}),c=Ze(s,{anchorTopOffset:t.current}),l=e.find((e=>c&&c.id===function(e){return decodeURIComponent(e.href.substring(e.href.indexOf("#")+1))}(e)));e.forEach((e=>{!function(e,t){var o;t?(n.current&&n.current!==e&&(null===(o=n.current)||void 0===o||o.classList.remove(a)),e.classList.add(a),n.current=e):e.classList.remove(a)}(e,e===l)}))}return document.addEventListener("scroll",s),document.addEventListener("resize",s),s(),()=>{document.removeEventListener("scroll",s),document.removeEventListener("resize",s)}}),[e,t])};function We(e){let{toc:n,minHeadingLevel:t,maxHeadingLevel:o}=e;return n.flatMap((e=>{const n=We({toc:e.children,minHeadingLevel:t,maxHeadingLevel:o});return function(e){return e.level>=t&&e.level<=o}(e)?[{...e,children:n}]:n}))}function 
Ke(e){let{toc:n,minHeadingLevel:t,maxHeadingLevel:o}=e;return(0,f.useMemo)((()=>We({toc:n,minHeadingLevel:t,maxHeadingLevel:o})),[n,t,o])}function Ye(){const e=(0,f.useRef)(!0);return(0,f.useMemo)((()=>({scrollEventsEnabledRef:e,enableScrollEvents:()=>{e.current=!0},disableScrollEvents:()=>{e.current=!1}})),[])}const Qe=(0,f.createContext)(void 0);function Xe(e){let{children:n}=e;return f.createElement(Qe.Provider,{value:Ye()},n)}function Je(){const e=(0,f.useContext)(Qe);if(null==e)throw new Error('"useScrollController" is used but no context provider was found in the React tree.');return e}const en=()=>ce.Z.canUseDOM?{scrollX:window.pageXOffset,scrollY:window.pageYOffset}:null;function nn(e,n){void 0===n&&(n=[]);const{scrollEventsEnabledRef:t}=Je(),o=(0,f.useRef)(en()),a=re(e);(0,f.useEffect)((()=>{const e=()=>{if(!t.current)return;const e=en();a&&a(e,o.current),o.current=e},n={passive:!0};return e(),window.addEventListener("scroll",e,n),()=>window.removeEventListener("scroll",e,n)}),[a,t,...n])}function tn(e,n){return void 0!==e&&void 0!==n&&new RegExp(e,"gi").test(n)}},9166:(e,n,t)=>{"use strict";t.d(n,{Z:()=>x});var o=t(7462),a=t(7294),r=t(3935),i=t(2263),s=t(6550),c=t(4996),l=t(9960),u=t(2859),d=t(9565),m=t(3810);function p(){return a.createElement("svg",{width:"15",height:"15",className:"DocSearch-Control-Key-Icon"},a.createElement("path",{d:"M4.505 4.496h2M5.505 5.496v5M8.216 4.496l.055 5.993M10 7.5c.333.333.5.667.5 1v2M12.326 4.5v5.996M8.384 4.496c1.674 0 2.116 0 2.116 1.5s-.442 1.5-2.116 1.5M3.205 9.303c-.09.448-.277 1.21-1.241 1.203C1 10.5.5 9.513.5 8V7c0-1.57.5-2.5 1.464-2.494.964.006 1.134.598 1.24 1.342M12.553 10.5h1.953",strokeWidth:"1.2",stroke:"currentColor",fill:"none",strokeLinecap:"square"}))}var f=t(830),h=["translations"];function g(){return g=Object.assign||function(e){for(var n=1;ne.length)&&(n=e.length);for(var t=0,o=new Array(n);t=0||(a[t]=e[t]);return a}(e,n);if(Object.getOwnPropertySymbols){var 
r=Object.getOwnPropertySymbols(e);for(o=0;o=0||Object.prototype.propertyIsEnumerable.call(e,t)&&(a[t]=e[t])}return a}var w="Ctrl";var C=a.forwardRef((function(e,n){var t=e.translations,o=void 0===t?{}:t,r=y(e,h),i=o.buttonText,s=void 0===i?"Search":i,c=o.buttonAriaLabel,l=void 0===c?"Search":c,u=v((0,a.useState)(null),2),d=u[0],m=u[1];return(0,a.useEffect)((function(){"undefined"!=typeof navigator&&(/(Mac|iPhone|iPod|iPad)/i.test(navigator.platform)?m("\u2318"):m(w))}),[]),a.createElement("button",g({type:"button",className:"DocSearch DocSearch-Button","aria-label":l},r,{ref:n}),a.createElement("span",{className:"DocSearch-Button-Container"},a.createElement(f.W,null),a.createElement("span",{className:"DocSearch-Button-Placeholder"},s)),a.createElement("span",{className:"DocSearch-Button-Keys"},null!==d&&a.createElement(a.Fragment,null,a.createElement("kbd",{className:"DocSearch-Button-Key"},d===w?a.createElement(p,null):d),a.createElement("kbd",{className:"DocSearch-Button-Key"},"K"))))}));var A=t(5999);const k={searchBox:"searchBox_Utm0"};let D=null;function E(e){let{hit:n,children:t}=e;return a.createElement(l.Z,{to:n.url},t)}function I(e){let{state:n,onClose:t}=e;const{generateSearchPageLink:o}=(0,d.Z)();return a.createElement(l.Z,{to:o(n.query),onClick:t},"See all ",n.context.nbHits," results")}function S(e){let{contextualSearch:n,externalUrlRegex:l,...d}=e;var p,f;const{siteMetadata:h}=(0,i.Z)(),g=function(){const{locale:e,tags:n}=(0,m._q)();return[`language:${e}`,n.map((e=>`docusaurus_tag:${e}`))]}(),v=null!==(f=null===(p=d.searchParameters)||void 0===p?void 0:p.facetFilters)&&void 0!==f?f:[],b=n?[...g,...v]:v,y={...d.searchParameters,facetFilters:b},{withBaseUrl:w}=(0,c.C)(),S=(0,s.k6)(),x=(0,a.useRef)(null),_=(0,a.useRef)(null),[j,T]=(0,a.useState)(!1),[O,P]=(0,a.useState)(void 
0),L=(0,a.useCallback)((()=>D?Promise.resolve():Promise.all([t.e(1426).then(t.bind(t,1426)),Promise.all([t.e(532),t.e(6945)]).then(t.bind(t,6945)),Promise.all([t.e(532),t.e(8894)]).then(t.bind(t,8894))]).then((e=>{let[{DocSearchModal:n}]=e;D=n}))),[]),R=(0,a.useCallback)((()=>{L().then((()=>{x.current=document.createElement("div"),document.body.insertBefore(x.current,document.body.firstChild),T(!0)}))}),[L,T]),N=(0,a.useCallback)((()=>{var e;T(!1),null===(e=x.current)||void 0===e||e.remove()}),[T]),M=(0,a.useCallback)((e=>{L().then((()=>{T(!0),P(e.key)}))}),[L,T,P]),F=(0,a.useRef)({navigate(e){let{itemUrl:n}=e;(0,m.Fx)(l,n)?window.location.href=n:S.push(n)}}).current,B=(0,a.useRef)((e=>e.map((e=>{if((0,m.Fx)(l,e.url))return e;const n=new URL(e.url);return{...e,url:w(`${n.pathname}${n.hash}`)}})))).current,U=(0,a.useMemo)((()=>e=>a.createElement(I,(0,o.Z)({},e,{onClose:N}))),[N]),z=(0,a.useCallback)((e=>(e.addAlgoliaAgent("docusaurus",h.docusaurusVersion),e)),[h.docusaurusVersion]);!function(e){var n=e.isOpen,t=e.onOpen,o=e.onClose,r=e.onInput,i=e.searchButtonRef;a.useEffect((function(){function e(e){var a;(27===e.keyCode&&n||"k"===(null===(a=e.key)||void 0===a?void 0:a.toLowerCase())&&(e.metaKey||e.ctrlKey)||!function(e){var n=e.target,t=n.tagName;return n.isContentEditable||"INPUT"===t||"SELECT"===t||"TEXTAREA"===t}(e)&&"/"===e.key&&!n)&&(e.preventDefault(),n?o():document.body.classList.contains("DocSearch--active")||document.body.classList.contains("DocSearch--active")||t()),i&&i.current===document.activeElement&&r&&/[a-zA-Z0-9]/.test(String.fromCharCode(e.keyCode))&&r(e)}return window.addEventListener("keydown",e),function(){window.removeEventListener("keydown",e)}}),[n,t,o,r,i])}({isOpen:j,onOpen:R,onClose:N,onInput:M,searchButtonRef:_});const $=(0,A.I)({id:"theme.SearchBar.label",message:"Search",description:"The ARIA label and placeholder for search button"});return 
a.createElement(a.Fragment,null,a.createElement(u.Z,null,a.createElement("link",{rel:"preconnect",href:`https://${d.appId}-dsn.algolia.net`,crossOrigin:"anonymous"})),a.createElement("div",{className:k.searchBox},a.createElement(C,{onTouchStart:L,onFocus:L,onMouseOver:L,onClick:R,ref:_,translations:{buttonText:$,buttonAriaLabel:$}})),j&&D&&x.current&&(0,r.createPortal)(a.createElement(D,(0,o.Z)({onClose:N,initialScrollY:window.scrollY,initialQuery:O,navigator:F,transformItems:B,hitComponent:E,resultsFooterComponent:U,transformSearchClient:z},d,{searchParameters:y})),x.current))}const x=function(){const{siteConfig:e}=(0,i.Z)();return a.createElement(S,e.themeConfig.algolia)}},9565:(e,n,t)=>{"use strict";t.d(n,{Z:()=>i});var o=t(6550),a=t(2263),r=t(7294);const i=function(){const e=(0,o.k6)(),{siteConfig:{baseUrl:n}}=(0,a.Z)(),[t,i]=(0,r.useState)("");return(0,r.useEffect)((()=>{var e;const n=null!==(e=new URLSearchParams(window.location.search).get("q"))&&void 0!==e?e:"";i(n)}),[]),{searchQuery:t,setSearchQuery:(0,r.useCallback)((n=>{const t=new URLSearchParams(window.location.search);n?t.set("q",n):t.delete("q"),e.replace({search:t.toString()}),i(n)}),[e]),generateSearchPageLink:(0,r.useCallback)((e=>`${n}search?q=${encodeURIComponent(e)}`),[n])}}},8802:(e,n)=>{"use strict";Object.defineProperty(n,"__esModule",{value:!0}),n.default=function(e,n){const{trailingSlash:t,baseUrl:o}=n;if(e.startsWith("#"))return e;if(void 0===t)return e;const[a]=e.split(/[#?]/),r="/"===a||a===o?a:(i=a,t?function(e){return e.endsWith("/")?e:`${e}/`}(i):function(e){return e.endsWith("/")?e.slice(0,-1):e}(i));var i;return e.replace(a,r)}},8780:function(e,n,t){"use strict";var o=this&&this.__importDefault||function(e){return e&&e.__esModule?e:{default:e}};Object.defineProperty(n,"__esModule",{value:!0}),n.applyTrailingSlash=void 0;var a=t(8802);Object.defineProperty(n,"applyTrailingSlash",{enumerable:!0,get:function(){return o(a).default}})},6010:(e,n,t)=>{"use strict";function o(e){var 
n,t,a="";if("string"==typeof e||"number"==typeof e)a+=e;else if("object"==typeof e)if(Array.isArray(e))for(n=0;na});const a=function(){for(var e,n,t=0,a="";t{"use strict";t.d(n,{lX:()=>k,q_:()=>_,ob:()=>h,PP:()=>T,Ep:()=>f,Hp:()=>g});var o=t(7462);function a(e){return"/"===e.charAt(0)}function r(e,n){for(var t=n,o=t+1,a=e.length;o=0;m--){var p=i[m];"."===p?r(i,m):".."===p?(r(i,m),d++):d&&(r(i,m),d--)}if(!l)for(;d--;d)i.unshift("..");!l||""===i[0]||i[0]&&a(i[0])||i.unshift("");var f=i.join("/");return t&&"/"!==f.substr(-1)&&(f+="/"),f};function s(e){return e.valueOf?e.valueOf():Object.prototype.valueOf.call(e)}const c=function e(n,t){if(n===t)return!0;if(null==n||null==t)return!1;if(Array.isArray(n))return Array.isArray(t)&&n.length===t.length&&n.every((function(n,o){return e(n,t[o])}));if("object"==typeof n||"object"==typeof t){var o=s(n),a=s(t);return o!==n||a!==t?e(o,a):Object.keys(Object.assign({},n,t)).every((function(o){return e(n[o],t[o])}))}return!1};var l=t(8776);function u(e){return"/"===e.charAt(0)?e:"/"+e}function d(e){return"/"===e.charAt(0)?e.substr(1):e}function m(e,n){return function(e,n){return 0===e.toLowerCase().indexOf(n.toLowerCase())&&-1!=="/?#".indexOf(e.charAt(n.length))}(e,n)?e.substr(n.length):e}function p(e){return"/"===e.charAt(e.length-1)?e.slice(0,-1):e}function f(e){var n=e.pathname,t=e.search,o=e.hash,a=n||"/";return t&&"?"!==t&&(a+="?"===t.charAt(0)?t:"?"+t),o&&"#"!==o&&(a+="#"===o.charAt(0)?o:"#"+o),a}function h(e,n,t,a){var r;"string"==typeof e?(r=function(e){var n=e||"/",t="",o="",a=n.indexOf("#");-1!==a&&(o=n.substr(a),n=n.substr(0,a));var r=n.indexOf("?");return-1!==r&&(t=n.substr(r),n=n.substr(0,r)),{pathname:n,search:"?"===t?"":t,hash:"#"===o?"":o}}(e),r.state=n):(void 0===(r=(0,o.Z)({},e)).pathname&&(r.pathname=""),r.search?"?"!==r.search.charAt(0)&&(r.search="?"+r.search):r.search="",r.hash?"#"!==r.hash.charAt(0)&&(r.hash="#"+r.hash):r.hash="",void 0!==n&&void 
0===r.state&&(r.state=n));try{r.pathname=decodeURI(r.pathname)}catch(s){throw s instanceof URIError?new URIError('Pathname "'+r.pathname+'" could not be decoded. This is likely caused by an invalid percent-encoding.'):s}return t&&(r.key=t),a?r.pathname?"/"!==r.pathname.charAt(0)&&(r.pathname=i(r.pathname,a.pathname)):r.pathname=a.pathname:r.pathname||(r.pathname="/"),r}function g(e,n){return e.pathname===n.pathname&&e.search===n.search&&e.hash===n.hash&&e.key===n.key&&c(e.state,n.state)}function v(){var e=null;var n=[];return{setPrompt:function(n){return e=n,function(){e===n&&(e=null)}},confirmTransitionTo:function(n,t,o,a){if(null!=e){var r="function"==typeof e?e(n,t):e;"string"==typeof r?"function"==typeof o?o(r,a):a(!0):a(!1!==r)}else a(!0)},appendListener:function(e){var t=!0;function o(){t&&e.apply(void 0,arguments)}return n.push(o),function(){t=!1,n=n.filter((function(e){return e!==o}))}},notifyListeners:function(){for(var e=arguments.length,t=new Array(e),o=0;on?t.splice(n,t.length-n,a):t.push(a),d({action:o,location:a,index:n,entries:t})}}))},replace:function(e,n){var o="REPLACE",a=h(e,n,m(),w.location);u.confirmTransitionTo(a,o,t,(function(e){e&&(w.entries[w.index]=a,d({action:o,location:a}))}))},go:y,goBack:function(){y(-1)},goForward:function(){y(1)},canGo:function(e){var n=w.index+e;return n>=0&&n{"use strict";var o=t(9864),a={childContextTypes:!0,contextType:!0,contextTypes:!0,defaultProps:!0,displayName:!0,getDefaultProps:!0,getDerivedStateFromError:!0,getDerivedStateFromProps:!0,mixins:!0,propTypes:!0,type:!0},r={name:!0,length:!0,prototype:!0,caller:!0,callee:!0,arguments:!0,arity:!0},i={$$typeof:!0,compare:!0,defaultProps:!0,displayName:!0,propTypes:!0,type:!0},s={};function c(e){return o.isMemo(e)?i:s[e.$$typeof]||a}s[o.ForwardRef]={$$typeof:!0,render:!0,defaultProps:!0,displayName:!0,propTypes:!0},s[o.Memo]=i;var 
l=Object.defineProperty,u=Object.getOwnPropertyNames,d=Object.getOwnPropertySymbols,m=Object.getOwnPropertyDescriptor,p=Object.getPrototypeOf,f=Object.prototype;e.exports=function e(n,t,o){if("string"!=typeof t){if(f){var a=p(t);a&&a!==f&&e(n,a,o)}var i=u(t);d&&(i=i.concat(d(t)));for(var s=c(n),h=c(t),g=0;g{e.exports=Array.isArray||function(e){return"[object Array]"==Object.prototype.toString.call(e)}},6743:(e,n,t)=>{"use strict";t.r(n)},2497:(e,n,t)=>{"use strict";t.r(n)},2295:(e,n,t)=>{"use strict";t.r(n)},4865:function(e,n,t){var o,a;o=function(){var e,n,t={version:"0.2.0"},o=t.settings={minimum:.08,easing:"ease",positionUsing:"",speed:200,trickle:!0,trickleRate:.02,trickleSpeed:800,showSpinner:!0,barSelector:'[role="bar"]',spinnerSelector:'[role="spinner"]',parent:"body",template:'
'};function a(e,n,t){return et?t:e}function r(e){return 100*(-1+e)}function i(e,n,t){var a;return(a="translate3d"===o.positionUsing?{transform:"translate3d("+r(e)+"%,0,0)"}:"translate"===o.positionUsing?{transform:"translate("+r(e)+"%,0)"}:{"margin-left":r(e)+"%"}).transition="all "+n+"ms "+t,a}t.configure=function(e){var n,t;for(n in e)void 0!==(t=e[n])&&e.hasOwnProperty(n)&&(o[n]=t);return this},t.status=null,t.set=function(e){var n=t.isStarted();e=a(e,o.minimum,1),t.status=1===e?null:e;var r=t.render(!n),l=r.querySelector(o.barSelector),u=o.speed,d=o.easing;return r.offsetWidth,s((function(n){""===o.positionUsing&&(o.positionUsing=t.getPositioningCSS()),c(l,i(e,u,d)),1===e?(c(r,{transition:"none",opacity:1}),r.offsetWidth,setTimeout((function(){c(r,{transition:"all "+u+"ms linear",opacity:0}),setTimeout((function(){t.remove(),n()}),u)}),u)):setTimeout(n,u)})),this},t.isStarted=function(){return"number"==typeof t.status},t.start=function(){t.status||t.set(0);var e=function(){setTimeout((function(){t.status&&(t.trickle(),e())}),o.trickleSpeed)};return o.trickle&&e(),this},t.done=function(e){return e||t.status?t.inc(.3+.5*Math.random()).set(1):this},t.inc=function(e){var n=t.status;return n?("number"!=typeof e&&(e=(1-n)*a(Math.random()*n,.1,.95)),n=a(n+e,0,.994),t.set(n)):t.start()},t.trickle=function(){return t.inc(Math.random()*o.trickleRate)},e=0,n=0,t.promise=function(o){return o&&"resolved"!==o.state()?(0===n&&t.start(),e++,n++,o.always((function(){0==--n?(e=0,t.done()):t.set((e-n)/e)})),this):this},t.render=function(e){if(t.isRendered())return document.getElementById("nprogress");u(document.documentElement,"nprogress-busy");var n=document.createElement("div");n.id="nprogress",n.innerHTML=o.template;var a,i=n.querySelector(o.barSelector),s=e?"-100":r(t.status||0),l=document.querySelector(o.parent);return c(i,{transition:"all 0 
linear",transform:"translate3d("+s+"%,0,0)"}),o.showSpinner||(a=n.querySelector(o.spinnerSelector))&&p(a),l!=document.body&&u(l,"nprogress-custom-parent"),l.appendChild(n),n},t.remove=function(){d(document.documentElement,"nprogress-busy"),d(document.querySelector(o.parent),"nprogress-custom-parent");var e=document.getElementById("nprogress");e&&p(e)},t.isRendered=function(){return!!document.getElementById("nprogress")},t.getPositioningCSS=function(){var e=document.body.style,n="WebkitTransform"in e?"Webkit":"MozTransform"in e?"Moz":"msTransform"in e?"ms":"OTransform"in e?"O":"";return n+"Perspective"in e?"translate3d":n+"Transform"in e?"translate":"margin"};var s=function(){var e=[];function n(){var t=e.shift();t&&t(n)}return function(t){e.push(t),1==e.length&&n()}}(),c=function(){var e=["Webkit","O","Moz","ms"],n={};function t(e){return e.replace(/^-ms-/,"ms-").replace(/-([\da-z])/gi,(function(e,n){return n.toUpperCase()}))}function o(n){var t=document.body.style;if(n in t)return n;for(var o,a=e.length,r=n.charAt(0).toUpperCase()+n.slice(1);a--;)if((o=e[a]+r)in t)return o;return n}function a(e){return e=t(e),n[e]||(n[e]=o(e))}function r(e,n,t){n=a(n),e.style[n]=t}return function(e,n){var t,o,a=arguments;if(2==a.length)for(t in n)void 0!==(o=n[t])&&n.hasOwnProperty(t)&&r(e,t,o);else r(e,a[1],a[2])}}();function l(e,n){return("string"==typeof e?e:m(e)).indexOf(" "+n+" ")>=0}function u(e,n){var t=m(e),o=t+n;l(t,n)||(e.className=o.substring(1))}function d(e,n){var t,o=m(e);l(e,n)&&(t=o.replace(" "+n+" "," "),e.className=t.substring(1,t.length-1))}function m(e){return(" "+(e.className||"")+" ").replace(/\s+/gi," ")}function p(e){e&&e.parentNode&&e.parentNode.removeChild(e)}return t},void 0===(a="function"==typeof o?o.call(n,t,n,e):o)||(e.exports=a)},7418:e=>{"use strict";var n=Object.getOwnPropertySymbols,t=Object.prototype.hasOwnProperty,o=Object.prototype.propertyIsEnumerable;e.exports=function(){try{if(!Object.assign)return!1;var e=new 
String("abc");if(e[5]="de","5"===Object.getOwnPropertyNames(e)[0])return!1;for(var n={},t=0;t<10;t++)n["_"+String.fromCharCode(t)]=t;if("0123456789"!==Object.getOwnPropertyNames(n).map((function(e){return n[e]})).join(""))return!1;var o={};return"abcdefghijklmnopqrst".split("").forEach((function(e){o[e]=e})),"abcdefghijklmnopqrst"===Object.keys(Object.assign({},o)).join("")}catch(a){return!1}}()?Object.assign:function(e,a){for(var r,i,s=function(e){if(null==e)throw new TypeError("Object.assign cannot be called with null or undefined");return Object(e)}(e),c=1;c{function t(e){let n,t=[];for(let o of e.split(",").map((e=>e.trim())))if(/^-?\d+$/.test(o))t.push(parseInt(o,10));else if(n=o.match(/^(-?\d+)(-|\.\.\.?|\u2025|\u2026|\u22EF)(-?\d+)$/)){let[e,o,a,r]=n;if(o&&r){o=parseInt(o),r=parseInt(r);const e=o{"use strict";t.r(n),t.d(n,{default:()=>r});var o=function(){var e=/(?:^|\s)lang(?:uage)?-([\w-]+)(?=\s|$)/i,n=0,t={},o={util:{encode:function e(n){return n instanceof a?new a(n.type,e(n.content),n.alias):Array.isArray(n)?n.map(e):n.replace(/&/g,"&").replace(/=d.reach);k+=A.value.length,A=A.next){var D=A.value;if(n.length>e.length)return;if(!(D instanceof a)){var E,I=1;if(b){if(!(E=r(C,k,e,v))||E.index>=e.length)break;var S=E.index,x=E.index+E[0].length,_=k;for(_+=A.value.length;S>=_;)_+=(A=A.next).value.length;if(k=_-=A.value.length,A.value instanceof a)continue;for(var j=A;j!==n.tail&&(_d.reach&&(d.reach=L);var R=A.prev;if(O&&(R=c(n,R,O),k+=O.length),l(n,R,I),A=c(n,R,new a(m,g?o.tokenize(T,g):T,y,T)),P&&c(n,A,P),I>1){var N={cause:m+","+f,reach:L};i(e,n,t,A.prev,k,N),d&&N.reach>d.reach&&(d.reach=N.reach)}}}}}}function s(){var e={value:null,prev:null,next:null},n={value:null,prev:e,next:null};e.next=n,this.head=e,this.tail=n,this.length=0}function c(e,n,t){var o=n.next,a={value:t,prev:n,next:o};return n.next=a,o.prev=a,e.length++,a}function l(e,n,t){for(var 
o=n.next,a=0;a"+r.content+""},o}(),a=o;o.default=o,a.languages.markup={comment:{pattern://,greedy:!0},prolog:{pattern:/<\?[\s\S]+?\?>/,greedy:!0},doctype:{pattern:/"'[\]]|"[^"]*"|'[^']*')+(?:\[(?:[^<"'\]]|"[^"]*"|'[^']*'|<(?!!--)|)*\]\s*)?>/i,greedy:!0,inside:{"internal-subset":{pattern:/(^[^\[]*\[)[\s\S]+(?=\]>$)/,lookbehind:!0,greedy:!0,inside:null},string:{pattern:/"[^"]*"|'[^']*'/,greedy:!0},punctuation:/^$|[[\]]/,"doctype-tag":/^DOCTYPE/i,name:/[^\s<>'"]+/}},cdata:{pattern://i,greedy:!0},tag:{pattern:/<\/?(?!\d)[^\s>\/=$<%]+(?:\s(?:\s*[^\s>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s'">=]+(?=[\s>]))|(?=[\s/>])))+)?\s*\/?>/,greedy:!0,inside:{tag:{pattern:/^<\/?[^\s>\/]+/,inside:{punctuation:/^<\/?/,namespace:/^[^\s>\/:]+:/}},"special-attr":[],"attr-value":{pattern:/=\s*(?:"[^"]*"|'[^']*'|[^\s'">=]+)/,inside:{punctuation:[{pattern:/^=/,alias:"attr-equals"},/"|'/]}},punctuation:/\/?>/,"attr-name":{pattern:/[^\s>\/]+/,inside:{namespace:/^[^\s>\/:]+:/}}}},entity:[{pattern:/&[\da-z]{1,8};/i,alias:"named-entity"},/&#x?[\da-f]{1,8};/i]},a.languages.markup.tag.inside["attr-value"].inside.entity=a.languages.markup.entity,a.languages.markup.doctype.inside["internal-subset"].inside=a.languages.markup,a.hooks.add("wrap",(function(e){"entity"===e.type&&(e.attributes.title=e.content.replace(/&/,"&"))})),Object.defineProperty(a.languages.markup.tag,"addInlined",{value:function(e,n){var t={};t["language-"+n]={pattern:/(^$)/i,lookbehind:!0,inside:a.languages[n]},t.cdata=/^$/i;var o={"included-cdata":{pattern://i,inside:t}};o["language-"+n]={pattern:/[\s\S]+/,inside:a.languages[n]};var r={};r[e]={pattern:RegExp(/(<__[^>]*>)(?:))*\]\]>|(?!)/.source.replace(/__/g,(function(){return 
e})),"i"),lookbehind:!0,greedy:!0,inside:o},a.languages.insertBefore("markup","cdata",r)}}),Object.defineProperty(a.languages.markup.tag,"addAttribute",{value:function(e,n){a.languages.markup.tag.inside["special-attr"].push({pattern:RegExp(/(^|["'\s])/.source+"(?:"+e+")"+/\s*=\s*(?:"[^"]*"|'[^']*'|[^\s'">=]+(?=[\s>]))/.source,"i"),lookbehind:!0,inside:{"attr-name":/^[^\s=]+/,"attr-value":{pattern:/=[\s\S]+/,inside:{value:{pattern:/(^=\s*(["']|(?!["'])))\S[\s\S]*(?=\2$)/,lookbehind:!0,alias:[n,"language-"+n],inside:a.languages[n]},punctuation:[{pattern:/^=/,alias:"attr-equals"},/"|'/]}}}})}}),a.languages.html=a.languages.markup,a.languages.mathml=a.languages.markup,a.languages.svg=a.languages.markup,a.languages.xml=a.languages.extend("markup",{}),a.languages.ssml=a.languages.xml,a.languages.atom=a.languages.xml,a.languages.rss=a.languages.xml,function(e){var n="\\b(?:BASH|BASHOPTS|BASH_ALIASES|BASH_ARGC|BASH_ARGV|BASH_CMDS|BASH_COMPLETION_COMPAT_DIR|BASH_LINENO|BASH_REMATCH|BASH_SOURCE|BASH_VERSINFO|BASH_VERSION|COLORTERM|COLUMNS|COMP_WORDBREAKS|DBUS_SESSION_BUS_ADDRESS|DEFAULTS_PATH|DESKTOP_SESSION|DIRSTACK|DISPLAY|EUID|GDMSESSION|GDM_LANG|GNOME_KEYRING_CONTROL|GNOME_KEYRING_PID|GPG_AGENT_INFO|GROUPS|HISTCONTROL|HISTFILE|HISTFILESIZE|HISTSIZE|HOME|HOSTNAME|HOSTTYPE|IFS|INSTANCE|JOB|LANG|LANGUAGE|LC_ADDRESS|LC_ALL|LC_IDENTIFICATION|LC_MEASUREMENT|LC_MONETARY|LC_NAME|LC_NUMERIC|LC_PAPER|LC_TELEPHONE|LC_TIME|LESSCLOSE|LESSOPEN|LINES|LOGNAME|LS_COLORS|MACHTYPE|MAILCHECK|MANDATORY_PATH|NO_AT_BRIDGE|OLDPWD|OPTERR|OPTIND|ORBIT_SOCKETDIR|OSTYPE|PAPERSIZE|PATH|PIPESTATUS|PPID|PS1|PS2|PS3|PS4|PWD|RANDOM|REPLY|SECONDS|SELINUX_INIT|SESSION|SESSIONTYPE|SESSION_MANAGER|SHELL|SHELLOPTS|SHLVL|SSH_AUTH_SOCK|TERM|UID|UPSTART_EVENTS|UPSTART_INSTANCE|UPSTART_JOB|UPSTART_SESSION|USER|WINDOWID|XAUTHORITY|XDG_CONFIG_DIRS|XDG_CURRENT_DESKTOP|XDG_DATA_DIRS|XDG_GREETER_DATA_DIR|XDG_MENU_PREFIX|XDG_RUNTIME_DIR|XDG_SEAT|XDG_SEAT_PATH|XDG_SESSION_DESKTOP|XDG_SESSION_ID|XDG_SESSION_PATH|XDG_SESS
ION_TYPE|XDG_VTNR|XMODIFIERS)\\b",t={pattern:/(^(["']?)\w+\2)[ \t]+\S.*/,lookbehind:!0,alias:"punctuation",inside:null},o={bash:t,environment:{pattern:RegExp("\\$"+n),alias:"constant"},variable:[{pattern:/\$?\(\([\s\S]+?\)\)/,greedy:!0,inside:{variable:[{pattern:/(^\$\(\([\s\S]+)\)\)/,lookbehind:!0},/^\$\(\(/],number:/\b0x[\dA-Fa-f]+\b|(?:\b\d+(?:\.\d*)?|\B\.\d+)(?:[Ee]-?\d+)?/,operator:/--|\+\+|\*\*=?|<<=?|>>=?|&&|\|\||[=!+\-*/%<>^&|]=?|[?~:]/,punctuation:/\(\(?|\)\)?|,|;/}},{pattern:/\$\((?:\([^)]+\)|[^()])+\)|`[^`]+`/,greedy:!0,inside:{variable:/^\$\(|^`|\)$|`$/}},{pattern:/\$\{[^}]+\}/,greedy:!0,inside:{operator:/:[-=?+]?|[!\/]|##?|%%?|\^\^?|,,?/,punctuation:/[\[\]]/,environment:{pattern:RegExp("(\\{)"+n),lookbehind:!0,alias:"constant"}}},/\$(?:\w+|[#?*!@$])/],entity:/\\(?:[abceEfnrtv\\"]|O?[0-7]{1,3}|U[0-9a-fA-F]{8}|u[0-9a-fA-F]{4}|x[0-9a-fA-F]{1,2})/};e.languages.bash={shebang:{pattern:/^#!\s*\/.*/,alias:"important"},comment:{pattern:/(^|[^"{\\$])#.*/,lookbehind:!0},"function-name":[{pattern:/(\bfunction\s+)[\w-]+(?=(?:\s*\(?:\s*\))?\s*\{)/,lookbehind:!0,alias:"function"},{pattern:/\b[\w-]+(?=\s*\(\s*\)\s*\{)/,alias:"function"}],"for-or-select":{pattern:/(\b(?:for|select)\s+)\w+(?=\s+in\s)/,alias:"variable",lookbehind:!0},"assign-left":{pattern:/(^|[\s;|&]|[<>]\()\w+(?=\+?=)/,inside:{environment:{pattern:RegExp("(^|[\\s;|&]|[<>]\\()"+n),lookbehind:!0,alias:"constant"}},alias:"variable",lookbehind:!0},string:[{pattern:/((?:^|[^<])<<-?\s*)(\w+)\s[\s\S]*?(?:\r?\n|\r)\2/,lookbehind:!0,greedy:!0,inside:o},{pattern:/((?:^|[^<])<<-?\s*)(["'])(\w+)\2\s[\s\S]*?(?:\r?\n|\r)\3/,lookbehind:!0,greedy:!0,inside:{bash:t}},{pattern:/(^|[^\\](?:\\\\)*)"(?:\\[\s\S]|\$\([^)]+\)|\$(?!\()|`[^`]+`|[^"\\`$])*"/,lookbehind:!0,greedy:!0,inside:o},{pattern:/(^|[^$\\])'[^']*'/,lookbehind:!0,greedy:!0},{pattern:/\$'(?:[^'\\]|\\[\s\S])*'/,greedy:!0,inside:{entity:o.entity}}],environment:{pattern:RegExp("\\$?"+n),alias:"constant"},variable:o.variable,function:{pattern:/(^|[\s;|&]|[<>]\()(?
:add|apropos|apt|apt-cache|apt-get|aptitude|aspell|automysqlbackup|awk|basename|bash|bc|bconsole|bg|bzip2|cal|cat|cfdisk|chgrp|chkconfig|chmod|chown|chroot|cksum|clear|cmp|column|comm|composer|cp|cron|crontab|csplit|curl|cut|date|dc|dd|ddrescue|debootstrap|df|diff|diff3|dig|dir|dircolors|dirname|dirs|dmesg|docker|docker-compose|du|egrep|eject|env|ethtool|expand|expect|expr|fdformat|fdisk|fg|fgrep|file|find|fmt|fold|format|free|fsck|ftp|fuser|gawk|git|gparted|grep|groupadd|groupdel|groupmod|groups|grub-mkconfig|gzip|halt|head|hg|history|host|hostname|htop|iconv|id|ifconfig|ifdown|ifup|import|install|ip|jobs|join|kill|killall|less|link|ln|locate|logname|logrotate|look|lpc|lpr|lprint|lprintd|lprintq|lprm|ls|lsof|lynx|make|man|mc|mdadm|mkconfig|mkdir|mke2fs|mkfifo|mkfs|mkisofs|mknod|mkswap|mmv|more|most|mount|mtools|mtr|mutt|mv|nano|nc|netstat|nice|nl|node|nohup|notify-send|npm|nslookup|op|open|parted|passwd|paste|pathchk|ping|pkill|pnpm|podman|podman-compose|popd|pr|printcap|printenv|ps|pushd|pv|quota|quotacheck|quotactl|ram|rar|rcp|reboot|remsync|rename|renice|rev|rm|rmdir|rpm|rsync|scp|screen|sdiff|sed|sendmail|seq|service|sftp|sh|shellcheck|shuf|shutdown|sleep|slocate|sort|split|ssh|stat|strace|su|sudo|sum|suspend|swapon|sync|tac|tail|tar|tee|time|timeout|top|touch|tr|traceroute|tsort|tty|umount|uname|unexpand|uniq|units|unrar|unshar|unzip|update-grub|uptime|useradd|userdel|usermod|users|uudecode|uuencode|v|vcpkg|vdir|vi|vim|virsh|vmstat|wait|watch|wc|wget|whereis|which|who|whoami|write|xargs|xdg-open|yarn|yes|zenity|zip|zsh|zypper)(?=$|[)\s;|&])/,lookbehind:!0},keyword:{pattern:/(^|[\s;|&]|[<>]\()(?:case|do|done|elif|else|esac|fi|for|function|if|in|select|then|until|while)(?=$|[)\s;|&])/,lookbehind:!0},builtin:{pattern:/(^|[\s;|&]|[<>]\()(?:\.|:|alias|bind|break|builtin|caller|cd|command|continue|declare|echo|enable|eval|exec|exit|export|getopts|hash|help|let|local|logout|mapfile|printf|pwd|read|readarray|readonly|return|set|shift|shopt|source|test|times|trap|type|
typeset|ulimit|umask|unalias|unset)(?=$|[)\s;|&])/,lookbehind:!0,alias:"class-name"},boolean:{pattern:/(^|[\s;|&]|[<>]\()(?:false|true)(?=$|[)\s;|&])/,lookbehind:!0},"file-descriptor":{pattern:/\B&\d\b/,alias:"important"},operator:{pattern:/\d?<>|>\||\+=|=[=~]?|!=?|<<[<-]?|[&\d]?>>|\d[<>]&?|[<>][&=]?|&[>&]?|\|[&|]?/,inside:{"file-descriptor":{pattern:/^\d/,alias:"important"}}},punctuation:/\$?\(\(?|\)\)?|\.\.|[{}[\];\\]/,number:{pattern:/(^|\s)(?:[1-9]\d*|0)(?:[.,]\d+)?\b/,lookbehind:!0}},t.inside=e.languages.bash;for(var a=["comment","function-name","for-or-select","assign-left","string","environment","function","keyword","builtin","boolean","file-descriptor","operator","punctuation","number"],r=o.variable[1].inside,i=0;i]=?|[!=]=?=?|--?|\+\+?|&&?|\|\|?|[?*/~^%]/,punctuation:/[{}[\];(),.:]/},a.languages.c=a.languages.extend("clike",{comment:{pattern:/\/\/(?:[^\r\n\\]|\\(?:\r\n?|\n|(?![\r\n])))*|\/\*[\s\S]*?(?:\*\/|$)/,greedy:!0},string:{pattern:/"(?:\\(?:\r\n|[\s\S])|[^"\\\r\n])*"/,greedy:!0},"class-name":{pattern:/(\b(?:enum|struct)\s+(?:__attribute__\s*\(\([\s\S]*?\)\)\s*)?)\w+|\b[a-z]\w*_t\b/,lookbehind:!0},keyword:/\b(?:_Alignas|_Alignof|_Atomic|_Bool|_Complex|_Generic|_Imaginary|_Noreturn|_Static_assert|_Thread_local|__attribute__|asm|auto|break|case|char|const|continue|default|do|double|else|enum|extern|float|for|goto|if|inline|int|long|register|return|short|signed|sizeof|static|struct|switch|typedef|typeof|union|unsigned|void|volatile|while)\b/,function:/\b[a-z_]\w*(?=\s*\()/i,number:/(?:\b0x(?:[\da-f]+(?:\.[\da-f]*)?|\.[\da-f]+)(?:p[+-]?\d+)?|(?:\b\d+(?:\.\d*)?|\B\.\d+)(?:e[+-]?\d+)?)[ful]{0,4}/i,operator:/>>=?|<<=?|->|([-+&|:])\1|[?:~]|[-+*/%&|^!=<>]=?/}),a.languages.insertBefore("c","string",{char:{pattern:/'(?:\\(?:\r\n|[\s\S])|[^'\\\r\n]){0,32}'/,greedy:!0}}),a.languages.insertBefore("c","string",{macro:{pattern:/(^[\t 
]*)#\s*[a-z](?:[^\r\n\\/]|\/(?!\*)|\/\*(?:[^*]|\*(?!\/))*\*\/|\\(?:\r\n|[\s\S]))*/im,lookbehind:!0,greedy:!0,alias:"property",inside:{string:[{pattern:/^(#\s*include\s*)<[^>]+>/,lookbehind:!0},a.languages.c.string],char:a.languages.c.char,comment:a.languages.c.comment,"macro-name":[{pattern:/(^#\s*define\s+)\w+\b(?!\()/i,lookbehind:!0},{pattern:/(^#\s*define\s+)\w+\b(?=\()/i,lookbehind:!0,alias:"function"}],directive:{pattern:/^(#\s*)[a-z]+/,lookbehind:!0,alias:"keyword"},"directive-hash":/^#/,punctuation:/##|\\(?=[\r\n])/,expression:{pattern:/\S[\s\S]*/,inside:a.languages.c}}}}),a.languages.insertBefore("c","function",{constant:/\b(?:EOF|NULL|SEEK_CUR|SEEK_END|SEEK_SET|__DATE__|__FILE__|__LINE__|__TIMESTAMP__|__TIME__|__func__|stderr|stdin|stdout)\b/}),delete a.languages.c.boolean,function(e){var n=/\b(?:alignas|alignof|asm|auto|bool|break|case|catch|char|char16_t|char32_t|char8_t|class|co_await|co_return|co_yield|compl|concept|const|const_cast|consteval|constexpr|constinit|continue|decltype|default|delete|do|double|dynamic_cast|else|enum|explicit|export|extern|final|float|for|friend|goto|if|import|inline|int|int16_t|int32_t|int64_t|int8_t|long|module|mutable|namespace|new|noexcept|nullptr|operator|override|private|protected|public|register|reinterpret_cast|requires|return|short|signed|sizeof|static|static_assert|static_cast|struct|switch|template|this|thread_local|throw|try|typedef|typeid|typename|uint16_t|uint32_t|uint64_t|uint8_t|union|unsigned|using|virtual|void|volatile|wchar_t|while)\b/,t=/\b(?!)\w+(?:\s*\.\s*\w+)*\b/.source.replace(//g,(function(){return n.source}));e.languages.cpp=e.languages.extend("c",{"class-name":[{pattern:RegExp(/(\b(?:class|concept|enum|struct|typename)\s+)(?!)\w+/.source.replace(//g,(function(){return 
n.source}))),lookbehind:!0},/\b[A-Z]\w*(?=\s*::\s*\w+\s*\()/,/\b[A-Z_]\w*(?=\s*::\s*~\w+\s*\()/i,/\b\w+(?=\s*<(?:[^<>]|<(?:[^<>]|<[^<>]*>)*>)*>\s*::\s*\w+\s*\()/],keyword:n,number:{pattern:/(?:\b0b[01']+|\b0x(?:[\da-f']+(?:\.[\da-f']*)?|\.[\da-f']+)(?:p[+-]?[\d']+)?|(?:\b[\d']+(?:\.[\d']*)?|\B\.[\d']+)(?:e[+-]?[\d']+)?)[ful]{0,4}/i,greedy:!0},operator:/>>=?|<<=?|->|--|\+\+|&&|\|\||[?:~]|<=>|[-+*/%&|^!=<>]=?|\b(?:and|and_eq|bitand|bitor|not|not_eq|or|or_eq|xor|xor_eq)\b/,boolean:/\b(?:false|true)\b/}),e.languages.insertBefore("cpp","string",{module:{pattern:RegExp(/(\b(?:import|module)\s+)/.source+"(?:"+/"(?:\\(?:\r\n|[\s\S])|[^"\\\r\n])*"|<[^<>\r\n]*>/.source+"|"+/(?:\s*:\s*)?|:\s*/.source.replace(//g,(function(){return t}))+")"),lookbehind:!0,greedy:!0,inside:{string:/^[<"][\s\S]+/,operator:/:/,punctuation:/\./}},"raw-string":{pattern:/R"([^()\\ ]{0,16})\([\s\S]*?\)\1"/,alias:"string",greedy:!0}}),e.languages.insertBefore("cpp","keyword",{"generic-function":{pattern:/\b(?!operator\b)[a-z_]\w*\s*<(?:[^<>]|<[^<>]*>)*>(?=\s*\()/i,inside:{function:/^\w+/,generic:{pattern:/<[\s\S]+/,alias:"class-name",inside:e.languages.cpp}}}}),e.languages.insertBefore("cpp","operator",{"double-colon":{pattern:/::/,alias:"punctuation"}}),e.languages.insertBefore("cpp","class-name",{"base-clause":{pattern:/(\b(?:class|struct)\s+\w+\s*:\s*)[^;{}"'\s]+(?:\s+[^;{}"'\s]+)*(?=\s*[;{])/,lookbehind:!0,greedy:!0,inside:e.languages.extend("cpp",{})}}),e.languages.insertBefore("inside","double-colon",{"class-name":/\b[a-z_]\w*\b(?!\s*::)/i},e.languages.cpp["base-clause"])}(a),function(e){var 
n=/(?:"(?:\\(?:\r\n|[\s\S])|[^"\\\r\n])*"|'(?:\\(?:\r\n|[\s\S])|[^'\\\r\n])*')/;e.languages.css={comment:/\/\*[\s\S]*?\*\//,atrule:{pattern:/@[\w-](?:[^;{\s]|\s+(?![\s{]))*(?:;|(?=\s*\{))/,inside:{rule:/^@[\w-]+/,"selector-function-argument":{pattern:/(\bselector\s*\(\s*(?![\s)]))(?:[^()\s]|\s+(?![\s)])|\((?:[^()]|\([^()]*\))*\))+(?=\s*\))/,lookbehind:!0,alias:"selector"},keyword:{pattern:/(^|[^\w-])(?:and|not|only|or)(?![\w-])/,lookbehind:!0}}},url:{pattern:RegExp("\\burl\\((?:"+n.source+"|"+/(?:[^\\\r\n()"']|\\[\s\S])*/.source+")\\)","i"),greedy:!0,inside:{function:/^url/i,punctuation:/^\(|\)$/,string:{pattern:RegExp("^"+n.source+"$"),alias:"url"}}},selector:{pattern:RegExp("(^|[{}\\s])[^{}\\s](?:[^{};\"'\\s]|\\s+(?![\\s{])|"+n.source+")*(?=\\s*\\{)"),lookbehind:!0},string:{pattern:n,greedy:!0},property:{pattern:/(^|[^-\w\xA0-\uFFFF])(?!\s)[-_a-z\xA0-\uFFFF](?:(?!\s)[-\w\xA0-\uFFFF])*(?=\s*:)/i,lookbehind:!0},important:/!important\b/i,function:{pattern:/(^|[^-a-z0-9])[-a-z0-9]+(?=\()/i,lookbehind:!0},punctuation:/[(){};:,]/},e.languages.css.atrule.inside.rest=e.languages.css;var t=e.languages.markup;t&&(t.tag.addInlined("style","css"),t.tag.addAttribute("style","css"))}(a),function(e){var 
n,t=/("|')(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/;e.languages.css.selector={pattern:e.languages.css.selector.pattern,lookbehind:!0,inside:n={"pseudo-element":/:(?:after|before|first-letter|first-line|selection)|::[-\w]+/,"pseudo-class":/:[-\w]+/,class:/\.[-\w]+/,id:/#[-\w]+/,attribute:{pattern:RegExp("\\[(?:[^[\\]\"']|"+t.source+")*\\]"),greedy:!0,inside:{punctuation:/^\[|\]$/,"case-sensitivity":{pattern:/(\s)[si]$/i,lookbehind:!0,alias:"keyword"},namespace:{pattern:/^(\s*)(?:(?!\s)[-*\w\xA0-\uFFFF])*\|(?!=)/,lookbehind:!0,inside:{punctuation:/\|$/}},"attr-name":{pattern:/^(\s*)(?:(?!\s)[-\w\xA0-\uFFFF])+/,lookbehind:!0},"attr-value":[t,{pattern:/(=\s*)(?:(?!\s)[-\w\xA0-\uFFFF])+(?=\s*$)/,lookbehind:!0}],operator:/[|~*^$]?=/}},"n-th":[{pattern:/(\(\s*)[+-]?\d*[\dn](?:\s*[+-]\s*\d+)?(?=\s*\))/,lookbehind:!0,inside:{number:/[\dn]+/,operator:/[+-]/}},{pattern:/(\(\s*)(?:even|odd)(?=\s*\))/i,lookbehind:!0}],combinator:/>|\+|~|\|\|/,punctuation:/[(),]/}},e.languages.css.atrule.inside["selector-function-argument"].inside=n,e.languages.insertBefore("css","property",{variable:{pattern:/(^|[^-\w\xA0-\uFFFF])--(?!\s)[-_a-z\xA0-\uFFFF](?:(?!\s)[-\w\xA0-\uFFFF])*/i,lookbehind:!0}});var 
o={pattern:/(\b\d+)(?:%|[a-z]+(?![\w-]))/,lookbehind:!0},a={pattern:/(^|[^\w.-])-?(?:\d+(?:\.\d+)?|\.\d+)/,lookbehind:!0};e.languages.insertBefore("css","function",{operator:{pattern:/(\s)[+\-*\/](?=\s)/,lookbehind:!0},hexcode:{pattern:/\B#[\da-f]{3,8}\b/i,alias:"color"},color:[{pattern:/(^|[^\w-])(?:AliceBlue|AntiqueWhite|Aqua|Aquamarine|Azure|Beige|Bisque|Black|BlanchedAlmond|Blue|BlueViolet|Brown|BurlyWood|CadetBlue|Chartreuse|Chocolate|Coral|CornflowerBlue|Cornsilk|Crimson|Cyan|DarkBlue|DarkCyan|DarkGoldenRod|DarkGr[ae]y|DarkGreen|DarkKhaki|DarkMagenta|DarkOliveGreen|DarkOrange|DarkOrchid|DarkRed|DarkSalmon|DarkSeaGreen|DarkSlateBlue|DarkSlateGr[ae]y|DarkTurquoise|DarkViolet|DeepPink|DeepSkyBlue|DimGr[ae]y|DodgerBlue|FireBrick|FloralWhite|ForestGreen|Fuchsia|Gainsboro|GhostWhite|Gold|GoldenRod|Gr[ae]y|Green|GreenYellow|HoneyDew|HotPink|IndianRed|Indigo|Ivory|Khaki|Lavender|LavenderBlush|LawnGreen|LemonChiffon|LightBlue|LightCoral|LightCyan|LightGoldenRodYellow|LightGr[ae]y|LightGreen|LightPink|LightSalmon|LightSeaGreen|LightSkyBlue|LightSlateGr[ae]y|LightSteelBlue|LightYellow|Lime|LimeGreen|Linen|Magenta|Maroon|MediumAquaMarine|MediumBlue|MediumOrchid|MediumPurple|MediumSeaGreen|MediumSlateBlue|MediumSpringGreen|MediumTurquoise|MediumVioletRed|MidnightBlue|MintCream|MistyRose|Moccasin|NavajoWhite|Navy|OldLace|Olive|OliveDrab|Orange|OrangeRed|Orchid|PaleGoldenRod|PaleGreen|PaleTurquoise|PaleVioletRed|PapayaWhip|PeachPuff|Peru|Pink|Plum|PowderBlue|Purple|Red|RosyBrown|RoyalBlue|SaddleBrown|Salmon|SandyBrown|SeaGreen|SeaShell|Sienna|Silver|SkyBlue|SlateBlue|SlateGr[ae]y|Snow|SpringGreen|SteelBlue|Tan|Teal|Thistle|Tomato|Transparent|Turquoise|Violet|Wheat|White|WhiteSmoke|Yellow|YellowGreen)(?![\w-])/i,lookbehind:!0},{pattern:/\b(?:hsl|rgb)\(\s*\d{1,3}\s*,\s*\d{1,3}%?\s*,\s*\d{1,3}%?\s*\)\B|\b(?:hsl|rgb)a\(\s*\d{1,3}\s*,\s*\d{1,3}%?\s*,\s*\d{1,3}%?\s*,\s*(?:0|0?\.\d+|1)\s*\)\B/i,inside:{unit:o,number:a,function:/[\w-]+(?=\()/,punctuation:/[(),]/}}],entity:/\\[\da-f]
{1,8}/i,unit:o,number:a})}(a),a.languages.javascript=a.languages.extend("clike",{"class-name":[a.languages.clike["class-name"],{pattern:/(^|[^$\w\xA0-\uFFFF])(?!\s)[_$A-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*(?=\.(?:constructor|prototype))/,lookbehind:!0}],keyword:[{pattern:/((?:^|\})\s*)catch\b/,lookbehind:!0},{pattern:/(^|[^.]|\.\.\.\s*)\b(?:as|assert(?=\s*\{)|async(?=\s*(?:function\b|\(|[$\w\xA0-\uFFFF]|$))|await|break|case|class|const|continue|debugger|default|delete|do|else|enum|export|extends|finally(?=\s*(?:\{|$))|for|from(?=\s*(?:['"]|$))|function|(?:get|set)(?=\s*(?:[#\[$\w\xA0-\uFFFF]|$))|if|implements|import|in|instanceof|interface|let|new|null|of|package|private|protected|public|return|static|super|switch|this|throw|try|typeof|undefined|var|void|while|with|yield)\b/,lookbehind:!0}],function:/#?(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*(?=\s*(?:\.\s*(?:apply|bind|call)\s*)?\()/,number:{pattern:RegExp(/(^|[^\w$])/.source+"(?:"+/NaN|Infinity/.source+"|"+/0[bB][01]+(?:_[01]+)*n?/.source+"|"+/0[oO][0-7]+(?:_[0-7]+)*n?/.source+"|"+/0[xX][\dA-Fa-f]+(?:_[\dA-Fa-f]+)*n?/.source+"|"+/\d+(?:_\d+)*n/.source+"|"+/(?:\d+(?:_\d+)*(?:\.(?:\d+(?:_\d+)*)?)?|\.\d+(?:_\d+)*)(?:[Ee][+-]?\d+(?:_\d+)*)?/.source+")"+/(?![\w$])/.source),lookbehind:!0},operator:/--|\+\+|\*\*=?|=>|&&=?|\|\|=?|[!=]==|<<=?|>>>?=?|[-+*/%&|^!=<>]=?|\.{3}|\?\?=?|\?\.?|[~:]/}),a.languages.javascript["class-name"][0].pattern=/(\b(?:class|extends|implements|instanceof|interface|new)\s+)[\w.\\]+/,a.languages.insertBefore("javascript","keyword",{regex:{pattern:/((?:^|[^$\w\xA0-\uFFFF."'\])\s]|\b(?:return|yield))\s*)\/(?:\[(?:[^\]\\\r\n]|\\.)*\]|\\.|[^/\\\[\r\n])+\/[dgimyus]{0,7}(?=(?:\s|\/\*(?:[^*]|\*(?!\/))*\*\/)*(?:$|[\r\n,.;:})\]]|\/\/))/,lookbehind:!0,greedy:!0,inside:{"regex-source":{pattern:/^(\/)[\s\S]+(?=\/[a-z]*$)/,lookbehind:!0,alias:"language-regex",inside:a.languages.regex},"regex-delimiter":/^\/|\/$/,"regex-flags":/^[a-z]+$/}},"function-variable":{pattern:/#?(?!\s)[_$a-zA-Z\xA0-\uF
FFF](?:(?!\s)[$\w\xA0-\uFFFF])*(?=\s*[=:]\s*(?:async\s*)?(?:\bfunction\b|(?:\((?:[^()]|\([^()]*\))*\)|(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*)\s*=>))/,alias:"function"},parameter:[{pattern:/(function(?:\s+(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*)?\s*\(\s*)(?!\s)(?:[^()\s]|\s+(?![\s)])|\([^()]*\))+(?=\s*\))/,lookbehind:!0,inside:a.languages.javascript},{pattern:/(^|[^$\w\xA0-\uFFFF])(?!\s)[_$a-z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*(?=\s*=>)/i,lookbehind:!0,inside:a.languages.javascript},{pattern:/(\(\s*)(?!\s)(?:[^()\s]|\s+(?![\s)])|\([^()]*\))+(?=\s*\)\s*=>)/,lookbehind:!0,inside:a.languages.javascript},{pattern:/((?:\b|\s|^)(?!(?:as|async|await|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|finally|for|from|function|get|if|implements|import|in|instanceof|interface|let|new|null|of|package|private|protected|public|return|set|static|super|switch|this|throw|try|typeof|undefined|var|void|while|with|yield)(?![$\w\xA0-\uFFFF]))(?:(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*\s*)\(\s*|\]\s*\(\s*)(?!\s)(?:[^()\s]|\s+(?![\s)])|\([^()]*\))+(?=\s*\)\s*\{)/,lookbehind:!0,inside:a.languages.javascript}],constant:/\b[A-Z](?:[A-Z_]|\dx?)*\b/}),a.languages.insertBefore("javascript","string",{hashbang:{pattern:/^#!.*/,greedy:!0,alias:"comment"},"template-string":{pattern:/`(?:\\[\s\S]|\$\{(?:[^{}]|\{(?:[^{}]|\{[^}]*\})*\})+\}|(?!\$\{)[^\\`])*`/,greedy:!0,inside:{"template-punctuation":{pattern:/^`|`$/,alias:"string"},interpolation:{pattern:/((?:^|[^\\])(?:\\{2})*)\$\{(?:[^{}]|\{(?:[^{}]|\{[^}]*\})*\})+\}/,lookbehind:!0,inside:{"interpolation-punctuation":{pattern:/^\$\{|\}$/,alias:"punctuation"},rest:a.languages.javascript}},string:/[\s\S]+/}},"string-property":{pattern:/((?:^|[,{])[ \t]*)(["'])(?:\\(?:\r\n|[\s\S])|(?!\2)[^\\\r\n])*\2(?=\s*:)/m,lookbehind:!0,greedy:!0,alias:"property"}}),a.languages.insertBefore("javascript","operator",{"literal-property":{pattern:/((?:^|[,{])[ 
\t]*)(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*(?=\s*:)/m,lookbehind:!0,alias:"property"}}),a.languages.markup&&(a.languages.markup.tag.addInlined("script","javascript"),a.languages.markup.tag.addAttribute(/on(?:abort|blur|change|click|composition(?:end|start|update)|dblclick|error|focus(?:in|out)?|key(?:down|up)|load|mouse(?:down|enter|leave|move|out|over|up)|reset|resize|scroll|select|slotchange|submit|unload|wheel)/.source,"javascript")),a.languages.js=a.languages.javascript,function(e){var n=/#(?!\{).+/,t={pattern:/#\{[^}]+\}/,alias:"variable"};e.languages.coffeescript=e.languages.extend("javascript",{comment:n,string:[{pattern:/'(?:\\[\s\S]|[^\\'])*'/,greedy:!0},{pattern:/"(?:\\[\s\S]|[^\\"])*"/,greedy:!0,inside:{interpolation:t}}],keyword:/\b(?:and|break|by|catch|class|continue|debugger|delete|do|each|else|extend|extends|false|finally|for|if|in|instanceof|is|isnt|let|loop|namespace|new|no|not|null|of|off|on|or|own|return|super|switch|then|this|throw|true|try|typeof|undefined|unless|until|when|while|window|with|yes|yield)\b/,"class-member":{pattern:/@(?!\d)\w+/,alias:"variable"}}),e.languages.insertBefore("coffeescript","comment",{"multiline-comment":{pattern:/###[\s\S]+?###/,alias:"comment"},"block-regex":{pattern:/\/{3}[\s\S]*?\/{3}/,alias:"regex",inside:{comment:n,interpolation:t}}}),e.languages.insertBefore("coffeescript","string",{"inline-javascript":{pattern:/`(?:\\[\s\S]|[^\\`])*`/,inside:{delimiter:{pattern:/^`|`$/,alias:"punctuation"},script:{pattern:/[\s\S]+/,alias:"language-javascript",inside:e.languages.javascript}}},"multiline-string":[{pattern:/'''[\s\S]*?'''/,greedy:!0,alias:"string"},{pattern:/"""[\s\S]*?"""/,greedy:!0,alias:"string",inside:{interpolation:t}}]}),e.languages.insertBefore("coffeescript","keyword",{property:/(?!\d)\w+(?=\s*:(?!:))/}),delete e.languages.coffeescript["template-string"],e.languages.coffee=e.languages.coffeescript}(a),function(e){var 
n=/[*&][^\s[\]{},]+/,t=/!(?:<[\w\-%#;/?:@&=+$,.!~*'()[\]]+>|(?:[a-zA-Z\d-]*!)?[\w\-%#;/?:@&=+$.~*'()]+)?/,o="(?:"+t.source+"(?:[ \t]+"+n.source+")?|"+n.source+"(?:[ \t]+"+t.source+")?)",a=/(?:[^\s\x00-\x08\x0e-\x1f!"#%&'*,\-:>?@[\]`{|}\x7f-\x84\x86-\x9f\ud800-\udfff\ufffe\uffff]|[?:-])(?:[ \t]*(?:(?![#:])|:))*/.source.replace(//g,(function(){return/[^\s\x00-\x08\x0e-\x1f,[\]{}\x7f-\x84\x86-\x9f\ud800-\udfff\ufffe\uffff]/.source})),r=/"(?:[^"\\\r\n]|\\.)*"|'(?:[^'\\\r\n]|\\.)*'/.source;function i(e,n){n=(n||"").replace(/m/g,"")+"m";var t=/([:\-,[{]\s*(?:\s<>[ \t]+)?)(?:<>)(?=[ \t]*(?:$|,|\]|\}|(?:[\r\n]\s*)?#))/.source.replace(/<>/g,(function(){return o})).replace(/<>/g,(function(){return e}));return RegExp(t,n)}e.languages.yaml={scalar:{pattern:RegExp(/([\-:]\s*(?:\s<>[ \t]+)?[|>])[ \t]*(?:((?:\r?\n|\r)[ \t]+)\S[^\r\n]*(?:\2[^\r\n]+)*)/.source.replace(/<>/g,(function(){return o}))),lookbehind:!0,alias:"string"},comment:/#.*/,key:{pattern:RegExp(/((?:^|[:\-,[{\r\n?])[ \t]*(?:<>[ \t]+)?)<>(?=\s*:\s)/.source.replace(/<>/g,(function(){return o})).replace(/<>/g,(function(){return"(?:"+a+"|"+r+")"}))),lookbehind:!0,greedy:!0,alias:"atrule"},directive:{pattern:/(^[ \t]*)%.+/m,lookbehind:!0,alias:"important"},datetime:{pattern:i(/\d{4}-\d\d?-\d\d?(?:[tT]|[ \t]+)\d\d?:\d{2}:\d{2}(?:\.\d*)?(?:[ \t]*(?:Z|[-+]\d\d?(?::\d{2})?))?|\d{4}-\d{2}-\d{2}|\d\d?:\d{2}(?::\d{2}(?:\.\d*)?)?/.source),lookbehind:!0,alias:"number"},boolean:{pattern:i(/false|true/.source,"i"),lookbehind:!0,alias:"important"},null:{pattern:i(/null|~/.source,"i"),lookbehind:!0,alias:"important"},string:{pattern:i(r),lookbehind:!0,greedy:!0},number:{pattern:i(/[+-]?(?:0x[\da-f]+|0o[0-7]+|(?:\d+(?:\.\d*)?|\.\d+)(?:e[+-]?\d+)?|\.inf|\.nan)/.source,"i"),lookbehind:!0},tag:t,important:n,punctuation:/---|[:[\]{}\-,|>?]|\.\.\./},e.languages.yml=e.languages.yaml}(a),function(e){var n=/(?:\\.|[^\\\n\r]|(?:\n|\r\n?)(?![\r\n]))/.source;function t(e){return e=e.replace(//g,(function(){return 
n})),RegExp(/((?:^|[^\\])(?:\\{2})*)/.source+"(?:"+e+")")}var o=/(?:\\.|``(?:[^`\r\n]|`(?!`))+``|`[^`\r\n]+`|[^\\|\r\n`])+/.source,a=/\|?__(?:\|__)+\|?(?:(?:\n|\r\n?)|(?![\s\S]))/.source.replace(/__/g,(function(){return o})),r=/\|?[ \t]*:?-{3,}:?[ \t]*(?:\|[ \t]*:?-{3,}:?[ \t]*)+\|?(?:\n|\r\n?)/.source;e.languages.markdown=e.languages.extend("markup",{}),e.languages.insertBefore("markdown","prolog",{"front-matter-block":{pattern:/(^(?:\s*[\r\n])?)---(?!.)[\s\S]*?[\r\n]---(?!.)/,lookbehind:!0,greedy:!0,inside:{punctuation:/^---|---$/,"front-matter":{pattern:/\S+(?:\s+\S+)*/,alias:["yaml","language-yaml"],inside:e.languages.yaml}}},blockquote:{pattern:/^>(?:[\t ]*>)*/m,alias:"punctuation"},table:{pattern:RegExp("^"+a+r+"(?:"+a+")*","m"),inside:{"table-data-rows":{pattern:RegExp("^("+a+r+")(?:"+a+")*$"),lookbehind:!0,inside:{"table-data":{pattern:RegExp(o),inside:e.languages.markdown},punctuation:/\|/}},"table-line":{pattern:RegExp("^("+a+")"+r+"$"),lookbehind:!0,inside:{punctuation:/\||:?-{3,}:?/}},"table-header-row":{pattern:RegExp("^"+a+"$"),inside:{"table-header":{pattern:RegExp(o),alias:"important",inside:e.languages.markdown},punctuation:/\|/}}}},code:[{pattern:/((?:^|\n)[ \t]*\n|(?:^|\r\n?)[ \t]*\r\n?)(?: {4}|\t).+(?:(?:\n|\r\n?)(?: {4}|\t).+)*/,lookbehind:!0,alias:"keyword"},{pattern:/^```[\s\S]*?^```$/m,greedy:!0,inside:{"code-block":{pattern:/^(```.*(?:\n|\r\n?))[\s\S]+?(?=(?:\n|\r\n?)^```$)/m,lookbehind:!0},"code-language":{pattern:/^(```).+/,lookbehind:!0},punctuation:/```/}}],title:[{pattern:/\S.*(?:\n|\r\n?)(?:==+|--+)(?=[ \t]*$)/m,alias:"important",inside:{punctuation:/==+$|--+$/}},{pattern:/(^\s*)#.+/m,lookbehind:!0,alias:"important",inside:{punctuation:/^#+|#+$/}}],hr:{pattern:/(^\s*)([*-])(?:[\t ]*\2){2,}(?=\s*$)/m,lookbehind:!0,alias:"punctuation"},list:{pattern:/(^\s*)(?:[*+-]|\d+\.)(?=[\t ].)/m,lookbehind:!0,alias:"punctuation"},"url-reference":{pattern:/!?\[[^\]]+\]:[\t ]+(?:\S+|<(?:\\.|[^>\\])+>)(?:[\t 
]+(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\)))?/,inside:{variable:{pattern:/^(!?\[)[^\]]+/,lookbehind:!0},string:/(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\))$/,punctuation:/^[\[\]!:]|[<>]/},alias:"url"},bold:{pattern:t(/\b__(?:(?!_)|_(?:(?!_))+_)+__\b|\*\*(?:(?!\*)|\*(?:(?!\*))+\*)+\*\*/.source),lookbehind:!0,greedy:!0,inside:{content:{pattern:/(^..)[\s\S]+(?=..$)/,lookbehind:!0,inside:{}},punctuation:/\*\*|__/}},italic:{pattern:t(/\b_(?:(?!_)|__(?:(?!_))+__)+_\b|\*(?:(?!\*)|\*\*(?:(?!\*))+\*\*)+\*/.source),lookbehind:!0,greedy:!0,inside:{content:{pattern:/(^.)[\s\S]+(?=.$)/,lookbehind:!0,inside:{}},punctuation:/[*_]/}},strike:{pattern:t(/(~~?)(?:(?!~))+\2/.source),lookbehind:!0,greedy:!0,inside:{content:{pattern:/(^~~?)[\s\S]+(?=\1$)/,lookbehind:!0,inside:{}},punctuation:/~~?/}},"code-snippet":{pattern:/(^|[^\\`])(?:``[^`\r\n]+(?:`[^`\r\n]+)*``(?!`)|`[^`\r\n]+`(?!`))/,lookbehind:!0,greedy:!0,alias:["code","keyword"]},url:{pattern:t(/!?\[(?:(?!\]))+\](?:\([^\s)]+(?:[\t ]+"(?:\\.|[^"\\])*")?\)|[ \t]?\[(?:(?!\]))+\])/.source),lookbehind:!0,greedy:!0,inside:{operator:/^!/,content:{pattern:/(^\[)[^\]]+(?=\])/,lookbehind:!0,inside:{}},variable:{pattern:/(^\][ \t]?\[)[^\]]+(?=\]$)/,lookbehind:!0},url:{pattern:/(^\]\()[^\s)]+/,lookbehind:!0},string:{pattern:/(^[ \t]+)"(?:\\.|[^"\\])*"(?=\)$)/,lookbehind:!0}}}}),["url","bold","italic","strike"].forEach((function(n){["url","bold","italic","strike","code-snippet"].forEach((function(t){n!==t&&(e.languages.markdown[n].inside.content.inside[t]=e.languages.markdown[t])}))})),e.hooks.add("after-tokenize",(function(e){"markdown"!==e.language&&"md"!==e.language||function e(n){if(n&&"string"!=typeof n)for(var 
t=0,o=n.length;t",quot:'"'},c=String.fromCodePoint||String.fromCharCode;e.languages.md=e.languages.markdown}(a),a.languages.graphql={comment:/#.*/,description:{pattern:/(?:"""(?:[^"]|(?!""")")*"""|"(?:\\.|[^\\"\r\n])*")(?=\s*[a-z_])/i,greedy:!0,alias:"string",inside:{"language-markdown":{pattern:/(^"(?:"")?)(?!\1)[\s\S]+(?=\1$)/,lookbehind:!0,inside:a.languages.markdown}}},string:{pattern:/"""(?:[^"]|(?!""")")*"""|"(?:\\.|[^\\"\r\n])*"/,greedy:!0},number:/(?:\B-|\b)\d+(?:\.\d+)?(?:e[+-]?\d+)?\b/i,boolean:/\b(?:false|true)\b/,variable:/\$[a-z_]\w*/i,directive:{pattern:/@[a-z_]\w*/i,alias:"function"},"attr-name":{pattern:/\b[a-z_]\w*(?=\s*(?:\((?:[^()"]|"(?:\\.|[^\\"\r\n])*")*\))?:)/i,greedy:!0},"atom-input":{pattern:/\b[A-Z]\w*Input\b/,alias:"class-name"},scalar:/\b(?:Boolean|Float|ID|Int|String)\b/,constant:/\b[A-Z][A-Z_\d]*\b/,"class-name":{pattern:/(\b(?:enum|implements|interface|on|scalar|type|union)\s+|&\s*|:\s*|\[)[A-Z_]\w*/,lookbehind:!0},fragment:{pattern:/(\bfragment\s+|\.{3}\s*(?!on\b))[a-zA-Z_]\w*/,lookbehind:!0,alias:"function"},"definition-mutation":{pattern:/(\bmutation\s+)[a-zA-Z_]\w*/,lookbehind:!0,alias:"function"},"definition-query":{pattern:/(\bquery\s+)[a-zA-Z_]\w*/,lookbehind:!0,alias:"function"},keyword:/\b(?:directive|enum|extend|fragment|implements|input|interface|mutation|on|query|repeatable|scalar|schema|subscription|type|union)\b/,operator:/[!=|&]|\.{3}/,"property-query":/\w+(?=\s*\()/,object:/\w+(?=\s*\{)/,punctuation:/[!(){}\[\]:=,]/,property:/\w+/},a.hooks.add("after-tokenize",(function(e){if("graphql"===e.language)for(var n=e.tokens.filter((function(e){return"string"!=typeof e&&"comment"!==e.type&&"scalar"!==e.type})),t=0;t0)){var s=m(/^\{$/,/^\}$/);if(-1===s)continue;for(var c=t;c=0&&p(l,"variable-input")}}}}function u(e){return n[t+e]}function d(e,n){n=n||0;for(var t=0;t?|<|>)?|>[>=]?|\b(?:AND|BETWEEN|DIV|ILIKE|IN|IS|LIKE|NOT|OR|REGEXP|RLIKE|SOUNDS LIKE|XOR)\b/i,punctuation:/[;[\]()`,.]/},function(e){var 
n=e.languages.javascript["template-string"],t=n.pattern.source,o=n.inside.interpolation,a=o.inside["interpolation-punctuation"],r=o.pattern.source;function i(n,o){if(e.languages[n])return{pattern:RegExp("((?:"+o+")\\s*)"+t),lookbehind:!0,greedy:!0,inside:{"template-punctuation":{pattern:/^`|`$/,alias:"string"},"embedded-code":{pattern:/[\s\S]+/,alias:n}}}}function s(e,n){return"___"+n.toUpperCase()+"_"+e+"___"}function c(n,t,o){var a={code:n,grammar:t,language:o};return e.hooks.run("before-tokenize",a),a.tokens=e.tokenize(a.code,a.grammar),e.hooks.run("after-tokenize",a),a.tokens}function l(n){var t={};t["interpolation-punctuation"]=a;var r=e.tokenize(n,t);if(3===r.length){var i=[1,1];i.push.apply(i,c(r[1],e.languages.javascript,"javascript")),r.splice.apply(r,i)}return new e.Token("interpolation",r,o.alias,n)}function u(n,t,o){var a=e.tokenize(n,{interpolation:{pattern:RegExp(r),lookbehind:!0}}),i=0,u={},d=c(a.map((function(e){if("string"==typeof e)return e;for(var t,a=e.content;-1!==n.indexOf(t=s(i++,o)););return u[t]=a,t})).join(""),t,o),m=Object.keys(u);return i=0,function e(n){for(var t=0;t=m.length)return;var o=n[t];if("string"==typeof o||"string"==typeof o.content){var a=m[i],r="string"==typeof o?o:o.content,s=r.indexOf(a);if(-1!==s){++i;var c=r.substring(0,s),d=l(u[a]),p=r.substring(s+a.length),f=[];if(c&&f.push(c),f.push(d),p){var h=[p];e(h),f.push.apply(f,h)}"string"==typeof o?(n.splice.apply(n,[t,1].concat(f)),t+=f.length-1):o.content=f}}else{var g=o.content;Array.isArray(g)?e(g):e([g])}}}(d),new e.Token(o,d,"language-"+o,n)}e.languages.javascript["template-string"]=[i("css",/\b(?:styled(?:\([^)]*\))?(?:\s*\.\s*\w+(?:\([^)]*\))*)*|css(?:\s*\.\s*(?:global|resolve))?|createGlobalStyle|keyframes)/.source),i("html",/\bhtml|\.\s*(?:inner|outer)HTML\s*\+?=/.source),i("svg",/\bsvg/.source),i("markdown",/\b(?:markdown|md)/.source),i("graphql",/\b(?:gql|graphql(?:\s*\.\s*experimental)?)/.source),i("sql",/\bsql/.source),n].filter(Boolean);var 
d={javascript:!0,js:!0,typescript:!0,ts:!0,jsx:!0,tsx:!0};function m(e){return"string"==typeof e?e:Array.isArray(e)?e.map(m).join(""):m(e.content)}e.hooks.add("after-tokenize",(function(n){n.language in d&&function n(t){for(var o=0,a=t.length;o]|<(?:[^<>]|<[^<>]*>)*>)*>)?/,lookbehind:!0,greedy:!0,inside:null},builtin:/\b(?:Array|Function|Promise|any|boolean|console|never|number|string|symbol|unknown)\b/}),e.languages.typescript.keyword.push(/\b(?:abstract|declare|is|keyof|readonly|require)\b/,/\b(?:asserts|infer|interface|module|namespace|type)\b(?=\s*(?:[{_$a-zA-Z\xA0-\uFFFF]|$))/,/\btype\b(?=\s*(?:[\{*]|$))/),delete e.languages.typescript.parameter,delete e.languages.typescript["literal-property"];var n=e.languages.extend("typescript",{});delete n["class-name"],e.languages.typescript["class-name"].inside=n,e.languages.insertBefore("typescript","function",{decorator:{pattern:/@[$\w\xA0-\uFFFF]+/,inside:{at:{pattern:/^@/,alias:"operator"},function:/^[\s\S]+/}},"generic-function":{pattern:/#?(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*\s*<(?:[^<>]|<(?:[^<>]|<[^<>]*>)*>)*>(?=\s*\()/,greedy:!0,inside:{function:/^#?(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*/,generic:{pattern:/<[\s\S]+/,alias:"class-name",inside:n}}}}),e.languages.ts=e.languages.typescript}(a),function(e){function n(e,n){return 
RegExp(e.replace(//g,(function(){return/(?!\s)[_$a-zA-Z\xA0-\uFFFF](?:(?!\s)[$\w\xA0-\uFFFF])*/.source})),n)}e.languages.insertBefore("javascript","function-variable",{"method-variable":{pattern:RegExp("(\\.\\s*)"+e.languages.javascript["function-variable"].pattern.source),lookbehind:!0,alias:["function-variable","method","function","property-access"]}}),e.languages.insertBefore("javascript","function",{method:{pattern:RegExp("(\\.\\s*)"+e.languages.javascript.function.source),lookbehind:!0,alias:["function","property-access"]}}),e.languages.insertBefore("javascript","constant",{"known-class-name":[{pattern:/\b(?:(?:Float(?:32|64)|(?:Int|Uint)(?:8|16|32)|Uint8Clamped)?Array|ArrayBuffer|BigInt|Boolean|DataView|Date|Error|Function|Intl|JSON|(?:Weak)?(?:Map|Set)|Math|Number|Object|Promise|Proxy|Reflect|RegExp|String|Symbol|WebAssembly)\b/,alias:"class-name"},{pattern:/\b(?:[A-Z]\w*)Error\b/,alias:"class-name"}]}),e.languages.insertBefore("javascript","keyword",{imports:{pattern:n(/(\bimport\b\s*)(?:(?:\s*,\s*(?:\*\s*as\s+|\{[^{}]*\}))?|\*\s*as\s+|\{[^{}]*\})(?=\s*\bfrom\b)/.source),lookbehind:!0,inside:e.languages.javascript},exports:{pattern:n(/(\bexport\b\s*)(?:\*(?:\s*as\s+)?(?=\s*\bfrom\b)|\{[^{}]*\})/.source),lookbehind:!0,inside:e.languages.javascript}}),e.languages.javascript.keyword.unshift({pattern:/\b(?:as|default|export|from|import)\b/,alias:"module"},{pattern:/\b(?:await|break|catch|continue|do|else|finally|for|if|return|switch|throw|try|while|yield)\b/,alias:"control-flow"},{pattern:/\bnull\b/,alias:["null","nil"]},{pattern:/\bundefined\b/,alias:"nil"}),e.languages.insertBefore("javascript","operator",{spread:{pattern:/\.{3}/,alias:"operator"},arrow:{pattern:/=>/,alias:"operator"}}),e.languages.insertBefore("javascript","punctuation",{"property-access":{pattern:n(/(\.\s*)#?/.source),lookbehind:!0},"maybe-class-name":{pattern:/(^|[^$\w\xA0-\uFFFF])[A-Z][$\w\xA0-\uFFFF]+/,lookbehind:!0},dom:{pattern:/\b(?:document|(?:local|session)Storage|location|navigator|
performance|window)\b/,alias:"variable"},console:{pattern:/\bconsole(?=\s*\.)/,alias:"class-name"}});for(var t=["function","function-variable","method","method-variable","property-access"],o=0;o*\.{3}(?:[^{}]|)*\})/.source;function r(e,n){return e=e.replace(//g,(function(){return t})).replace(//g,(function(){return o})).replace(//g,(function(){return a})),RegExp(e,n)}a=r(a).source,e.languages.jsx=e.languages.extend("markup",n),e.languages.jsx.tag.pattern=r(/<\/?(?:[\w.:-]+(?:+(?:[\w.:$-]+(?:=(?:"(?:\\[\s\S]|[^\\"])*"|'(?:\\[\s\S]|[^\\'])*'|[^\s{'"/>=]+|))?|))**\/?)?>/.source),e.languages.jsx.tag.inside.tag.pattern=/^<\/?[^\s>\/]*/,e.languages.jsx.tag.inside["attr-value"].pattern=/=(?!\{)(?:"(?:\\[\s\S]|[^\\"])*"|'(?:\\[\s\S]|[^\\'])*'|[^\s'">]+)/,e.languages.jsx.tag.inside.tag.inside["class-name"]=/^[A-Z]\w*(?:\.[A-Z]\w*)*$/,e.languages.jsx.tag.inside.comment=n.comment,e.languages.insertBefore("inside","attr-name",{spread:{pattern:r(//.source),inside:e.languages.jsx}},e.languages.jsx.tag),e.languages.insertBefore("inside","special-attr",{script:{pattern:r(/=/.source),alias:"language-javascript",inside:{"script-punctuation":{pattern:/^=(?=\{)/,alias:"punctuation"},rest:e.languages.jsx}}},e.languages.jsx.tag);var i=function(e){return e?"string"==typeof e?e:"string"==typeof e.content?e.content:e.content.map(i).join(""):""},s=function(n){for(var t=[],o=0;o0&&t[t.length-1].tagName===i(a.content[0].content[1])&&t.pop():"/>"===a.content[a.content.length-1].content||t.push({tagName:i(a.content[0].content[1]),openedBraces:0}):t.length>0&&"punctuation"===a.type&&"{"===a.content?t[t.length-1].openedBraces++:t.length>0&&t[t.length-1].openedBraces>0&&"punctuation"===a.type&&"}"===a.content?t[t.length-1].openedBraces--:r=!0),(r||"string"==typeof a)&&t.length>0&&0===t[t.length-1].openedBraces){var c=i(a);o0&&("string"==typeof n[o-1]||"plain-text"===n[o-1].type)&&(c=i(n[o-1])+c,n.splice(o-1,1),o--),n[o]=new e.Token("plain-text",c,null,c)}a.content&&"string"!=typeof 
a.content&&s(a.content)}};e.hooks.add("after-tokenize",(function(e){"jsx"!==e.language&&"tsx"!==e.language||s(e.tokens)}))}(a),function(e){e.languages.diff={coord:[/^(?:\*{3}|-{3}|\+{3}).*$/m,/^@@.*@@$/m,/^\d.*$/m]};var n={"deleted-sign":"-","deleted-arrow":"<","inserted-sign":"+","inserted-arrow":">",unchanged:" ",diff:"!"};Object.keys(n).forEach((function(t){var o=n[t],a=[];/^\w+$/.test(t)||a.push(/\w+/.exec(t)[0]),"diff"===t&&a.push("bold"),e.languages.diff[t]={pattern:RegExp("^(?:["+o+"].*(?:\r\n?|\n|(?![\\s\\S])))+","m"),alias:a,inside:{line:{pattern:/(.)(?=[\s\S]).*(?:\r\n?|\n)?/,lookbehind:!0},prefix:{pattern:/[\s\S]/,alias:/\w+/.exec(t)[0]}}}})),Object.defineProperty(e.languages.diff,"PREFIXES",{value:n})}(a),a.languages.git={comment:/^#.*/m,deleted:/^[-\u2013].*/m,inserted:/^\+.*/m,string:/("|')(?:\\.|(?!\1)[^\\\r\n])*\1/,command:{pattern:/^.*\$ git .*$/m,inside:{parameter:/\s--?\w+/}},coord:/^@@.*@@$/m,"commit-sha1":/^commit \w{40}$/m},a.languages.go=a.languages.extend("clike",{string:{pattern:/(^|[^\\])"(?:\\.|[^"\\\r\n])*"|`[^`]*`/,lookbehind:!0,greedy:!0},keyword:/\b(?:break|case|chan|const|continue|default|defer|else|fallthrough|for|func|go(?:to)?|if|import|interface|map|package|range|return|select|struct|switch|type|var)\b/,boolean:/\b(?:_|false|iota|nil|true)\b/,number:[/\b0(?:b[01_]+|o[0-7_]+)i?\b/i,/\b0x(?:[a-f\d_]+(?:\.[a-f\d_]*)?|\.[a-f\d_]+)(?:p[+-]?\d+(?:_\d+)*)?i?(?!\w)/i,/(?:\b\d[\d_]*(?:\.[\d_]*)?|\B\.\d[\d_]*)(?:e[+-]?[\d_]+)?i?(?!\w)/i],operator:/[*\/%^!=]=?|\+[=+]?|-[=-]?|\|[=|]?|&(?:=|&|\^=?)?|>(?:>=?|=)?|<(?:<=?|=|-)?|:=|\.\.\./,builtin:/\b(?:append|bool|byte|cap|close|complex|complex(?:64|128)|copy|delete|error|float(?:32|64)|u?int(?:8|16|32|64)?|imag|len|make|new|panic|print(?:ln)?|real|recover|rune|string|uintptr)\b/}),a.languages.insertBefore("go","string",{char:{pattern:/'(?:\\.|[^'\\\r\n]){0,10}'/,greedy:!0}}),delete a.languages.go["class-name"],function(e){function 
n(e,n){return"___"+e.toUpperCase()+n+"___"}Object.defineProperties(e.languages["markup-templating"]={},{buildPlaceholders:{value:function(t,o,a,r){if(t.language===o){var i=t.tokenStack=[];t.code=t.code.replace(a,(function(e){if("function"==typeof r&&!r(e))return e;for(var a,s=i.length;-1!==t.code.indexOf(a=n(o,s));)++s;return i[s]=e,a})),t.grammar=e.languages.markup}}},tokenizePlaceholders:{value:function(t,o){if(t.language===o&&t.tokenStack){t.grammar=e.languages[o];var a=0,r=Object.keys(t.tokenStack);!function i(s){for(var c=0;c=r.length);c++){var l=s[c];if("string"==typeof l||l.content&&"string"==typeof l.content){var u=r[a],d=t.tokenStack[u],m="string"==typeof l?l:l.content,p=n(o,u),f=m.indexOf(p);if(f>-1){++a;var h=m.substring(0,f),g=new e.Token(o,e.tokenize(d,t.grammar),"language-"+o,d),v=m.substring(f+p.length),b=[];h&&b.push.apply(b,i([h])),b.push(g),v&&b.push.apply(b,i([v])),"string"==typeof l?s.splice.apply(s,[c,1].concat(b)):l.content=b}}else l.content&&i(l.content)}return s}(t.tokens)}}}})}(a),function(e){e.languages.handlebars={comment:/\{\{![\s\S]*?\}\}/,delimiter:{pattern:/^\{\{\{?|\}\}\}?$/,alias:"punctuation"},string:/(["'])(?:\\.|(?!\1)[^\\\r\n])*\1/,number:/\b0x[\dA-Fa-f]+\b|(?:\b\d+(?:\.\d*)?|\B\.\d+)(?:[Ee][+-]?\d+)?/,boolean:/\b(?:false|true)\b/,block:{pattern:/^(\s*(?:~\s*)?)[#\/]\S+?(?=\s*(?:~\s*)?$|\s)/,lookbehind:!0,alias:"keyword"},brackets:{pattern:/\[[^\]]+\]/,inside:{punctuation:/\[|\]/,variable:/[\s\S]+/}},punctuation:/[!"#%&':()*+,.\/;<=>@\[\\\]^`{|}~]/,variable:/[^!"#%&'()*+,\/;<=>@\[\\\]^`{|}~\s]+/},e.hooks.add("before-tokenize",(function(n){e.languages["markup-templating"].buildPlaceholders(n,"handlebars",/\{\{\{[\s\S]+?\}\}\}|\{\{[\s\S]+?\}\}/g)})),e.hooks.add("after-tokenize",(function(n){e.languages["markup-templating"].tokenizePlaceholders(n,"handlebars")})),e.languages.hbs=e.languages.handlebars}(a),a.languages.json={property:{pattern:/(^|[^\\])"(?:\\.|[^\\"\r\n])*"(?=\s*:)/,lookbehind:!0,greedy:!0},string:{pattern:/(^|[^\\])"
(?:\\.|[^\\"\r\n])*"(?!\s*:)/,lookbehind:!0,greedy:!0},comment:{pattern:/\/\/.*|\/\*[\s\S]*?(?:\*\/|$)/,greedy:!0},number:/-?\b\d+(?:\.\d+)?(?:e[+-]?\d+)?\b/i,punctuation:/[{}[\],]/,operator:/:/,boolean:/\b(?:false|true)\b/,null:{pattern:/\bnull\b/,alias:"keyword"}},a.languages.webmanifest=a.languages.json,a.languages.less=a.languages.extend("css",{comment:[/\/\*[\s\S]*?\*\//,{pattern:/(^|[^\\])\/\/.*/,lookbehind:!0}],atrule:{pattern:/@[\w-](?:\((?:[^(){}]|\([^(){}]*\))*\)|[^(){};\s]|\s+(?!\s))*?(?=\s*\{)/,inside:{punctuation:/[:()]/}},selector:{pattern:/(?:@\{[\w-]+\}|[^{};\s@])(?:@\{[\w-]+\}|\((?:[^(){}]|\([^(){}]*\))*\)|[^(){};@\s]|\s+(?!\s))*?(?=\s*\{)/,inside:{variable:/@+[\w-]+/}},property:/(?:@\{[\w-]+\}|[\w-])+(?:\+_?)?(?=\s*:)/,operator:/[+\-*\/]/}),a.languages.insertBefore("less","property",{variable:[{pattern:/@[\w-]+\s*:/,inside:{punctuation:/:/}},/@@?[\w-]+/],"mixin-usage":{pattern:/([{;]\s*)[.#](?!\d)[\w-].*?(?=[(;])/,lookbehind:!0,alias:"function"}}),a.languages.makefile={comment:{pattern:/(^|[^\\])#(?:\\(?:\r\n|[\s\S])|[^\\\r\n])*/,lookbehind:!0},string:{pattern:/(["'])(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/,greedy:!0},"builtin-target":{pattern:/\.[A-Z][^:#=\s]+(?=\s*:(?!=))/,alias:"builtin"},target:{pattern:/^(?:[^:=\s]|[ \t]+(?![\s:]))+(?=\s*:(?!=))/m,alias:"symbol",inside:{variable:/\$+(?:(?!\$)[^(){}:#=\s]+|(?=[({]))/}},variable:/\$+(?:(?!\$)[^(){}:#=\s]+|\([@*%<^+?][DF]\)|(?=[({]))/,keyword:/-include\b|\b(?:define|else|endef|endif|export|ifn?def|ifn?eq|include|override|private|sinclude|undefine|unexport|vpath)\b/,function:{pattern:/(\()(?:abspath|addsuffix|and|basename|call|dir|error|eval|file|filter(?:-out)?|findstring|firstword|flavor|foreach|guile|if|info|join|lastword|load|notdir|or|origin|patsubst|realpath|shell|sort|strip|subst|suffix|value|warning|wildcard|word(?:list|s)?)(?=[ 
\t])/,lookbehind:!0},operator:/(?:::|[?:+!])?=|[|@]/,punctuation:/[:;(){}]/},a.languages.objectivec=a.languages.extend("c",{string:{pattern:/@?"(?:\\(?:\r\n|[\s\S])|[^"\\\r\n])*"/,greedy:!0},keyword:/\b(?:asm|auto|break|case|char|const|continue|default|do|double|else|enum|extern|float|for|goto|if|in|inline|int|long|register|return|self|short|signed|sizeof|static|struct|super|switch|typedef|typeof|union|unsigned|void|volatile|while)\b|(?:@interface|@end|@implementation|@protocol|@class|@public|@protected|@private|@property|@try|@catch|@finally|@throw|@synthesize|@dynamic|@selector)\b/,operator:/-[->]?|\+\+?|!=?|<>?=?|==?|&&?|\|\|?|[~^%?*\/@]/}),delete a.languages.objectivec["class-name"],a.languages.objc=a.languages.objectivec,a.languages.ocaml={comment:{pattern:/\(\*[\s\S]*?\*\)/,greedy:!0},char:{pattern:/'(?:[^\\\r\n']|\\(?:.|[ox]?[0-9a-f]{1,3}))'/i,greedy:!0},string:[{pattern:/"(?:\\(?:[\s\S]|\r\n)|[^\\\r\n"])*"/,greedy:!0},{pattern:/\{([a-z_]*)\|[\s\S]*?\|\1\}/,greedy:!0}],number:[/\b(?:0b[01][01_]*|0o[0-7][0-7_]*)\b/i,/\b0x[a-f0-9][a-f0-9_]*(?:\.[a-f0-9_]*)?(?:p[+-]?\d[\d_]*)?(?!\w)/i,/\b\d[\d_]*(?:\.[\d_]*)?(?:e[+-]?\d[\d_]*)?(?!\w)/i],directive:{pattern:/\B#\w+/,alias:"property"},label:{pattern:/\B~\w+/,alias:"property"},"type-variable":{pattern:/\B'\w+/,alias:"function"},variant:{pattern:/`\w+/,alias:"symbol"},keyword:/\b(?:as|assert|begin|class|constraint|do|done|downto|else|end|exception|external|for|fun|function|functor|if|in|include|inherit|initializer|lazy|let|match|method|module|mutable|new|nonrec|object|of|open|private|rec|sig|struct|then|to|try|type|val|value|virtual|when|where|while|with)\b/,boolean:/\b(?:false|true)\b/,"operator-like-punctuation":{pattern:/\[[<>|]|[>|]\]|\{<|>\}/,alias:"punctuation"},operator:/\.[.~]|:[=>]|[=<>@^|&+\-*\/$%!?~][!$%&*+\-.\/:<=>?@^|~]*|\b(?:and|asr|land|lor|lsl|lsr|lxor|mod|or)\b/,punctuation:/;;|::|[(){}\[\].,:;#]|\b_\b/},a.languages.python={comment:{pattern:/(^|[^\\])#.*/,lookbehind:!0,greedy:!0},"string-interpolatio
n":{pattern:/(?:f|fr|rf)(?:("""|''')[\s\S]*?\1|("|')(?:\\.|(?!\2)[^\\\r\n])*\2)/i,greedy:!0,inside:{interpolation:{pattern:/((?:^|[^{])(?:\{\{)*)\{(?!\{)(?:[^{}]|\{(?!\{)(?:[^{}]|\{(?!\{)(?:[^{}])+\})+\})+\}/,lookbehind:!0,inside:{"format-spec":{pattern:/(:)[^:(){}]+(?=\}$)/,lookbehind:!0},"conversion-option":{pattern:/![sra](?=[:}]$)/,alias:"punctuation"},rest:null}},string:/[\s\S]+/}},"triple-quoted-string":{pattern:/(?:[rub]|br|rb)?("""|''')[\s\S]*?\1/i,greedy:!0,alias:"string"},string:{pattern:/(?:[rub]|br|rb)?("|')(?:\\.|(?!\1)[^\\\r\n])*\1/i,greedy:!0},function:{pattern:/((?:^|\s)def[ \t]+)[a-zA-Z_]\w*(?=\s*\()/g,lookbehind:!0},"class-name":{pattern:/(\bclass\s+)\w+/i,lookbehind:!0},decorator:{pattern:/(^[\t ]*)@\w+(?:\.\w+)*/m,lookbehind:!0,alias:["annotation","punctuation"],inside:{punctuation:/\./}},keyword:/\b(?:_(?=\s*:)|and|as|assert|async|await|break|case|class|continue|def|del|elif|else|except|exec|finally|for|from|global|if|import|in|is|lambda|match|nonlocal|not|or|pass|print|raise|return|try|while|with|yield)\b/,builtin:/\b(?:__import__|abs|all|any|apply|ascii|basestring|bin|bool|buffer|bytearray|bytes|callable|chr|classmethod|cmp|coerce|compile|complex|delattr|dict|dir|divmod|enumerate|eval|execfile|file|filter|float|format|frozenset|getattr|globals|hasattr|hash|help|hex|id|input|int|intern|isinstance|issubclass|iter|len|list|locals|long|map|max|memoryview|min|next|object|oct|open|ord|pow|property|range|raw_input|reduce|reload|repr|reversed|round|set|setattr|slice|sorted|staticmethod|str|sum|super|tuple|type|unichr|unicode|vars|xrange|zip)\b/,boolean:/\b(?:False|None|True)\b/,number:/\b0(?:b(?:_?[01])+|o(?:_?[0-7])+|x(?:_?[a-f0-9])+)\b|(?:\b\d+(?:_\d+)*(?:\.(?:\d+(?:_\d+)*)?)?|\B\.\d+(?:_\d+)*)(?:e[+-]?\d+(?:_\d+)*)?j?(?!\w)/i,operator:/[-+%=]=?|!=|:=|\*\*?=?|\/\/?=?|<[<=>]?|>[=>]?|[&|^~]/,punctuation:/[{}[\];(),.:]/},a.languages.python["string-interpolation"].inside.interpolation.inside.rest=a.languages.python,a.languages.py=a.languages.python,a.la
nguages.reason=a.languages.extend("clike",{string:{pattern:/"(?:\\(?:\r\n|[\s\S])|[^\\\r\n"])*"/,greedy:!0},"class-name":/\b[A-Z]\w*/,keyword:/\b(?:and|as|assert|begin|class|constraint|do|done|downto|else|end|exception|external|for|fun|function|functor|if|in|include|inherit|initializer|lazy|let|method|module|mutable|new|nonrec|object|of|open|or|private|rec|sig|struct|switch|then|to|try|type|val|virtual|when|while|with)\b/,operator:/\.{3}|:[:=]|\|>|->|=(?:==?|>)?|<=?|>=?|[|^?'#!~`]|[+\-*\/]\.?|\b(?:asr|land|lor|lsl|lsr|lxor|mod)\b/}),a.languages.insertBefore("reason","class-name",{char:{pattern:/'(?:\\x[\da-f]{2}|\\o[0-3][0-7][0-7]|\\\d{3}|\\.|[^'\\\r\n])'/,greedy:!0},constructor:/\b[A-Z]\w*\b(?!\s*\.)/,label:{pattern:/\b[a-z]\w*(?=::)/,alias:"symbol"}}),delete a.languages.reason.function,function(e){e.languages.sass=e.languages.extend("css",{comment:{pattern:/^([ \t]*)\/[\/*].*(?:(?:\r?\n|\r)\1[ \t].+)*/m,lookbehind:!0,greedy:!0}}),e.languages.insertBefore("sass","atrule",{"atrule-line":{pattern:/^(?:[ \t]*)[@+=].+/m,greedy:!0,inside:{atrule:/(?:@[\w-]+|[+=])/}}}),delete e.languages.sass.atrule;var n=/\$[-\w]+|#\{\$[-\w]+\}/,t=[/[+*\/%]|[=!]=|<=?|>=?|\b(?:and|not|or)\b/,{pattern:/(\s)-(?=\s)/,lookbehind:!0}];e.languages.insertBefore("sass","property",{"variable-line":{pattern:/^[ \t]*\$.+/m,greedy:!0,inside:{punctuation:/:/,variable:n,operator:t}},"property-line":{pattern:/^[ \t]*(?:[^:\s]+ *:.*|:[^:\s].*)/m,greedy:!0,inside:{property:[/[^:\s]+(?=\s*:)/,{pattern:/(:)[^:\s]+/,lookbehind:!0}],punctuation:/:/,variable:n,operator:t,important:e.languages.sass.important}}}),delete e.languages.sass.property,delete e.languages.sass.important,e.languages.insertBefore("sass","punctuation",{selector:{pattern:/^([ \t]*)\S(?:,[^,\r\n]+|[^,\r\n]*)(?:,[^,\r\n]+)*(?:,(?:\r?\n|\r)\1[ 
\t]+\S(?:,[^,\r\n]+|[^,\r\n]*)(?:,[^,\r\n]+)*)*/m,lookbehind:!0,greedy:!0}})}(a),a.languages.scss=a.languages.extend("css",{comment:{pattern:/(^|[^\\])(?:\/\*[\s\S]*?\*\/|\/\/.*)/,lookbehind:!0},atrule:{pattern:/@[\w-](?:\([^()]+\)|[^()\s]|\s+(?!\s))*?(?=\s+[{;])/,inside:{rule:/@[\w-]+/}},url:/(?:[-a-z]+-)?url(?=\()/i,selector:{pattern:/(?=\S)[^@;{}()]?(?:[^@;{}()\s]|\s+(?!\s)|#\{\$[-\w]+\})+(?=\s*\{(?:\}|\s|[^}][^:{}]*[:{][^}]))/,inside:{parent:{pattern:/&/,alias:"important"},placeholder:/%[-\w]+/,variable:/\$[-\w]+|#\{\$[-\w]+\}/}},property:{pattern:/(?:[-\w]|\$[-\w]|#\{\$[-\w]+\})+(?=\s*:)/,inside:{variable:/\$[-\w]+|#\{\$[-\w]+\}/}}}),a.languages.insertBefore("scss","atrule",{keyword:[/@(?:content|debug|each|else(?: if)?|extend|for|forward|function|if|import|include|mixin|return|use|warn|while)\b/i,{pattern:/( )(?:from|through)(?= )/,lookbehind:!0}]}),a.languages.insertBefore("scss","important",{variable:/\$[-\w]+|#\{\$[-\w]+\}/}),a.languages.insertBefore("scss","function",{"module-modifier":{pattern:/\b(?:as|hide|show|with)\b/i,alias:"keyword"},placeholder:{pattern:/%[-\w]+/,alias:"selector"},statement:{pattern:/\B!(?:default|optional)\b/i,alias:"keyword"},boolean:/\b(?:false|true)\b/,null:{pattern:/\bnull\b/,alias:"keyword"},operator:{pattern:/(\s)(?:[-+*\/%]|[=!]=|<=?|>=?|and|not|or)(?=\s)/,lookbehind:!0}}),a.languages.scss.atrule.inside.rest=a.languages.scss,function(e){var 
n={pattern:/(\b\d+)(?:%|[a-z]+)/,lookbehind:!0},t={pattern:/(^|[^\w.-])-?(?:\d+(?:\.\d+)?|\.\d+)/,lookbehind:!0},o={comment:{pattern:/(^|[^\\])(?:\/\*[\s\S]*?\*\/|\/\/.*)/,lookbehind:!0},url:{pattern:/\burl\((["']?).*?\1\)/i,greedy:!0},string:{pattern:/("|')(?:(?!\1)[^\\\r\n]|\\(?:\r\n|[\s\S]))*\1/,greedy:!0},interpolation:null,func:null,important:/\B!(?:important|optional)\b/i,keyword:{pattern:/(^|\s+)(?:(?:else|for|if|return|unless)(?=\s|$)|@[\w-]+)/,lookbehind:!0},hexcode:/#[\da-f]{3,6}/i,color:[/\b(?:AliceBlue|AntiqueWhite|Aqua|Aquamarine|Azure|Beige|Bisque|Black|BlanchedAlmond|Blue|BlueViolet|Brown|BurlyWood|CadetBlue|Chartreuse|Chocolate|Coral|CornflowerBlue|Cornsilk|Crimson|Cyan|DarkBlue|DarkCyan|DarkGoldenRod|DarkGr[ae]y|DarkGreen|DarkKhaki|DarkMagenta|DarkOliveGreen|DarkOrange|DarkOrchid|DarkRed|DarkSalmon|DarkSeaGreen|DarkSlateBlue|DarkSlateGr[ae]y|DarkTurquoise|DarkViolet|DeepPink|DeepSkyBlue|DimGr[ae]y|DodgerBlue|FireBrick|FloralWhite|ForestGreen|Fuchsia|Gainsboro|GhostWhite|Gold|GoldenRod|Gr[ae]y|Green|GreenYellow|HoneyDew|HotPink|IndianRed|Indigo|Ivory|Khaki|Lavender|LavenderBlush|LawnGreen|LemonChiffon|LightBlue|LightCoral|LightCyan|LightGoldenRodYellow|LightGr[ae]y|LightGreen|LightPink|LightSalmon|LightSeaGreen|LightSkyBlue|LightSlateGr[ae]y|LightSteelBlue|LightYellow|Lime|LimeGreen|Linen|Magenta|Maroon|MediumAquaMarine|MediumBlue|MediumOrchid|MediumPurple|MediumSeaGreen|MediumSlateBlue|MediumSpringGreen|MediumTurquoise|MediumVioletRed|MidnightBlue|MintCream|MistyRose|Moccasin|NavajoWhite|Navy|OldLace|Olive|OliveDrab|Orange|OrangeRed|Orchid|PaleGoldenRod|PaleGreen|PaleTurquoise|PaleVioletRed|PapayaWhip|PeachPuff|Peru|Pink|Plum|PowderBlue|Purple|Red|RosyBrown|RoyalBlue|SaddleBrown|Salmon|SandyBrown|SeaGreen|SeaShell|Sienna|Silver|SkyBlue|SlateBlue|SlateGr[ae]y|Snow|SpringGreen|SteelBlue|Tan|Teal|Thistle|Tomato|Transparent|Turquoise|Violet|Wheat|White|WhiteSmoke|Yellow|YellowGreen)\b/i,{pattern:/\b(?:hsl|rgb)\(\s*\d{1,3}\s*,\s*\d{1,3}%?\s*,\s*\d{1,3}%?
\s*\)\B|\b(?:hsl|rgb)a\(\s*\d{1,3}\s*,\s*\d{1,3}%?\s*,\s*\d{1,3}%?\s*,\s*(?:0|0?\.\d+|1)\s*\)\B/i,inside:{unit:n,number:t,function:/[\w-]+(?=\()/,punctuation:/[(),]/}}],entity:/\\[\da-f]{1,8}/i,unit:n,boolean:/\b(?:false|true)\b/,operator:[/~|[+!\/%<>?=]=?|[-:]=|\*[*=]?|\.{2,3}|&&|\|\||\B-\B|\b(?:and|in|is(?: a| defined| not|nt)?|not|or)\b/],number:t,punctuation:/[{}()\[\];:,]/};o.interpolation={pattern:/\{[^\r\n}:]+\}/,alias:"variable",inside:{delimiter:{pattern:/^\{|\}$/,alias:"punctuation"},rest:o}},o.func={pattern:/[\w-]+\([^)]*\).*/,inside:{function:/^[^(]+/,rest:o}},e.languages.stylus={"atrule-declaration":{pattern:/(^[ \t]*)@.+/m,lookbehind:!0,inside:{atrule:/^@[\w-]+/,rest:o}},"variable-declaration":{pattern:/(^[ \t]*)[\w$-]+\s*.?=[ \t]*(?:\{[^{}]*\}|\S.*|$)/m,lookbehind:!0,inside:{variable:/^\S+/,rest:o}},statement:{pattern:/(^[ \t]*)(?:else|for|if|return|unless)[ \t].+/m,lookbehind:!0,inside:{keyword:/^\S+/,rest:o}},"property-declaration":{pattern:/((?:^|\{)([ \t]*))(?:[\w-]|\{[^}\r\n]+\})+(?:\s*:\s*|[ \t]+)(?!\s)[^{\r\n]*(?:;|[^{\r\n,]$(?!(?:\r?\n|\r)(?:\{|\2[ \t])))/m,lookbehind:!0,inside:{property:{pattern:/^[^\s:]+/,inside:{interpolation:o.interpolation}},rest:o}},selector:{pattern:/(^[ \t]*)(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\)|(?![\w-]))|\{[^}\r\n]+\})+)(?:(?:\r?\n|\r)(?:\1(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\)|(?![\w-]))|\{[^}\r\n]+\})+)))*(?:,$|\{|(?=(?:\r?\n|\r)(?:\{|\1[ \t])))/m,lookbehind:!0,inside:{interpolation:o.interpolation,comment:o.comment,punctuation:/[{},]/}},func:o.func,string:o.string,comment:{pattern:/(^|[^\\])(?:\/\*[\s\S]*?\*\/|\/\/.*)/,lookbehind:!0,greedy:!0},interpolation:o.interpolation,punctuation:/[{}()\[\];:.]/}}(a),function(e){var n=e.util.clone(e.languages.typescript);e.languages.tsx=e.languages.extend("jsx",n),delete e.languages.tsx.parameter,delete e.languages.tsx["literal-property"];var 
t=e.languages.tsx.tag;t.pattern=RegExp(/(^|[^\w$]|(?=<\/))/.source+"(?:"+t.pattern.source+")",t.pattern.flags),t.lookbehind=!0}(a),a.languages.wasm={comment:[/\(;[\s\S]*?;\)/,{pattern:/;;.*/,greedy:!0}],string:{pattern:/"(?:\\[\s\S]|[^"\\])*"/,greedy:!0},keyword:[{pattern:/\b(?:align|offset)=/,inside:{operator:/=/}},{pattern:/\b(?:(?:f32|f64|i32|i64)(?:\.(?:abs|add|and|ceil|clz|const|convert_[su]\/i(?:32|64)|copysign|ctz|demote\/f64|div(?:_[su])?|eqz?|extend_[su]\/i32|floor|ge(?:_[su])?|gt(?:_[su])?|le(?:_[su])?|load(?:(?:8|16|32)_[su])?|lt(?:_[su])?|max|min|mul|neg?|nearest|or|popcnt|promote\/f32|reinterpret\/[fi](?:32|64)|rem_[su]|rot[lr]|shl|shr_[su]|sqrt|store(?:8|16|32)?|sub|trunc(?:_[su]\/f(?:32|64))?|wrap\/i64|xor))?|memory\.(?:grow|size))\b/,inside:{punctuation:/\./}},/\b(?:anyfunc|block|br(?:_if|_table)?|call(?:_indirect)?|data|drop|elem|else|end|export|func|get_(?:global|local)|global|if|import|local|loop|memory|module|mut|nop|offset|param|result|return|select|set_(?:global|local)|start|table|tee_local|then|type|unreachable)\b/],variable:/\$[\w!#$%&'*+\-./:<=>?@\\^`|~]+/,number:/[+-]?\b(?:\d(?:_?\d)*(?:\.\d(?:_?\d)*)?(?:[eE][+-]?\d(?:_?\d)*)?|0x[\da-fA-F](?:_?[\da-fA-F])*(?:\.[\da-fA-F](?:_?[\da-fA-D])*)?(?:[pP][+-]?\d(?:_?\d)*)?)\b|\binf\b|\bnan(?::0x[\da-fA-F](?:_?[\da-fA-D])*)?\b/,punctuation:/[()]/};const r=a},9901:e=>{e.exports&&(e.exports={core:{meta:{path:"components/prism-core.js",option:"mandatory"},core:"Core"},themes:{meta:{path:"themes/{id}.css",link:"index.html?theme={id}",exclusive:!0},prism:{title:"Default",option:"default"},"prism-dark":"Dark","prism-funky":"Funky","prism-okaidia":{title:"Okaidia",owner:"ocodia"},"prism-twilight":{title:"Twilight",owner:"remybach"},"prism-coy":{title:"Coy",owner:"tshedor"},"prism-solarizedlight":{title:"Solarized Light",owner:"hectormatos2011 "},"prism-tomorrow":{title:"Tomorrow 
Night",owner:"Rosey"}},languages:{meta:{path:"components/prism-{id}",noCSS:!0,examplesPath:"examples/prism-{id}",addCheckAll:!0},markup:{title:"Markup",alias:["html","xml","svg","mathml","ssml","atom","rss"],aliasTitles:{html:"HTML",xml:"XML",svg:"SVG",mathml:"MathML",ssml:"SSML",atom:"Atom",rss:"RSS"},option:"default"},css:{title:"CSS",option:"default",modify:"markup"},clike:{title:"C-like",option:"default"},javascript:{title:"JavaScript",require:"clike",modify:"markup",optional:"regex",alias:"js",option:"default"},abap:{title:"ABAP",owner:"dellagustin"},abnf:{title:"ABNF",owner:"RunDevelopment"},actionscript:{title:"ActionScript",require:"javascript",modify:"markup",owner:"Golmote"},ada:{title:"Ada",owner:"Lucretia"},agda:{title:"Agda",owner:"xy-ren"},al:{title:"AL",owner:"RunDevelopment"},antlr4:{title:"ANTLR4",alias:"g4",owner:"RunDevelopment"},apacheconf:{title:"Apache Configuration",owner:"GuiTeK"},apex:{title:"Apex",require:["clike","sql"],owner:"RunDevelopment"},apl:{title:"APL",owner:"ngn"},applescript:{title:"AppleScript",owner:"Golmote"},aql:{title:"AQL",owner:"RunDevelopment"},arduino:{title:"Arduino",require:"cpp",alias:"ino",owner:"dkern"},arff:{title:"ARFF",owner:"Golmote"},armasm:{title:"ARM Assembly",alias:"arm-asm",owner:"RunDevelopment"},arturo:{title:"Arturo",alias:"art",optional:["bash","css","javascript","markup","markdown","sql"],owner:"drkameleon"},asciidoc:{alias:"adoc",title:"AsciiDoc",owner:"Golmote"},aspnet:{title:"ASP.NET (C#)",require:["markup","csharp"],owner:"nauzilus"},asm6502:{title:"6502 Assembly",owner:"kzurawel"},asmatmel:{title:"Atmel AVR Assembly",owner:"cerkit"},autohotkey:{title:"AutoHotkey",owner:"aviaryan"},autoit:{title:"AutoIt",owner:"Golmote"},avisynth:{title:"AviSynth",alias:"avs",owner:"Zinfidel"},"avro-idl":{title:"Avro 
IDL",alias:"avdl",owner:"RunDevelopment"},awk:{title:"AWK",alias:"gawk",aliasTitles:{gawk:"GAWK"},owner:"RunDevelopment"},bash:{title:"Bash",alias:["sh","shell"],aliasTitles:{sh:"Shell",shell:"Shell"},owner:"zeitgeist87"},basic:{title:"BASIC",owner:"Golmote"},batch:{title:"Batch",owner:"Golmote"},bbcode:{title:"BBcode",alias:"shortcode",aliasTitles:{shortcode:"Shortcode"},owner:"RunDevelopment"},bbj:{title:"BBj",owner:"hyyan"},bicep:{title:"Bicep",owner:"johnnyreilly"},birb:{title:"Birb",require:"clike",owner:"Calamity210"},bison:{title:"Bison",require:"c",owner:"Golmote"},bnf:{title:"BNF",alias:"rbnf",aliasTitles:{rbnf:"RBNF"},owner:"RunDevelopment"},bqn:{title:"BQN",owner:"yewscion"},brainfuck:{title:"Brainfuck",owner:"Golmote"},brightscript:{title:"BrightScript",owner:"RunDevelopment"},bro:{title:"Bro",owner:"wayward710"},bsl:{title:"BSL (1C:Enterprise)",alias:"oscript",aliasTitles:{oscript:"OneScript"},owner:"Diversus23"},c:{title:"C",require:"clike",owner:"zeitgeist87"},csharp:{title:"C#",require:"clike",alias:["cs","dotnet"],owner:"mvalipour"},cpp:{title:"C++",require:"c",owner:"zeitgeist87"},cfscript:{title:"CFScript",require:"clike",alias:"cfc",owner:"mjclemente"},chaiscript:{title:"ChaiScript",require:["clike","cpp"],owner:"RunDevelopment"},cil:{title:"CIL",owner:"sbrl"},cilkc:{title:"Cilk/C",require:"c",alias:"cilk-c",owner:"OpenCilk"},cilkcpp:{title:"Cilk/C++",require:"cpp",alias:["cilk-cpp","cilk"],owner:"OpenCilk"},clojure:{title:"Clojure",owner:"troglotit"},cmake:{title:"CMake",owner:"mjrogozinski"},cobol:{title:"COBOL",owner:"RunDevelopment"},coffeescript:{title:"CoffeeScript",require:"javascript",alias:"coffee",owner:"R-osey"},concurnas:{title:"Concurnas",alias:"conc",owner:"jasontatton"},csp:{title:"Content-Security-Policy",owner:"ScottHelme"},cooklang:{title:"Cooklang",owner:"ahue"},coq:{title:"Coq",owner:"RunDevelopment"},crystal:{title:"Crystal",require:"ruby",owner:"MakeNowJust"},"css-extras":{title:"CSS 
Extras",require:"css",modify:"css",owner:"milesj"},csv:{title:"CSV",owner:"RunDevelopment"},cue:{title:"CUE",owner:"RunDevelopment"},cypher:{title:"Cypher",owner:"RunDevelopment"},d:{title:"D",require:"clike",owner:"Golmote"},dart:{title:"Dart",require:"clike",owner:"Golmote"},dataweave:{title:"DataWeave",owner:"machaval"},dax:{title:"DAX",owner:"peterbud"},dhall:{title:"Dhall",owner:"RunDevelopment"},diff:{title:"Diff",owner:"uranusjr"},django:{title:"Django/Jinja2",require:"markup-templating",alias:"jinja2",owner:"romanvm"},"dns-zone-file":{title:"DNS zone file",owner:"RunDevelopment",alias:"dns-zone"},docker:{title:"Docker",alias:"dockerfile",owner:"JustinBeckwith"},dot:{title:"DOT (Graphviz)",alias:"gv",optional:"markup",owner:"RunDevelopment"},ebnf:{title:"EBNF",owner:"RunDevelopment"},editorconfig:{title:"EditorConfig",owner:"osipxd"},eiffel:{title:"Eiffel",owner:"Conaclos"},ejs:{title:"EJS",require:["javascript","markup-templating"],owner:"RunDevelopment",alias:"eta",aliasTitles:{eta:"Eta"}},elixir:{title:"Elixir",owner:"Golmote"},elm:{title:"Elm",owner:"zwilias"},etlua:{title:"Embedded Lua templating",require:["lua","markup-templating"],owner:"RunDevelopment"},erb:{title:"ERB",require:["ruby","markup-templating"],owner:"Golmote"},erlang:{title:"Erlang",owner:"Golmote"},"excel-formula":{title:"Excel Formula",alias:["xlsx","xls"],owner:"RunDevelopment"},fsharp:{title:"F#",require:"clike",owner:"simonreynolds7"},factor:{title:"Factor",owner:"catb0t"},false:{title:"False",owner:"edukisto"},"firestore-security-rules":{title:"Firestore security rules",require:"clike",owner:"RunDevelopment"},flow:{title:"Flow",require:"javascript",owner:"Golmote"},fortran:{title:"Fortran",owner:"Golmote"},ftl:{title:"FreeMarker Template Language",require:"markup-templating",owner:"RunDevelopment"},gml:{title:"GameMaker Language",alias:"gamemakerlanguage",require:"clike",owner:"LiarOnce"},gap:{title:"GAP 
(CAS)",owner:"RunDevelopment"},gcode:{title:"G-code",owner:"RunDevelopment"},gdscript:{title:"GDScript",owner:"RunDevelopment"},gedcom:{title:"GEDCOM",owner:"Golmote"},gettext:{title:"gettext",alias:"po",owner:"RunDevelopment"},gherkin:{title:"Gherkin",owner:"hason"},git:{title:"Git",owner:"lgiraudel"},glsl:{title:"GLSL",require:"c",owner:"Golmote"},gn:{title:"GN",alias:"gni",owner:"RunDevelopment"},"linker-script":{title:"GNU Linker Script",alias:"ld",owner:"RunDevelopment"},go:{title:"Go",require:"clike",owner:"arnehormann"},"go-module":{title:"Go module",alias:"go-mod",owner:"RunDevelopment"},gradle:{title:"Gradle",require:"clike",owner:"zeabdelkhalek-badido18"},graphql:{title:"GraphQL",optional:"markdown",owner:"Golmote"},groovy:{title:"Groovy",require:"clike",owner:"robfletcher"},haml:{title:"Haml",require:"ruby",optional:["css","css-extras","coffeescript","erb","javascript","less","markdown","scss","textile"],owner:"Golmote"},handlebars:{title:"Handlebars",require:"markup-templating",alias:["hbs","mustache"],aliasTitles:{mustache:"Mustache"},owner:"Golmote"},haskell:{title:"Haskell",alias:"hs",owner:"bholst"},haxe:{title:"Haxe",require:"clike",optional:"regex",owner:"Golmote"},hcl:{title:"HCL",owner:"outsideris"},hlsl:{title:"HLSL",require:"c",owner:"RunDevelopment"},hoon:{title:"Hoon",owner:"matildepark"},http:{title:"HTTP",optional:["csp","css","hpkp","hsts","javascript","json","markup","uri"],owner:"danielgtaylor"},hpkp:{title:"HTTP Public-Key-Pins",owner:"ScottHelme"},hsts:{title:"HTTP Strict-Transport-Security",owner:"ScottHelme"},ichigojam:{title:"IchigoJam",owner:"BlueCocoa"},icon:{title:"Icon",owner:"Golmote"},"icu-message-format":{title:"ICU Message Format",owner:"RunDevelopment"},idris:{title:"Idris",alias:"idr",owner:"KeenS",require:"haskell"},ignore:{title:".ignore",owner:"osipxd",alias:["gitignore","hgignore","npmignore"],aliasTitles:{gitignore:".gitignore",hgignore:".hgignore",npmignore:".npmignore"}},inform7:{title:"Inform 
7",owner:"Golmote"},ini:{title:"Ini",owner:"aviaryan"},io:{title:"Io",owner:"AlesTsurko"},j:{title:"J",owner:"Golmote"},java:{title:"Java",require:"clike",owner:"sherblot"},javadoc:{title:"JavaDoc",require:["markup","java","javadoclike"],modify:"java",optional:"scala",owner:"RunDevelopment"},javadoclike:{title:"JavaDoc-like",modify:["java","javascript","php"],owner:"RunDevelopment"},javastacktrace:{title:"Java stack trace",owner:"RunDevelopment"},jexl:{title:"Jexl",owner:"czosel"},jolie:{title:"Jolie",require:"clike",owner:"thesave"},jq:{title:"JQ",owner:"RunDevelopment"},jsdoc:{title:"JSDoc",require:["javascript","javadoclike","typescript"],modify:"javascript",optional:["actionscript","coffeescript"],owner:"RunDevelopment"},"js-extras":{title:"JS Extras",require:"javascript",modify:"javascript",optional:["actionscript","coffeescript","flow","n4js","typescript"],owner:"RunDevelopment"},json:{title:"JSON",alias:"webmanifest",aliasTitles:{webmanifest:"Web App Manifest"},owner:"CupOfTea696"},json5:{title:"JSON5",require:"json",owner:"RunDevelopment"},jsonp:{title:"JSONP",require:"json",owner:"RunDevelopment"},jsstacktrace:{title:"JS stack trace",owner:"sbrl"},"js-templates":{title:"JS Templates",require:"javascript",modify:"javascript",optional:["css","css-extras","graphql","markdown","markup","sql"],owner:"RunDevelopment"},julia:{title:"Julia",owner:"cdagnino"},keepalived:{title:"Keepalived Configure",owner:"dev-itsheng"},keyman:{title:"Keyman",owner:"mcdurdin"},kotlin:{title:"Kotlin",alias:["kt","kts"],aliasTitles:{kts:"Kotlin Script"},require:"clike",owner:"Golmote"},kumir:{title:"KuMir 
(\u041a\u0443\u041c\u0438\u0440)",alias:"kum",owner:"edukisto"},kusto:{title:"Kusto",owner:"RunDevelopment"},latex:{title:"LaTeX",alias:["tex","context"],aliasTitles:{tex:"TeX",context:"ConTeXt"},owner:"japborst"},latte:{title:"Latte",require:["clike","markup-templating","php"],owner:"nette"},less:{title:"Less",require:"css",optional:"css-extras",owner:"Golmote"},lilypond:{title:"LilyPond",require:"scheme",alias:"ly",owner:"RunDevelopment"},liquid:{title:"Liquid",require:"markup-templating",owner:"cinhtau"},lisp:{title:"Lisp",alias:["emacs","elisp","emacs-lisp"],owner:"JuanCaicedo"},livescript:{title:"LiveScript",owner:"Golmote"},llvm:{title:"LLVM IR",owner:"porglezomp"},log:{title:"Log file",optional:"javastacktrace",owner:"RunDevelopment"},lolcode:{title:"LOLCODE",owner:"Golmote"},lua:{title:"Lua",owner:"Golmote"},magma:{title:"Magma (CAS)",owner:"RunDevelopment"},makefile:{title:"Makefile",owner:"Golmote"},markdown:{title:"Markdown",require:"markup",optional:"yaml",alias:"md",owner:"Golmote"},"markup-templating":{title:"Markup templating",require:"markup",owner:"Golmote"},mata:{title:"Mata",owner:"RunDevelopment"},matlab:{title:"MATLAB",owner:"Golmote"},maxscript:{title:"MAXScript",owner:"RunDevelopment"},mel:{title:"MEL",owner:"Golmote"},mermaid:{title:"Mermaid",owner:"RunDevelopment"},metafont:{title:"METAFONT",owner:"LaeriExNihilo"},mizar:{title:"Mizar",owner:"Golmote"},mongodb:{title:"MongoDB",owner:"airs0urce",require:"javascript"},monkey:{title:"Monkey",owner:"Golmote"},moonscript:{title:"MoonScript",alias:"moon",owner:"RunDevelopment"},n1ql:{title:"N1QL",owner:"TMWilds"},n4js:{title:"N4JS",require:"javascript",optional:"jsdoc",alias:"n4jsd",owner:"bsmith-n4"},"nand2tetris-hdl":{title:"Nand To Tetris HDL",owner:"stephanmax"},naniscript:{title:"Naninovel 
Script",owner:"Elringus",alias:"nani"},nasm:{title:"NASM",owner:"rbmj"},neon:{title:"NEON",owner:"nette"},nevod:{title:"Nevod",owner:"nezaboodka"},nginx:{title:"nginx",owner:"volado"},nim:{title:"Nim",owner:"Golmote"},nix:{title:"Nix",owner:"Golmote"},nsis:{title:"NSIS",owner:"idleberg"},objectivec:{title:"Objective-C",require:"c",alias:"objc",owner:"uranusjr"},ocaml:{title:"OCaml",owner:"Golmote"},odin:{title:"Odin",owner:"edukisto"},opencl:{title:"OpenCL",require:"c",modify:["c","cpp"],owner:"Milania1"},openqasm:{title:"OpenQasm",alias:"qasm",owner:"RunDevelopment"},oz:{title:"Oz",owner:"Golmote"},parigp:{title:"PARI/GP",owner:"Golmote"},parser:{title:"Parser",require:"markup",owner:"Golmote"},pascal:{title:"Pascal",alias:"objectpascal",aliasTitles:{objectpascal:"Object Pascal"},owner:"Golmote"},pascaligo:{title:"Pascaligo",owner:"DefinitelyNotAGoat"},psl:{title:"PATROL Scripting Language",owner:"bertysentry"},pcaxis:{title:"PC-Axis",alias:"px",owner:"RunDevelopment"},peoplecode:{title:"PeopleCode",alias:"pcode",owner:"RunDevelopment"},perl:{title:"Perl",owner:"Golmote"},php:{title:"PHP",require:"markup-templating",owner:"milesj"},phpdoc:{title:"PHPDoc",require:["php","javadoclike"],modify:"php",owner:"RunDevelopment"},"php-extras":{title:"PHP Extras",require:"php",modify:"php",owner:"milesj"},"plant-uml":{title:"PlantUML",alias:"plantuml",owner:"RunDevelopment"},plsql:{title:"PL/SQL",require:"sql",owner:"Golmote"},powerquery:{title:"PowerQuery",alias:["pq","mscript"],owner:"peterbud"},powershell:{title:"PowerShell",owner:"nauzilus"},processing:{title:"Processing",require:"clike",owner:"Golmote"},prolog:{title:"Prolog",owner:"Golmote"},promql:{title:"PromQL",owner:"arendjr"},properties:{title:".properties",owner:"Golmote"},protobuf:{title:"Protocol 
Buffers",require:"clike",owner:"just-boris"},pug:{title:"Pug",require:["markup","javascript"],optional:["coffeescript","ejs","handlebars","less","livescript","markdown","scss","stylus","twig"],owner:"Golmote"},puppet:{title:"Puppet",owner:"Golmote"},pure:{title:"Pure",optional:["c","cpp","fortran"],owner:"Golmote"},purebasic:{title:"PureBasic",require:"clike",alias:"pbfasm",owner:"HeX0R101"},purescript:{title:"PureScript",require:"haskell",alias:"purs",owner:"sriharshachilakapati"},python:{title:"Python",alias:"py",owner:"multipetros"},qsharp:{title:"Q#",require:"clike",alias:"qs",owner:"fedonman"},q:{title:"Q (kdb+ database)",owner:"Golmote"},qml:{title:"QML",require:"javascript",owner:"RunDevelopment"},qore:{title:"Qore",require:"clike",owner:"temnroegg"},r:{title:"R",owner:"Golmote"},racket:{title:"Racket",require:"scheme",alias:"rkt",owner:"RunDevelopment"},cshtml:{title:"Razor C#",alias:"razor",require:["markup","csharp"],optional:["css","css-extras","javascript","js-extras"],owner:"RunDevelopment"},jsx:{title:"React JSX",require:["markup","javascript"],optional:["jsdoc","js-extras","js-templates"],owner:"vkbansal"},tsx:{title:"React TSX",require:["jsx","typescript"]},reason:{title:"Reason",require:"clike",owner:"Golmote"},regex:{title:"Regex",owner:"RunDevelopment"},rego:{title:"Rego",owner:"JordanSh"},renpy:{title:"Ren'py",alias:"rpy",owner:"HyuchiaDiego"},rescript:{title:"ReScript",alias:"res",owner:"vmarcosp"},rest:{title:"reST (reStructuredText)",owner:"Golmote"},rip:{title:"Rip",owner:"ravinggenius"},roboconf:{title:"Roboconf",owner:"Golmote"},robotframework:{title:"Robot Framework",alias:"robot",owner:"RunDevelopment"},ruby:{title:"Ruby",require:"clike",alias:"rb",owner:"samflores"},rust:{title:"Rust",owner:"Golmote"},sas:{title:"SAS",optional:["groovy","lua","sql"],owner:"Golmote"},sass:{title:"Sass (Sass)",require:"css",optional:"css-extras",owner:"Golmote"},scss:{title:"Sass 
(SCSS)",require:"css",optional:"css-extras",owner:"MoOx"},scala:{title:"Scala",require:"java",owner:"jozic"},scheme:{title:"Scheme",owner:"bacchus123"},"shell-session":{title:"Shell session",require:"bash",alias:["sh-session","shellsession"],owner:"RunDevelopment"},smali:{title:"Smali",owner:"RunDevelopment"},smalltalk:{title:"Smalltalk",owner:"Golmote"},smarty:{title:"Smarty",require:"markup-templating",optional:"php",owner:"Golmote"},sml:{title:"SML",alias:"smlnj",aliasTitles:{smlnj:"SML/NJ"},owner:"RunDevelopment"},solidity:{title:"Solidity (Ethereum)",alias:"sol",require:"clike",owner:"glachaud"},"solution-file":{title:"Solution file",alias:"sln",owner:"RunDevelopment"},soy:{title:"Soy (Closure Template)",require:"markup-templating",owner:"Golmote"},sparql:{title:"SPARQL",require:"turtle",owner:"Triply-Dev",alias:"rq"},"splunk-spl":{title:"Splunk SPL",owner:"RunDevelopment"},sqf:{title:"SQF: Status Quo Function (Arma 3)",require:"clike",owner:"RunDevelopment"},sql:{title:"SQL",owner:"multipetros"},squirrel:{title:"Squirrel",require:"clike",owner:"RunDevelopment"},stan:{title:"Stan",owner:"RunDevelopment"},stata:{title:"Stata Ado",require:["mata","java","python"],owner:"RunDevelopment"},iecst:{title:"Structured Text (IEC 61131-3)",owner:"serhioromano"},stylus:{title:"Stylus",owner:"vkbansal"},supercollider:{title:"SuperCollider",alias:"sclang",owner:"RunDevelopment"},swift:{title:"Swift",owner:"chrischares"},systemd:{title:"Systemd configuration file",owner:"RunDevelopment"},"t4-templating":{title:"T4 templating",owner:"RunDevelopment"},"t4-cs":{title:"T4 Text Templates (C#)",require:["t4-templating","csharp"],alias:"t4",owner:"RunDevelopment"},"t4-vb":{title:"T4 Text Templates (VB)",require:["t4-templating","vbnet"],owner:"RunDevelopment"},tap:{title:"TAP",owner:"isaacs",require:"yaml"},tcl:{title:"Tcl",owner:"PeterChaplin"},tt2:{title:"Template Toolkit 
2",require:["clike","markup-templating"],owner:"gflohr"},textile:{title:"Textile",require:"markup",optional:"css",owner:"Golmote"},toml:{title:"TOML",owner:"RunDevelopment"},tremor:{title:"Tremor",alias:["trickle","troy"],owner:"darach",aliasTitles:{trickle:"trickle",troy:"troy"}},turtle:{title:"Turtle",alias:"trig",aliasTitles:{trig:"TriG"},owner:"jakubklimek"},twig:{title:"Twig",require:"markup-templating",owner:"brandonkelly"},typescript:{title:"TypeScript",require:"javascript",optional:"js-templates",alias:"ts",owner:"vkbansal"},typoscript:{title:"TypoScript",alias:"tsconfig",aliasTitles:{tsconfig:"TSConfig"},owner:"dkern"},unrealscript:{title:"UnrealScript",alias:["uscript","uc"],owner:"RunDevelopment"},uorazor:{title:"UO Razor Script",owner:"jaseowns"},uri:{title:"URI",alias:"url",aliasTitles:{url:"URL"},owner:"RunDevelopment"},v:{title:"V",require:"clike",owner:"taggon"},vala:{title:"Vala",require:"clike",optional:"regex",owner:"TemplarVolk"},vbnet:{title:"VB.Net",require:"basic",owner:"Bigsby"},velocity:{title:"Velocity",require:"markup",owner:"Golmote"},verilog:{title:"Verilog",owner:"a-rey"},vhdl:{title:"VHDL",owner:"a-rey"},vim:{title:"vim",owner:"westonganger"},"visual-basic":{title:"Visual Basic",alias:["vb","vba"],aliasTitles:{vba:"VBA"},owner:"Golmote"},warpscript:{title:"WarpScript",owner:"RunDevelopment"},wasm:{title:"WebAssembly",owner:"Golmote"},"web-idl":{title:"Web IDL",alias:"webidl",owner:"RunDevelopment"},wgsl:{title:"WGSL",owner:"Dr4gonthree"},wiki:{title:"Wiki markup",require:"markup",owner:"Golmote"},wolfram:{title:"Wolfram language",alias:["mathematica","nb","wl"],aliasTitles:{mathematica:"Mathematica",nb:"Mathematica Notebook"},owner:"msollami"},wren:{title:"Wren",owner:"clsource"},xeora:{title:"Xeora",require:"markup",alias:"xeoracube",aliasTitles:{xeoracube:"XeoraCube"},owner:"freakmaxi"},"xml-doc":{title:"XML doc (.net)",require:"markup",modify:["csharp","fsharp","vbnet"],owner:"RunDevelopment"},xojo:{title:"Xojo 
(REALbasic)",owner:"Golmote"},xquery:{title:"XQuery",require:"markup",owner:"Golmote"},yaml:{title:"YAML",alias:"yml",owner:"hason"},yang:{title:"YANG",owner:"RunDevelopment"},zig:{title:"Zig",owner:"RunDevelopment"}},plugins:{meta:{path:"plugins/{id}/prism-{id}",link:"plugins/{id}/"},"line-highlight":{title:"Line Highlight",description:"Highlights specific lines and/or line ranges."},"line-numbers":{title:"Line Numbers",description:"Line number at the beginning of code lines.",owner:"kuba-kubula"},"show-invisibles":{title:"Show Invisibles",description:"Show hidden characters such as tabs and line breaks.",optional:["autolinker","data-uri-highlight"]},autolinker:{title:"Autolinker",description:"Converts URLs and emails in code to clickable links. Parses Markdown links in comments."},wpd:{title:"WebPlatform Docs",description:'Makes tokens link to WebPlatform.org documentation. The links open in a new tab.'},"custom-class":{title:"Custom Class",description:"This plugin allows you to prefix Prism's default classes (.comment can become .namespace--comment) or replace them with your defined ones (like .editor__comment). You can even add new classes.",owner:"dvkndn",noCSS:!0},"file-highlight":{title:"File Highlight",description:"Fetch external files and highlight them with Prism. Used on the Prism website itself.",noCSS:!0},"show-language":{title:"Show Language",description:"Display the highlighted language in code blocks (inline code does not show the label).",owner:"nauzilus",noCSS:!0,require:"toolbar"},"jsonp-highlight":{title:"JSONP Highlight",description:"Fetch content with JSONP and highlight some interesting content (e.g. 
GitHub/Gists or Bitbucket API).",noCSS:!0,owner:"nauzilus"},"highlight-keywords":{title:"Highlight Keywords",description:"Adds special CSS classes for each keyword for fine-grained highlighting.",owner:"vkbansal",noCSS:!0},"remove-initial-line-feed":{title:"Remove initial line feed",description:"Removes the initial line feed in code blocks.",owner:"Golmote",noCSS:!0},"inline-color":{title:"Inline color",description:"Adds a small inline preview for colors in style sheets.",require:"css-extras",owner:"RunDevelopment"},previewers:{title:"Previewers",description:"Previewers for angles, colors, gradients, easing and time.",require:"css-extras",owner:"Golmote"},autoloader:{title:"Autoloader",description:"Automatically loads the needed languages to highlight the code blocks.",owner:"Golmote",noCSS:!0},"keep-markup":{title:"Keep Markup",description:"Prevents custom markup from being dropped out during highlighting.",owner:"Golmote",optional:"normalize-whitespace",noCSS:!0},"command-line":{title:"Command Line",description:"Display a command line with a prompt and, optionally, the output/response from the commands.",owner:"chriswells0"},"unescaped-markup":{title:"Unescaped Markup",description:"Write markup without having to escape anything."},"normalize-whitespace":{title:"Normalize Whitespace",description:"Supports multiple operations to normalize whitespace in code blocks.",owner:"zeitgeist87",optional:"unescaped-markup",noCSS:!0},"data-uri-highlight":{title:"Data-URI Highlight",description:"Highlights data-URI contents.",owner:"Golmote",noCSS:!0},toolbar:{title:"Toolbar",description:"Attach a toolbar for plugins to easily register buttons on the top of a code block.",owner:"mAAdhaTTah"},"copy-to-clipboard":{title:"Copy to Clipboard Button",description:"Add a button that copies the code block to the clipboard when clicked.",owner:"mAAdhaTTah",require:"toolbar",noCSS:!0},"download-button":{title:"Download Button",description:"A button in the toolbar of a code block adding a 
convenient way to download a code file.",owner:"Golmote",require:"toolbar",noCSS:!0},"match-braces":{title:"Match braces",description:"Highlights matching braces.",owner:"RunDevelopment"},"diff-highlight":{title:"Diff Highlight",description:"Highlights the code inside diff blocks.",owner:"RunDevelopment",require:"diff"},"filter-highlight-all":{title:"Filter highlightAll",description:"Filters the elements the highlightAll and highlightAllUnder methods actually highlight.",owner:"RunDevelopment",noCSS:!0},treeview:{title:"Treeview",description:"A language with special styles to highlight file system tree structures.",owner:"Golmote"}}})},2885:(e,n,t)=>{const o=t(9901),a=t(9642),r=new Set;function i(e){void 0===e?e=Object.keys(o.languages).filter((e=>"meta"!=e)):Array.isArray(e)||(e=[e]);const n=[...r,...Object.keys(Prism.languages)];a(o,e,n).load((e=>{if(!(e in o.languages))return void(i.silent||console.warn("Language does not exist: "+e));const n="./prism-"+e;delete t.c[t(6500).resolve(n)],delete Prism.languages[e],t(6500)(n),r.add(e)}))}i.silent=!1,e.exports=i},6726:(e,n,t)=>{var o={"./":2885};function a(e){var n=r(e);return t(n)}function r(e){if(!t.o(o,e)){var n=new Error("Cannot find module '"+e+"'");throw n.code="MODULE_NOT_FOUND",n}return o[e]}a.keys=function(){return Object.keys(o)},a.resolve=r,e.exports=a,a.id=6726},6500:(e,n,t)=>{var o={"./":2885};function a(e){var n=r(e);return t(n)}function r(e){if(!t.o(o,e)){var n=new Error("Cannot find module '"+e+"'");throw n.code="MODULE_NOT_FOUND",n}return o[e]}a.keys=function(){return Object.keys(o)},a.resolve=r,e.exports=a,a.id=6500},9642:e=>{"use strict";var n=function(){var e=function(){};function n(e,n){Array.isArray(e)?e.forEach(n):null!=e&&n(e,0)}function t(e){for(var n={},t=0,o=e.length;t "));var s={},c=e[o];if(c){function l(n){if(!(n in e))throw new Error(o+" depends on an unknown component "+n);if(!(n in s))for(var i in 
a(n,r),s[n]=!0,t[n])s[i]=!0}n(c.require,l),n(c.optional,l),n(c.modify,l)}t[o]=s,r.pop()}}return function(e){var n=t[e];return n||(a(e,o),n=t[e]),n}}function a(e){for(var n in e)return!0;return!1}return function(r,i,s){var c=function(e){var n={};for(var t in e){var o=e[t];for(var a in o)if("meta"!=a){var r=o[a];n[a]="string"==typeof r?{title:r}:r}}return n}(r),l=function(e){var t;return function(o){if(o in e)return o;if(!t)for(var a in t={},e){var r=e[a];n(r&&r.alias,(function(n){if(n in t)throw new Error(n+" cannot be alias for both "+a+" and "+t[n]);if(n in e)throw new Error(n+" cannot be alias of "+a+" because it is a component.");t[n]=a}))}return t[o]||o}}(c);i=i.map(l),s=(s||[]).map(l);var u=t(i),d=t(s);i.forEach((function e(t){var o=c[t];n(o&&o.require,(function(n){n in d||(u[n]=!0,e(n))}))}));for(var m,p=o(c),f=u;a(f);){for(var h in m={},f){var g=c[h];n(g&&g.modify,(function(e){e in d&&(m[e]=!0)}))}for(var v in d)if(!(v in u))for(var b in p(v))if(b in u){m[v]=!0;break}for(var y in f=m)u[y]=!0}var w={getIds:function(){var e=[];return w.load((function(n){e.push(n)})),e},load:function(n,t){return function(n,t,o,a){var r=a?a.series:void 0,i=a?a.parallel:e,s={},c={};function l(e){if(e in s)return s[e];c[e]=!0;var a,u=[];for(var d in n(e))d in t&&u.push(d);if(0===u.length)a=o(e);else{var m=i(u.map((function(e){var n=l(e);return delete c[e],n})));r?a=r(m,(function(){return o(e)})):o(e)}return s[e]=a}for(var u in t)l(u);var d=[];for(var m in c)d.push(s[m]);return i(d)}(p,u,n,t)}};return w}}();e.exports=n},2703:(e,n,t)=>{"use strict";var o=t(414);function a(){}function r(){}r.resetWarningCache=a,e.exports=function(){function e(e,n,t,a,r,i){if(i!==o){var s=new Error("Calling PropTypes validators directly is not supported by the `prop-types` package. Use PropTypes.checkPropTypes() to call them. 
Read more at http://fb.me/use-check-prop-types");throw s.name="Invariant Violation",s}}function n(){return e}e.isRequired=e;var t={array:e,bigint:e,bool:e,func:e,number:e,object:e,string:e,symbol:e,any:e,arrayOf:n,element:e,elementType:e,instanceOf:n,node:e,objectOf:n,oneOf:n,oneOfType:n,shape:n,exact:n,checkPropTypes:r,resetWarningCache:a};return t.PropTypes=t,t}},5697:(e,n,t)=>{e.exports=t(2703)()},414:e=>{"use strict";e.exports="SECRET_DO_NOT_PASS_THIS_OR_YOU_WILL_BE_FIRED"},4448:(e,n,t)=>{"use strict";var o=t(7294),a=t(7418),r=t(3840);function i(e){for(var n="https://reactjs.org/docs/error-decoder.html?invariant="+e,t=1;t
+ + \ No newline at end of file diff --git a/core-functionality/canonical-transcripts/index.html b/core-functionality/canonical-transcripts/index.html index 2b70c371..24621e5f 100644 --- a/core-functionality/canonical-transcripts/index.html +++ b/core-functionality/canonical-transcripts/index.html @@ -6,13 +6,13 @@ Canonical Transcripts | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
- - +
Version: 3.25 (unreleased)

Canonical Transcripts

Overview

One of the more polarizing topics within annotation is the notion of canonical transcripts. Because of alternative splicing, we often have several transcripts for each gene. In the human genome, there are an average of 3.4 transcripts per gene (Tung, 2020). As scientists, we seem to have a need for identifying a representative example of a gene - even if there's no biological basis for the motivation.

Golden Helix Blog

A few years ago, the guys over at Golden Helix wrote an excellent post about the pitfalls and issues surrounding the identification of canonical transcripts: What’s in a Name: The Intricacies of Identifying Variants.

In Illumina Connected Annotations, we wanted to identify an algorithm for determining the canonical transcript and apply it consistently to all of our transcript data sources.

Known Algorithms

UCSC

UCSC publishes a list of canonical transcripts in its knownCanonical table which is available via the TableBrowser. Of the RefSeq data sources, it was the only one we could find that provided canonical transcripts:

The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.

If you were to implement this and compare it with the knownCanonical table, you would see a lot of exceptions to the rule.

Ensembl

The Ensembl glossary states:

The canonical transcript is used in the gene tree analysis in Ensembl and does not necessarily reflect the most biologically relevant transcript of a gene. For human, the canonical transcript for a gene is set according to the following hierarchy:

  1. Longest CCDS translation with no stop codons.
  2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
  3. If no (2), choose the longest translation with no stop codons.
  4. If no translation, choose the longest non-protein-coding transcript.

ACMG

From the ACMG Guidelines for the Interpretation of Sequence Variants:

A reference transcript for each gene should be used and provided in the report when describing coding variants. The transcript should represent either the longest known transcript and/or the most clinically relevant transcript.

ClinVar

From the ClinVar paper:

When there are multiple transcripts for a gene, ClinVar selects one HGVS expression to construct a preferred name. By default, this selection is based on the first reference standard transcript identified by the RefSeqGene/LRG (Locus Reference Genomic) collaboration.

Unified Approach

Our approach is almost identical to the one Golden Helix discussed in their article:

  1. If we're looking at RefSeq, only consider NM & NR transcripts as candidates for canonical transcripts.
  2. Sort the transcripts in the following order:
    1. Locus Reference Genomic (LRG) entries occur before non-LRG entries
    2. Descending CDS length
    3. Descending transcript length
    4. Ascending accession number
  3. Grab the first entry
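
The ordering above can be sketched as a single sort key. This is a minimal Python sketch, not the Illumina Connected Annotations implementation: the `Transcript` fields and the `pick_canonical` name are hypothetical, and "ascending accession number" is interpreted here as plain string order on the accession, which is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are illustrative only.
    accession: str   # e.g. "NM_001754.5"
    source: str      # "RefSeq" or "Ensembl"
    is_lrg: bool     # True if the transcript is an LRG entry
    cds_length: int  # 0 for non-coding transcripts
    length: int      # full transcript length


def pick_canonical(transcripts: Iterable[Transcript]) -> Optional[Transcript]:
    # Step 1: for RefSeq, keep only NM_ and NR_ accessions.
    candidates = [
        t for t in transcripts
        if t.source != "RefSeq" or t.accession.startswith(("NM_", "NR_"))
    ]
    if not candidates:
        return None
    # Step 2: LRG entries first, then descending CDS length, descending
    # transcript length, and ascending accession (string order, assumed).
    candidates.sort(
        key=lambda t: (not t.is_lrg, -t.cds_length, -t.length, t.accession)
    )
    # Step 3: grab the first entry.
    return candidates[0]
```

Encoding the whole ordering in one tuple key keeps the tie-breaking rules in the same order as the list, so adding or reordering a rule only touches one line.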
+ + \ No newline at end of file diff --git a/core-functionality/gene-fusions/index.html b/core-functionality/gene-fusions/index.html index 49d41946..5410d2d3 100644 --- a/core-functionality/gene-fusions/index.html +++ b/core-functionality/gene-fusions/index.html @@ -6,14 +6,14 @@ Gene Fusion Detection | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 & FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 & FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. If only unidirectional gene fusions are desired, only these two fusions can be detected. If enable-bidirectional-fusions is enabled, all four cases can be identified.

Interpreting translocation breakends

At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the VCF 4.2 specification.

REF   ALT     Meaning
s     t[p[    piece extending to the right of p is joined after t
s     t]p]    reverse comp piece extending left of p is joined after t
s     ]p]t    piece extending to the left of p is joined before t
s     [p[t    reverse comp piece extending right of p is joined before t
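
The four ALT forms in the table can be told apart by which bracket is used and by whether the local bases `t` come before or after the brackets. The following Python sketch illustrates that decoding; the `Breakend` type and `parse_bnd_alt` function are hypothetical names, and the regular expression only covers the simple four forms shown above.

```python
import re
from typing import NamedTuple

# lead/trail hold the local bases t; the bracket pair surrounds the mate position p.
_BND_ALT = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<bracket>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P=bracket)(?P<trail>[^\[\]]*)$"
)


class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    mate_extends_right: bool  # "[" brackets: the mate piece extends right of p
    joined_after: bool        # t precedes the brackets: piece is joined after t

    @property
    def reverse_complemented(self) -> bool:
        # Rows 2 and 4 of the table: t]p] and [p[t.
        return self.joined_after != self.mate_extends_right


def parse_bnd_alt(alt: str) -> Breakend:
    m = _BND_ALT.match(alt)
    # Exactly one of lead/trail must carry the bases t.
    if m is None or bool(m["lead"]) == bool(m["trail"]):
        raise ValueError(f"not a breakend ALT allele: {alt!r}")
    return Breakend(
        m["chrom"], int(m["pos"]), m["bracket"] == "[", bool(m["lead"])
    )
```

For example, `parse_bnd_alt("A]chr21:36420571]")` reports a mate on chr21 whose piece extends to the left of position 36420571, is joined after the local base, and is reverse complemented.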

Variant Types

Specifically we can identify gene fusions from the following structural variant types:

  • deletions (<DEL>)
  • tandem_duplications (<DUP:TANDEM>)
  • inversions (<INV>)
  • translocation breakpoints (AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[)

Criteria

The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:

  1. After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if enable-bidirectional-fusions is not enabled. They can have the same or different orientations if enable-bidirectional-fusions is set.
  2. Both transcripts must be from the same transcript source (i.e. we won't mix and match between RefSeq and Ensembl transcripts)
  3. Both transcripts must belong to different genes
  4. Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)
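
The four criteria above can be read as a chain of early exits. The sketch below is an illustrative Python rendering under stated assumptions: the `FusionTranscript` record, its field names, and the `is_gene_fusion` function are hypothetical, and `effective_reverse` is assumed to already reflect the orientation after accounting for gene strand and the rearrangement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FusionTranscript:
    # Hypothetical record; field names are illustrative only.
    transcript_id: str
    source: str                  # "RefSeq" or "Ensembl"
    gene_id: str
    chromosome: str
    coding_start: Optional[int]  # None for non-coding transcripts
    coding_end: Optional[int]
    effective_reverse: bool      # orientation after the rearrangement


def coding_regions_overlap(a: FusionTranscript, b: FusionTranscript) -> bool:
    # Overlap in the reference genome, i.e. without the variant.
    if a.chromosome != b.chromosome:
        return False
    if None in (a.coding_start, a.coding_end, b.coding_start, b.coding_end):
        return False
    return a.coding_start <= b.coding_end and b.coding_start <= a.coding_end


def is_gene_fusion(a: FusionTranscript, b: FusionTranscript,
                   enable_bidirectional_fusions: bool = False) -> bool:
    # 1. Same orientation unless bidirectional fusions are enabled.
    if not enable_bidirectional_fusions and a.effective_reverse != b.effective_reverse:
        return False
    # 2. Same transcript source (no mixing of RefSeq and Ensembl).
    if a.source != b.source:
        return False
    # 3. Different genes.
    if a.gene_id == b.gene_id:
        return False
    # 4. Coding regions must not already overlap.
    return not coding_regions_overlap(a, b)
```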

ETV6/RUNX1 Example

ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment.

VCF

Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr12 12026270 . C [chr21:36420865[C . PASS SVTYPE=BND
chr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND
chr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND
chr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND

When you put these calls together, the resulting genomic rearrangement looks something like this:

JSON Output

The annotation for the first variant in the VCF looks like this:

{"positions":[
{
"chromosome": "12",
"position": 12026270,
"refAllele": "C",
"altAlleles": [
"[chr21:36420865[C"
],
"filters": [
"PASS"
],
"cytogeneticBand": "12p13.2",
"variants": [
{
"vid": "12-12026270-C-[chr21:36420865[C",
"chromosome": "12",
"begin": 12026270,
"end": 12026270,
"isStructuralVariant": true,
"refAllele": "C",
"altAllele": "[chr21:36420865[C",
"variantType": "translocation",
"transcripts": [
{
"transcript": "ENST00000396373.4",
"source": "Ensembl",
"bioType": "mRNA",
"introns": "5/7",
"geneId": "ENSG00000139083",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"impact": "modifier",
"geneFusions": [
{
"transcript": "ENST00000437180.1",
"bioType": "mRNA",
"source": "Ensembl",
"geneId": "ENSG00000159216",
"proteinId": "ENSP00000409227.1",
"intron": 2,
"hgnc": "RUNX1",
"hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
},
{
"transcript": "ENST00000300305.3",
"bioType": "mRNA",
"source": "Ensembl",
"isCanonical": true,
"geneId": "ENSG00000159216",
"proteinId": "ENSP00000300305.3",
"intron": 1,
"hgnc": "RUNX1",
"hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
}
],
"isCanonical": true,
"proteinId": "ENSP00000379658.3"
},
{
"transcript": "NM_001987.5",
"source": "RefSeq",
"bioType": "mRNA",
"introns": "5/7",
"geneId": "2120",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"impact": "modifier",
"geneFusions": [
{
"transcript": "NM_001754.5",
"bioType": "mRNA",
"source": "RefSeq",
"isCanonical": true,
"geneId": "861",
"proteinId": "NP_001745.2",
"intron": 2,
"hgnc": "RUNX1",
"hgvsr": "NM_001754.5(RUNX1):r.?_58+274::NM_001987.5(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
}
],
"isCanonical": true,
"proteinId": "NP_001978.1"
}
]
}
]
}
]}

Field        Type     Notes
transcript   string   transcript ID
bioType      string   descriptions of the biotypes from Ensembl
exon         int      exon that contained fusion breakpoint
intron       int      intron that contained fusion breakpoint
geneId       string   gene ID. e.g. ENSG00000116062
hgnc         string   gene symbol. e.g. MSH6
hgvsr        string   HGVS RNA nomenclature
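
To show how the nesting in the JSON output fits together, here is a small Python sketch that walks an already-parsed annotation and lists every originating/partner transcript pair. The key names (`positions`, `variants`, `transcripts`, `geneFusions`, `hgvsr`) are taken from the example above; the `iter_gene_fusions` function itself is a hypothetical helper, not part of the tool.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_gene_fusions(
    annotation: Dict[str, Any],
) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (originating transcript, partner transcript, hgvsr) triples."""
    for position in annotation.get("positions", []):
        for variant in position.get("variants", []):
            for transcript in variant.get("transcripts", []):
                # Transcripts without a geneFusions section are skipped.
                for fusion in transcript.get("geneFusions", []):
                    yield (
                        transcript["transcript"],
                        fusion["transcript"],
                        fusion.get("hgvsr"),
                    )
```

Applied to the ETV6/RUNX1 example, this would yield two pairs for ENST00000396373.4 and one pair for NM_001987.5.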

Gene Fusion Data Sources

To provide more context to our gene fusions, we provide the following gene fusion data sources:

Consequences

When a gene fusion is identified, we add the following Sequence Ontology consequence:

              "consequence": [
"transcript_variant",
"gene_fusion"
],
  • If both transcripts have the same orientation, we label it as unidirectional_gene_fusion; if they have different orientations, we label it as bidirectional_gene_fusion.
  • If both unidirectional and bidirectional ones are detected, we label it as gene_fusion.
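
The labelling rule in the two bullets above can be sketched as a small function over the directionalities of a transcript's fusions. This is a hedged Python illustration: `fusion_consequence` is a hypothetical name, and the `"bidirectional"` value is assumed by analogy with the `"unidirectional"` value shown in the JSON output.

```python
from typing import Iterable


def fusion_consequence(directionalities: Iterable[str]) -> str:
    kinds = set(directionalities)
    if kinds == {"unidirectional"}:
        return "unidirectional_gene_fusion"
    if kinds == {"bidirectional"}:  # value assumed, see lead-in
        return "bidirectional_gene_fusion"
    if kinds == {"unidirectional", "bidirectional"}:
        # Both kinds were detected for this transcript.
        return "gene_fusion"
    raise ValueError(f"unexpected directionalities: {sorted(kinds)}")
```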

Gene Fusions Section

The geneFusions section is contained within the object of the originating transcript. It will contain all the pairwise gene fusions that obey the criteria outlined above. In the case of ENST00000396373.4, there are 7 other Ensembl transcripts that would produce a gene fusion. For NM_001987.4, there was only one transcript (NM_001754.4) that produced a gene fusion.

For each originating transcript, we report the following for each partner transcript:

  • transcript ID
  • gene ID
  • HGNC gene symbol
  • transcript bio type (e.g. protein_coding)
  • intron or exon number containing the breakpoint
  • HGVS RNA notation
  • gene fusion directionality
tip

Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see HGVS SVD-WG007).

          "geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],

The HGVS RNA notation above indicates that the gene fusion starts with NM_001754.4 (RUNX1) until CDS position 58 and continues with NM_001987.4 (ETV6). 1009+3367 indicates that the fusion occurred 3367 bp within intron 2.

- - +
Version: 3.25 (unreleased)

Gene Fusion Detection

Overview

Gene fusions often result from large genomic rearrangements such as structural variants. While WGS secondary analysis pipelines typically contain alignment and variant calling stages, very few of them contain dedicated gene fusion callers. When they are included, they are usually associated with RNA-Seq pipelines where gene fusions can be readily observed.

Since gene fusions are frequently observed in cancer and since many sequencing experiments do not include paired RNA-Seq data, we have added gene fusion detection and annotation to Illumina Connected Annotations.

The rich diversity in gene fusion architectures and their likely mechanisms can be seen below:

Publication

Kumar-Sinha, C., Kalyana-Sundaram, S. & Chinnaiyan, A.M. Landscape of gene fusions in epithelial cancers: seq and ye shall find. Genome Med 7, 129 (2015)

Approach

Illumina Connected Annotations uses structural variant calls to evaluate if they form either putative intra-chromosomal or inter-chromosomal gene fusions. Let's consider two transcripts, NM_014206.3 (TMEM258) and NM_013402.4 (FADS1). Both of these genes are on the reverse strand in the genome. The vertical bar indicates the breakpoint where these transcripts are fused:

TMEM258 &amp; FADS1 transcripts

The above explains where the transcripts are fused together, but it doesn't explain in which orientation. By using the directionality encoded in the translocation breakend, we can rearrange these two transcripts in four ways:

TMEM258 &amp; FADS1 gene fusions

Only two of the combinations yield a fusion containing both the transcription start site (TSS) and the stop codon. In one case, we can even detect an in-frame gene fusion. If only unidirectional gene fusions are desired, only these two fusions can be detected. If enable-bidirectional-fusions is enabled, all four cases can be identified.

Interpreting translocation breakends

At first glance, translocation breakends are a bit daunting. However, once you understand how they work, they're actually quite simple. For more information, we recommend reading section 5.4 in the VCF 4.2 specification.

REF | ALT | Meaning
s | t[p[ | piece extending to the right of p is joined after t
s | t]p] | reverse comp piece extending left of p is joined after t
s | ]p]t | piece extending to the left of p is joined before t
s | [p[t | reverse comp piece extending right of p is joined before t
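
The four ALT forms in the table can be told apart mechanically. The following is a minimal sketch, not the annotator's own parser: the `Breakend` type and `parse_breakend` function are hypothetical names introduced here, and contig names containing colons or brackets are not handled.

```python
import re
from typing import NamedTuple


class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    joined_after: bool   # True for t[p[ and t]p]; False for ]p]t and [p[t
    reverse_comp: bool   # True for t]p] and [p[t


# Optional bases, a bracket, chrom:pos, the same bracket again, optional bases.
_BND = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<b1>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[^\[\]]*)$"
)


def parse_breakend(alt: str) -> Breakend:
    """Classify a VCF breakend ALT string into one of the four table rows."""
    m = _BND.match(alt)
    if m is None or m["b1"] != m["b2"]:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    lead, trail = m["lead"], m["trail"]
    # Exactly one side of the bracketed mate position carries the bases (t).
    if bool(lead) == bool(trail):
        raise ValueError(f"ambiguous breakend ALT: {alt!r}")
    joined_after = bool(lead)
    bracket = m["b1"]
    # After t: ']' means reverse complement. Before t: '[' means reverse complement.
    reverse_comp = (bracket == "]") if joined_after else (bracket == "[")
    return Breakend(m["chrom"], int(m["pos"]), joined_after, reverse_comp)
```

For example, `parse_breakend("[chr21:36420865[C")` reports a mate at chr21:36420865 that is reverse complemented and joined before the local base.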

Variant Types

Specifically we can identify gene fusions from the following structural variant types:

  • deletions (<DEL>)
  • tandem_duplications (<DUP:TANDEM>)
  • inversions (<INV>)
  • translocation breakpoints (AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[)

Criteria

The following criteria must be met for Illumina Connected Annotations to identify a gene fusion:

  1. After accounting for gene orientation and genomic rearrangements, both transcripts must have the same orientation if enable-bidirectional-fusions is not enabled. They can have the same or different orientations if enable-bidirectional-fusions is set.
  2. Both transcripts must be from the same transcript source (i.e. we won't mix and match between RefSeq and Ensembl transcripts)
  3. Both transcripts must belong to different genes
  4. Both transcripts cannot have a coding region that already overlaps without the variant (i.e. in cases where two genes naturally overlap, we don't want to call a gene fusion)
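
The four criteria above can be read as a single predicate. The sketch below is only an illustration under stated assumptions: the `Transcript` record, its field names, and `is_fusion_candidate` are hypothetical, and `same_orientation` is assumed to have been computed already, after accounting for gene orientation and the rearrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are assumptions for this sketch.
    transcript_id: str
    gene_id: str
    source: str          # e.g. "RefSeq" or "Ensembl"
    chromosome: str
    coding_start: int
    coding_end: int


def coding_regions_overlap(a: Transcript, b: Transcript) -> bool:
    """True when the two coding regions already overlap in the reference."""
    return (
        a.chromosome == b.chromosome
        and a.coding_start <= b.coding_end
        and b.coding_start <= a.coding_end
    )


def is_fusion_candidate(
    a: Transcript,
    b: Transcript,
    same_orientation: bool,
    enable_bidirectional: bool = False,
) -> bool:
    # 1. Orientation must match unless bidirectional fusions are enabled.
    if not same_orientation and not enable_bidirectional:
        return False
    # 2. Never mix transcript sources (RefSeq vs. Ensembl).
    if a.source != b.source:
        return False
    # 3. The transcripts must belong to different genes.
    if a.gene_id == b.gene_id:
        return False
    # 4. Naturally overlapping coding regions are not fusions.
    if coding_regions_overlap(a, b):
        return False
    return True
```

Passing the orientation in as a boolean keeps the breakend arithmetic out of the predicate, so each criterion stays readable on its own.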

ETV6/RUNX1 Example

ETV6/RUNX1 is the most common gene fusion in childhood B-cell precursor acute lymphoblastic leukemia (ALL). Samples with this translocation are associated with a good prognosis and excellent response to treatment.

VCF

Here's a simplified representation of the translocation breakends called by the Manta structural variant caller:

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr12 12026270 . C [chr21:36420865[C . PASS SVTYPE=BND
chr12 12026305 . A A]chr21:36420571] . PASS SVTYPE=BND
chr21 36420571 . C C]chr12:12026305] . PASS SVTYPE=BND
chr21 36420865 . C [chr12:12026270[C . PASS SVTYPE=BND

When you put these calls together, the resulting genomic rearrangement looks something like this:

JSON Output

The annotation for the first variant in the VCF looks like this:

{"positions":[
{
"chromosome": "12",
"position": 12026270,
"refAllele": "C",
"altAlleles": [
"[chr21:36420865[C"
],
"filters": [
"PASS"
],
"cytogeneticBand": "12p13.2",
"variants": [
{
"vid": "12-12026270-C-[chr21:36420865[C",
"chromosome": "12",
"begin": 12026270,
"end": 12026270,
"isStructuralVariant": true,
"refAllele": "C",
"altAllele": "[chr21:36420865[C",
"variantType": "translocation",
"transcripts": [
{
"transcript": "ENST00000396373.4",
"source": "Ensembl",
"bioType": "mRNA",
"introns": "5/7",
"geneId": "ENSG00000139083",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"impact": "modifier",
"geneFusions": [
{
"transcript": "ENST00000437180.1",
"bioType": "mRNA",
"source": "Ensembl",
"geneId": "ENSG00000159216",
"proteinId": "ENSP00000409227.1",
"intron": 2,
"hgnc": "RUNX1",
"hgvsr": "ENST00000437180.1(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
},
{
"transcript": "ENST00000300305.3",
"bioType": "mRNA",
"source": "Ensembl",
"isCanonical": true,
"geneId": "ENSG00000159216",
"proteinId": "ENSP00000300305.3",
"intron": 1,
"hgnc": "RUNX1",
"hgvsr": "ENST00000300305.3(RUNX1):r.?_58+274::ENST00000396373.4(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
}
],
"isCanonical": true,
"proteinId": "ENSP00000379658.3"
},
{
"transcript": "NM_001987.5",
"source": "RefSeq",
"bioType": "mRNA",
"introns": "5/7",
"geneId": "2120",
"hgnc": "ETV6",
"consequence": [
"transcript_variant",
"unidirectional_gene_fusion"
],
"impact": "modifier",
"geneFusions": [
{
"transcript": "NM_001754.5",
"bioType": "mRNA",
"source": "RefSeq",
"isCanonical": true,
"geneId": "861",
"proteinId": "NP_001745.2",
"intron": 2,
"hgnc": "RUNX1",
"hgvsr": "NM_001754.5(RUNX1):r.?_58+274::NM_001987.5(ETV6):r.1009+3367_?",
"directionality": "unidirectional"
}
],
"isCanonical": true,
"proteinId": "NP_001978.1"
}
]
}
]
}
]}

Field | Type | Notes
transcript | string | transcript ID
bioType | string | descriptions of the biotypes from Ensembl
exon | int | exon that contained fusion breakpoint
intron | int | intron that contained fusion breakpoint
geneId | string | gene ID. e.g. ENSG00000116062
hgnc | string | gene symbol. e.g. MSH6
hgvsr | string | HGVS RNA nomenclature

Gene Fusion Data Sources

To provide more context to our gene fusions, we provide the following gene fusion data sources:

Consequences

When a gene fusion is identified, we add the following Sequence Ontology consequence:

              "consequence": [
"transcript_variant",
"gene_fusion"
],
  • If both transcripts have the same orientation, we label it as unidirectional_gene_fusion. If they have different orientations, we label it as bidirectional_gene_fusion.
  • If both unidirectional and bidirectional fusions are detected, we label it as gene_fusion.
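
The labeling rule in the two bullets above can be sketched as a small function over the directionality values of a transcript's fusion partners. The function name is hypothetical, and the case-insensitive comparison is an assumption made because the examples on this page spell the value both as "unidirectional" and "uniDirectional".

```python
from typing import Iterable, Optional


def fusion_consequence(directionalities: Iterable[str]) -> Optional[str]:
    """Pick the Sequence Ontology term for a set of fusion directionalities."""
    kinds = {d.lower() for d in directionalities}
    if not kinds:
        return None  # no fusion partners, so no fusion consequence
    unknown = kinds - {"unidirectional", "bidirectional"}
    if unknown:
        raise ValueError(f"unexpected directionality: {sorted(unknown)}")
    if kinds == {"unidirectional"}:
        return "unidirectional_gene_fusion"
    if kinds == {"bidirectional"}:
        return "bidirectional_gene_fusion"
    # Both kinds were observed for this transcript.
    return "gene_fusion"
```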

Gene Fusions Section

The geneFusions section is contained within the object of the originating transcript. It contains all the pairwise gene fusions that obey the criteria outlined above. In the case of ENST00000396373.4, there were 7 other Ensembl transcripts that would produce a gene fusion. For NM_001987.4, there was only one transcript (NM_001754.4) that would produce a gene fusion.

For each originating transcript, we report the following for each partner transcript:

  • transcript ID
  • gene ID
  • HGNC gene symbol
  • transcript bio type (e.g. protein_coding)
  • intron or exon number containing the breakpoint
  • HGVS RNA notation
  • gene fusion directionality
tip

Before Illumina Connected Annotations 3.15, we provided HGVS coding notation. However, HGVS r. notation is more appropriate for these types of fusion splicing events (see HGVS SVD-WG007).

          "geneFusions": [
{
"transcript": "NM_001754.4",
"bioType": "protein_coding",
"intron": 2,
"geneId": "861",
"hgnc": "RUNX1",
"hgvsr": "NM_001754.4(RUNX1):r.?_58+274::NM_001987.4(ETV6):r.1009+3367_?",
"directionality":"uniDirectional"
}
],

The HGVS RNA notation above indicates that the gene fusion starts with NM_001754.4 (RUNX1) until CDS position 58 and continues with NM_001987.4 (ETV6). 1009+3367 indicates that the fusion occurred 3367 bp within intron 2.
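
For downstream tools that need the two halves of the hgvsr string separately, a minimal parsing sketch is shown below. It assumes the `transcript(gene):r.start_end` shape joined by `::` seen in the example above; `parse_fusion_hgvsr` is a hypothetical helper, not part of any HGVS library.

```python
import re
from typing import Dict, List

_PART = re.compile(r"^(?P<transcript>[^()]+)\((?P<gene>[^()]+)\):r\.(?P<range>.+)$")


def parse_fusion_hgvsr(hgvsr: str) -> List[Dict[str, str]]:
    """Split an HGVS r. fusion string into its 5' and 3' partner descriptions."""
    parts = hgvsr.split("::")
    if len(parts) != 2:
        raise ValueError(f"expected two partners joined by '::': {hgvsr!r}")
    parsed = []
    for part in parts:
        m = _PART.match(part)
        if m is None:
            raise ValueError(f"unrecognized partner: {part!r}")
        # The range is start_end; '?' marks an unspecified boundary.
        start, _, end = m["range"].partition("_")
        parsed.append(
            {"transcript": m["transcript"], "gene": m["gene"], "start": start, "end": end}
        )
    return parsed
```

Applied to the example above, the first element describes RUNX1 ending at `58+274` and the second describes ETV6 starting at `1009+3367`.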

+ + \ No newline at end of file diff --git a/core-functionality/iscn-notation/index.html b/core-functionality/iscn-notation/index.html new file mode 100644 index 00000000..745e896d --- /dev/null +++ b/core-functionality/iscn-notation/index.html @@ -0,0 +1,24 @@ + + + + + + + +ISCN Notation | IlluminaConnectedAnnotations + + + + +
+
Version: 3.25 (unreleased)

ISCN Notation

Introduction

The International System for Human Cytogenetic Nomenclature (ISCN) is a standardized system for describing the banding pattern of human chromosomes and any chromosomal abnormalities, including structural variations. Geneticists and researchers use ISCN to ensure clarity and uniformity when reporting chromosomal abnormalities.

Key Components of ISCN Notation:

  • Chromosome Number: Identifies the chromosome.
  • Arm: Chromosome arms are labeled "p" (short arm) and "q" (long arm).
  • Banding Pattern: Each arm is divided into regions, bands, and sub-bands that are numbered starting from the centromere (central part of the chromosome).

Overview

The provided ISCN notation algorithm processes chromosomal variants and generates ISCN notation by following these steps:

  1. Identify Variant Type: The algorithm recognizes several types of chromosomal variants, such as duplications, deletions, copy number gains, and copy number losses.

  2. Locate Cytogenetic Bands: Using the start and end positions of the variant, the algorithm identifies the corresponding cytogenetic bands on the chromosome.

  3. Generate Notation: The algorithm constructs the ISCN notation string using the variant type, chromosome number, and identified cytogenetic bands.

Supported Variant Types

The algorithm supports the following variant types:

  • Deletion (del)
  • Duplication (dup)
  • Copy Number Gain (dup)
  • Copy Number Loss (del)

Example

For a deletion on chromosome 8 from position 19200001 to 135400001, the algorithm would:

  1. Recognize the variant type as a deletion.
  2. Identify the start band as p21.3 and the end band as q24.23.
  3. Generate the ISCN notation: del(8)(p21.3q24.23).
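
The first and last steps of this example can be sketched in a few lines. This is an illustration only: `iscn_notation` is a hypothetical function, the band names are assumed to have been looked up already (step 2 needs a cytogenetic band table that is not reproduced here), and passing a single band is simply one way to represent notations that name only one band.

```python
from typing import Optional

# Variant type to ISCN prefix, following the supported variant types listed above.
_ISCN_PREFIX = {
    "deletion": "del",
    "duplication": "dup",
    "copy number gain": "dup",
    "copy number loss": "del",
}


def iscn_notation(
    variant_type: str,
    chromosome: str,
    start_band: str,
    end_band: Optional[str] = None,
) -> str:
    """Build an ISCN string from a variant type, chromosome, and band names."""
    prefix = _ISCN_PREFIX.get(variant_type)
    if prefix is None:
        raise ValueError(f"unsupported variant type: {variant_type!r}")
    bands = start_band if end_band is None else start_band + end_band
    return f"{prefix}({chromosome})({bands})"


print(iscn_notation("deletion", "8", "p21.3", "q24.23"))
```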

More examples:

Chromosome | Start Position | End Position | Variant Type | ISCN Notation
8 | 1 | 19200001 | deletion | del(8)(p21.3)
8 | 1 | 19200001 | duplication | dup(8)(p21.3)
8 | 19200001 | 135400001 | deletion | del(8)(p21.3q24.23)
8 | 19200001 | 135400001 | duplication | dup(8)(p21.3q24.23)
8 | 127300001 | 131500000 | duplication | dup(8)(q24.21q24.22)
8 | 127300001 | 131500000 | copy number gain | dup(8)(q24.21q24.22)
8 | 128746677 | 128749160 | duplication | dup(8)(q24.21q24.21)
8 | 128746677 | 128749160 | copy number gain | dup(8)(q24.21q24.21)
8 | 135400001 | 138900001 | duplication | dup(8)(q24.23q24.3)
8 | 135400001 | 146364022 | deletion | del(8)(q24.23)
8 | 135400001 | 145138635 | duplication | dup(8)(q24.23q24.3)
8 | 135400001 | 138900001 | copy number loss | del(8)(q24.23q24.3)
8 | 135400001 | 146364022 | duplication | dup(8)(q24.23)
X | 86200001 | 103700000 | copy number loss | del(X)(q21.31q22.2)
X | 86200001 | 103700000 | deletion | del(X)(q21.31q22.2)

References

+ + + + \ No newline at end of file diff --git a/core-functionality/junction-preserving/index.html b/core-functionality/junction-preserving/index.html index a669cf9e..181a2d20 100644 --- a/core-functionality/junction-preserving/index.html +++ b/core-functionality/junction-preserving/index.html @@ -6,13 +6,13 @@ Junction Preserving Annotation | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Junction Preserving Annotation

Background

When a variant can be moved (due to alignment) across junctions (e.g. start, stop or splice site), the annotation may vary depending on which exact alignment was used. For example, a left-aligned deletion that affects the splice acceptor site may, upon right-alignment, become an exon variant.

Deletion at exon boundary

Note that:

  • When right-aligned the variant starts at the first base of the exon (as pictured).
  • When left-aligned the variant can be shifted two base pairs and starts at a splice acceptor site.

From the point of view of the translation machinery, the important question is whether the sequence that identifies a junction is preserved, regardless of the variant position. In the case of the deletion above, we believe that the variant is more accurately characterized as an inframe_deletion than as a splice_acceptor_variant, because the splice acceptor sequence AG is unaffected.

When faced with such variants, we assign junction-disrupting consequences only if the variant cannot be shifted out of the junction.

Implementation

By default and convention, the left-aligned variant is annotated. If the variant overlaps a junction (as judged by consequences), it is also right-aligned and annotated. If both alignments produce junction disruption, the left-aligned annotation is reported. If, however, only one of the alignments causes junction disruption, the non-junction-disrupting annotation is reported.

note

This only affects transcript annotations. Supplementary annotations are reported on the left-aligned variant, and HGVS notations are calculated on the right-aligned variant.

- - +
Version: 3.25 (unreleased)

Junction Preserving Annotation

Background

When a variant can be moved (due to alignment) across junctions (e.g. start, stop or splice site), the annotation may vary depending on which exact alignment was used. For example, a left-aligned deletion that affects the splice acceptor site may, upon right-alignment, become an exon variant.

Deletion at exon boundary

Note that:

  • When right-aligned the variant starts at the first base of the exon (as pictured).
  • When left-aligned the variant can be shifted two base pairs and starts at a splice acceptor site.

From the point of view of the translation machinery, the important question is whether the sequence that identifies a junction is preserved, regardless of the variant position. In the case of the deletion above, we believe that the variant is more accurately characterized as an inframe_deletion than as a splice_acceptor_variant, because the splice acceptor sequence AG is unaffected.

When faced with such variants, we assign junction-disrupting consequences only if the variant cannot be shifted out of the junction.

Implementation

By default and convention, the left-aligned variant is annotated. If the variant overlaps a junction (as judged by consequences), it is also right-aligned and annotated. If both alignments produce junction disruption, the left-aligned annotation is reported. If, however, only one of the alignments causes junction disruption, the non-junction-disrupting annotation is reported.
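
The selection rule described above can be sketched as follows. This is an illustration under assumptions, not the shipped implementation: `select_consequences` is a hypothetical name, the right-aligned annotation is supplied as a callable so it is only computed when needed, and the `JUNCTION_TERMS` set is a guess at which consequences count as junction disrupting.

```python
from typing import Callable, FrozenSet, List

# Assumed set of junction-disrupting consequences (start, stop, and splice sites).
JUNCTION_TERMS: FrozenSet[str] = frozenset(
    {"splice_acceptor_variant", "splice_donor_variant", "start_lost", "stop_lost"}
)


def select_consequences(
    left: List[str],
    annotate_right: Callable[[], List[str]],
    junction_terms: FrozenSet[str] = JUNCTION_TERMS,
) -> List[str]:
    """Choose between left- and right-aligned transcript consequences."""
    # The left-aligned annotation is the default.
    if not (set(left) & junction_terms):
        return left
    # The left alignment disrupts a junction, so try the right alignment.
    right = annotate_right()
    if set(right) & junction_terms:
        return left   # both disrupt a junction: keep the left-aligned result
    return right      # the variant can be shifted out of the junction
```

For the deletion pictured in the Background section, `select_consequences(["splice_acceptor_variant"], lambda: ["inframe_deletion"])` would return the in-frame deletion.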

note

This only affects transcript annotations. Supplementary annotations are reported on the left-aligned variant, and HGVS notations are calculated on the right-aligned variant.

+ + \ No newline at end of file diff --git a/core-functionality/transcript-consequence-impacts/index.html b/core-functionality/transcript-consequence-impacts/index.html index 70ca9edb..f2bbeb1d 100644 --- a/core-functionality/transcript-consequence-impacts/index.html +++ b/core-functionality/transcript-consequence-impacts/index.html @@ -6,14 +6,14 @@ Transcript Consequence Impact | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings as obtained from SnpEff.

Impact | Definition
high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay.
moderate | A non-disruptive variant that might change protein effectiveness.
low | Assumed to be mostly harmless or unlikely to change protein behavior.
modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact.

Sources

Not all consequences are rated by SnpEff, therefore Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment
bidirectional_gene_fusion | high |  | high | SnpEff
coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS
copy_number_change |  |  | modifier |
copy_number_decrease |  |  | modifier |
copy_number_increase |  |  | modifier |
downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP
feature_elongation | modifier | high | high | VEP
feature_truncation |  | high | high | VEP
five_prime_duplicated_transcript |  |  | modifier |
five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP
frameshift_variant | high | high | high | SnpEff + VEP
gene_fusion | high |  | high | SnpEff
incomplete_terminal_codon_variant |  | low | low | VEP
inframe_deletion | moderate | moderate | moderate | SnpEff + VEP
inframe_insertion | moderate | moderate | moderate | SnpEff + VEP
intron_variant | modifier | modifier | modifier | SnpEff + VEP
mature_miRNA_variant |  | modifier | modifier | VEP
missense_variant | moderate | moderate | moderate | SnpEff + VEP
NMD_transcript_variant |  | modifier | modifier | VEP
non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP
non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP
protein_altering_variant |  | moderate | moderate | VEP
regulatory_region_ablation |  | modifier | modifier | VEP
regulatory_region_amplification |  | modifier | modifier | VEP
regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP
short_tandem_repeat_change |  |  | modifier |
short_tandem_repeat_contraction |  |  | modifier |
short_tandem_repeat_expansion |  |  | modifier |
splice_acceptor_variant | high | high | high | SnpEff + VEP
splice_donor_variant | high | high | high | SnpEff + VEP
splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff
start_lost | high | high | high | SnpEff + VEP
start_retained_variant | low | low | low | SnpEff + VEP
stop_gained | high | high | high | SnpEff + VEP
stop_lost | high | high | high | SnpEff + VEP
stop_retained_variant | low | low | low | SnpEff + VEP
synonymous_variant | low | low | low | SnpEff + VEP
three_prime_duplicated_transcript |  |  | modifier |
three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP
transcript_ablation | high | high | high | SnpEff + VEP
transcript_amplification |  | high | high | VEP
transcript_variant | modifier |  | modifier | SnpEff
unidirectional_gene_fusion | high |  | high | SnpEff
upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. However, this consequence is not annotated by Illumina Connected Annotations, so no impact is provided for it.

Example Transcript

The key impact for each transcript gives the impact rating for the consequence.

{
"variants": [
{
"vid": "1-1623412-T-C",
"chromosome": "1",
"begin": 1623412,
"end": 1623412,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.1623412T>C",
"transcripts": [
{
"transcript": "ENST00000479659.5",
"source": "Ensembl",
"bioType": "lncRNA",
"introns": "2/18",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"intron_variant",
"non_coding_transcript_variant"
],
"impact": "modifier",
"hgvsc": "ENST00000479659.5:n.288-19T>C"
},
{
"transcript": "ENST00000489635.5",
"source": "VEP",
"bioType": "mRNA",
"codons": "aTg/aCg",
"aminoAcids": "M/T",
"cdnaPos": "269",
"cdsPos": "134",
"exons": "3/20",
"proteinPos": "45",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"missense_variant"
],
"impact": "moderate",
"hgvsc": "ENST00000489635.5:c.134T>C",
"hgvsp": "ENSP00000426007.1:p.(Met45Thr)",
"proteinId": "ENSP00000426007.1"
}
]
}
]
}
- - +
Version: 3.25 (unreleased)

Transcript Consequence Impact

Overview

Illumina Connected Annotations provides transcript consequence impacts from SnpEff.

The following definitions are used for the impact ratings as obtained from SnpEff.

Impact | Definition
high | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay.
moderate | A non-disruptive variant that might change protein effectiveness.
low | Assumed to be mostly harmless or unlikely to change protein behavior.
modifier | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact.

Sources

Not all consequences are rated by SnpEff, therefore Illumina Connected Annotations combines the ratings from SnpEff with those from VEP.

  1. SnpEff Documentation and Codebase
  2. VEP Documentation

Consequence Impacts

The following table gives the combined rating for all consequences recognized by Illumina Connected Annotations.

Consequence | SnpEff Impact | VEP Impact | Illumina Connected Annotations Impact | Comment
bidirectional_gene_fusion | high |  | high | SnpEff
coding_sequence_variant | low, modifier | modifier | modifier | Based on CDS
copy_number_change |  |  | modifier |
copy_number_decrease |  |  | modifier |
copy_number_increase |  |  | modifier |
downstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP
feature_elongation | modifier | high | high | VEP
feature_truncation |  | high | high | VEP
five_prime_duplicated_transcript |  |  | modifier |
five_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP
frameshift_variant | high | high | high | SnpEff + VEP
gene_fusion | high |  | high | SnpEff
incomplete_terminal_codon_variant |  | low | low | VEP
inframe_deletion | moderate | moderate | moderate | SnpEff + VEP
inframe_insertion | moderate | moderate | moderate | SnpEff + VEP
intron_variant | modifier | modifier | modifier | SnpEff + VEP
mature_miRNA_variant |  | modifier | modifier | VEP
missense_variant | moderate | moderate | moderate | SnpEff + VEP
NMD_transcript_variant |  | modifier | modifier | VEP
non_coding_transcript_exon_variant | modifier | modifier | modifier | SnpEff + VEP
non_coding_transcript_variant | modifier | modifier | modifier | SnpEff + VEP
protein_altering_variant |  | moderate | moderate | VEP
regulatory_region_ablation |  | modifier | modifier | VEP
regulatory_region_amplification |  | modifier | modifier | VEP
regulatory_region_variant | modifier | modifier | modifier | SnpEff + VEP
short_tandem_repeat_change |  |  | modifier |
short_tandem_repeat_contraction |  |  | modifier |
short_tandem_repeat_expansion |  |  | modifier |
splice_acceptor_variant | high | high | high | SnpEff + VEP
splice_donor_variant | high | high | high | SnpEff + VEP
splice_region_variant | moderate, low | low | low | Based on SPLICE_SITE_REGION in SnpEff
start_lost | high | high | high | SnpEff + VEP
start_retained_variant | low | low | low | SnpEff + VEP
stop_gained | high | high | high | SnpEff + VEP
stop_lost | high | high | high | SnpEff + VEP
stop_retained_variant | low | low | low | SnpEff + VEP
synonymous_variant | low | low | low | SnpEff + VEP
three_prime_duplicated_transcript |  |  | modifier |
three_prime_UTR_variant | modifier | modifier | modifier | SnpEff + VEP
transcript_ablation | high | high | high | SnpEff + VEP
transcript_amplification |  | high | high | VEP
transcript_variant | modifier |  | modifier | SnpEff
unidirectional_gene_fusion | high |  | high | SnpEff
upstream_gene_variant | modifier | modifier | modifier | SnpEff + VEP
Note:
  1. For transcripts with multiple consequences, the most severe impact rating is chosen.
  2. In case of consequences that do not have any impact rating from SnpEff or VEP, Illumina Connected Annotations provides modifier.
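
The two notes above amount to taking a maximum over per-consequence ratings with a fallback. A minimal sketch follows; the function name and the lookup dictionary are hypothetical, and the severity order high > moderate > low > modifier is an assumption inferred from the impact definitions rather than something stated on this page.

```python
from typing import Iterable, Mapping

# Assumed severity order, least severe first.
_SEVERITY = ("modifier", "low", "moderate", "high")


def transcript_impact(
    consequences: Iterable[str],
    impact_by_consequence: Mapping[str, str],
) -> str:
    """Return the most severe impact among a transcript's consequences."""
    # Consequences without a rating fall back to "modifier" (note 2).
    impacts = [impact_by_consequence.get(c, "modifier") for c in consequences]
    # The most severe rating wins (note 1).
    return max(impacts, key=_SEVERITY.index, default="modifier")


# A few rows copied from the consequence table above.
ratings = {
    "intron_variant": "modifier",
    "non_coding_transcript_variant": "modifier",
    "missense_variant": "moderate",
    "splice_region_variant": "low",
}
print(transcript_impact(["missense_variant", "splice_region_variant"], ratings))
```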

Known Issues

The consequence splice_polypyrimidine_tract_variant is rated as low by VEP. However, this consequence is not annotated by Illumina Connected Annotations, so no impact is provided for it.

Example Transcript

The key impact for each transcript gives the impact rating for the consequence.

{
"variants": [
{
"vid": "1-1623412-T-C",
"chromosome": "1",
"begin": 1623412,
"end": 1623412,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.1623412T>C",
"transcripts": [
{
"transcript": "ENST00000479659.5",
"source": "Ensembl",
"bioType": "lncRNA",
"introns": "2/18",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"intron_variant",
"non_coding_transcript_variant"
],
"impact": "modifier",
"hgvsc": "ENST00000479659.5:n.288-19T>C"
},
{
"transcript": "ENST00000489635.5",
"source": "VEP",
"bioType": "mRNA",
"codons": "aTg/aCg",
"aminoAcids": "M/T",
"cdnaPos": "269",
"cdsPos": "134",
"exons": "3/20",
"proteinPos": "45",
"geneId": "ENSG00000197530",
"hgnc": "MIB2",
"consequence": [
"missense_variant"
],
"impact": "moderate",
"hgvsc": "ENST00000489635.5:c.134T>C",
"hgvsp": "ENSP00000426007.1:p.(Met45Thr)",
"proteinId": "ENSP00000426007.1"
}
]
}
]
}
+ + \ No newline at end of file diff --git a/core-functionality/variant-ids/index.html b/core-functionality/variant-ids/index.html index 2b84afda..6e1a650a 100644 --- a/core-functionality/variant-ids/index.html +++ b/core-functionality/variant-ids/index.html @@ -6,13 +6,13 @@ Variant IDs | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used, neither the reference nor alternate allele can be empty
  • some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

chromosome-position-reference allele-alternate allele

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

chromosome-position-reference allele-alternate allele

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

chromosome-position-end position-reference allele-alternate allele-SVTYPE

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
- - +
Version: 3.25 (unreleased)

Variant IDs

Overview

Many downstream tools use a variant identifier to store annotation results. We've standardized on using variant identifiers (VIDs) that originated from the notation used by the Broad Institute.

The Broad VID scheme is not only simple, but it has the advantage that a user could create a bare bones VCF entry from the information captured in the identifier. One of the limitations of the Broad VID scheme is that it does not define how to handle structural variants. Our VID scheme attempts to fill that gap.

Conventions
  • all chromosomes use Ensembl style notation (i.e. 22 instead of chr22)
  • for a reference variant (i.e. no alt allele), replace the period (.) with the reference base
  • padding bases are used, neither the reference nor alternate allele can be empty
  • some large variant callers lazily output N for the reference allele. If this is the case, replace it with the true reference base

Small Variants

VCF Examples

chr1    66507   .   T   A   184.45  PASS    .
chr1 66521 . T TATATA 144.53 PASS .
chr1 66572 . GTA G,GTACTATATATTATA 45.45 PASS .

Format

chromosome-position-reference allele-alternate allele

VID Examples

  • 1-66507-T-A
  • 1-66521-T-TATATA
  • 1-66572-GTA-G
  • 1-66572-G-GTACTATATATTA
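
The VID examples above can be reproduced from the VCF fields with a short sketch. The function name is hypothetical, and the trimming rule is an assumption inferred from the last example: shared trailing bases are removed from each REF/ALT pair while at least one padding base remains on both sides.

```python
from typing import List


def small_variant_vids(chrom: str, pos: int, ref: str, alts: str) -> List[str]:
    """Build one VID per alternate allele of a small-variant VCF record."""
    # Ensembl-style chromosome names: 22 instead of chr22.
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    vids = []
    for alt in alts.split(","):
        r, a = ref, alt
        if a == ".":
            a = r  # reference variant: replace the period with the reference base
        # Assumed trimming: drop shared trailing bases, keep one padding base.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        vids.append(f"{chrom}-{pos}-{r}-{a}")
    return vids


print(small_variant_vids("chr1", 66572, "GTA", "G,GTACTATATATTATA"))
```

Because only trailing bases are trimmed, the position never changes, which is why both VIDs from the multi-allelic record keep 66572.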

Translocation Breakends

VCF Example

chr1    2617277 .   A   AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[  .   PASS    SVTYPE=BND

Format

chromosome-position-reference allele-alternate allele

VID Example

  • 1-2617277-A-AAAAAAAAAAAAAAAAAATTAGTCAGGCAC[chr3:153444911[

All Other Structural Variants

VCF Examples

chr1    1000    .   G   <ROH>   .   PASS    END=3001000;SVTYPE=ROH
chr1 1350082 . G <DEL> . PASS END=1351320;SVTYPE=DEL
chr1 1477854 . C <DUP:TANDEM> . PASS END=1477984;SVTYPE=DUP
chr1 1477968 . T <INS> . PASS END=1477968;SVTYPE=INS
chr1 1715898 . N <DUP> . PASS SVTYPE=CNV;END=1750149
chr1 2650426 . N <DEL> . PASS SVTYPE=CNV;END=2653074
chr2 321682 . T <INV> . PASS SVTYPE=INV;END=421681
chr20 2633403 . G <STR2> . PASS END=2633421

Format

chromosome-position-end position-reference allele-alternate allele-SVTYPE

VID Examples

  • 1-1000-3001000-G-<ROH>-ROH
  • 1-1350082-1351320-G-<DEL>-DEL
  • 1-1477854-1477984-C-<DUP:TANDEM>-DUP
  • 1-1477968-1477968-T-<INS>-INS
  • 1-1715898-1750149-A-<DUP>-CNV (replace the N with A)
  • 1-2650426-2653074-N-<DEL>-CNV (keep the N)
  • 2-321682-421681-T-<INV>-INV
  • 20-2633403-2633421-G-<STR2>-STR
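
A structural variant VID can be assembled in the same spirit. The sketch below is illustrative only: `structural_variant_vid` is a hypothetical function, `true_ref_base` stands in for a reference genome lookup, and the `svtype` override exists because the STR record above carries no SVTYPE key in its INFO field, so how that suffix is derived is not shown on this page.

```python
from typing import Optional


def structural_variant_vid(
    chrom: str,
    pos: int,
    ref: str,
    alt: str,
    info: str,
    true_ref_base: Optional[str] = None,
    svtype: Optional[str] = None,
) -> str:
    """Build chromosome-position-end-ref-alt-SVTYPE from a symbolic-allele record."""
    # Parse key=value pairs from the INFO column; flags without '=' are skipped.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    end = fields["END"]
    svtype = svtype or fields.get("SVTYPE")
    if svtype is None:
        raise ValueError("SVTYPE is missing from INFO and was not supplied")
    # Replace a placeholder N with the true reference base when it is known.
    if ref == "N" and true_ref_base:
        ref = true_ref_base
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}-{pos}-{end}-{ref}-{alt}-{svtype}"


print(structural_variant_vid("chr1", 1350082, "G", "<DEL>", "END=1351320;SVTYPE=DEL"))
```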
+ + \ No newline at end of file diff --git a/data-sources/1000Genomes-snv-json/index.html b/data-sources/1000Genomes-snv-json/index.html index ea0ef836..4fb5fb3d 100644 --- a/data-sources/1000Genomes-snv-json/index.html +++ b/data-sources/1000Genomes-snv-json/index.html @@ -6,13 +6,13 @@ 1000Genomes-snv-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
- - +
Version: 3.25 (unreleased)

1000Genomes-snv-json

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
+ + \ No newline at end of file diff --git a/data-sources/1000Genomes-sv-json/index.html b/data-sources/1000Genomes-sv-json/index.html index 7ff3b864..edc8099c 100644 --- a/data-sources/1000Genomes-sv-json/index.html +++ b/data-sources/1000Genomes-sv-json/index.html @@ -6,13 +6,13 @@ 1000Genomes-sv-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
- - +
Version: 3.25 (unreleased)

1000Genomes-sv-json

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
+ + \ No newline at end of file diff --git a/data-sources/1000Genomes/index.html b/data-sources/1000Genomes/index.html index bca924d7..ad994e8d 100644 --- a/data-sources/1000Genomes/index.html +++ b/data-sources/1000Genomes/index.html @@ -6,15 +6,15 @@ 1000 Genomes | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF), but we recompute them from allele counts and allele numbers to get six-digit precision. The per-population allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not provided in the original INFO field; they have to be computed by parsing the genotypes. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe-separated AA field (the indel-specific REF, ALT, and IndelType values are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

Chromosome | # of alleles | # of conflicting alleles | percentage
chrX | 834800 | 2733 | 0.33%
Total | 21413098 | 2743 | 0.013%

Currently, we remove the allele frequency of the conflicting allele (i.e., the insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 +

Version: 3.25 (unreleased)

1000 Genomes

Overview

The goal of the 1000 Genomes Project was to find most genetic variants with frequencies of at least 1% in the populations studied. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.

Publication

Sudmant, P., Rausch, T., Gardner, E. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015). https://doi.org/10.1038/nature15394

Populations

Small Variants

VCF File Parsing

The original VCF files come with allele frequency fields (e.g. ALL_AF, AMR_AF), but we recompute them from allele counts and allele numbers to get six-digit precision. The per-population allele counts and allele numbers (e.g. AMR_AC, AMR_AN) are not provided in the original INFO field; they have to be computed by parsing the genotypes. Our team converted the original data to VCF entries with allele counts and allele numbers like the following.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 15274 rs62636497 A G,T 100 PASS AC=1739,3210;AF=0.347244,0.640974;AN=5008;NS=2504;DP=23255;EAS_AF=0.4812,0.5188;AMR_AF=0.2752,0.7205;AFR_AF=0.323,0.6369;EUR_AF=0.2922,0.7078;SAS_AF=0.3497,0.6472;AA=g|||;VT=SNP;MULTI_ALLELIC;EAS_AN=1008;EAS_AC=485,523;EUR_AN=1006;EUR_AC=294,712;AFR_AN=1322;AFR_AC=427,842;AMR_AN=694;AMR_AC=191,500;SAS_AN=978;SAS_AC=342,633

The ancestral allele, if it exists, is the first value in the pipe-separated AA field (the indel-specific REF, ALT, and IndelType values are ignored).

We parse the VCF file and extract the following fields from INFO:

  • AA
  • AC
  • AN
  • EAS_AN
  • AMR_AN
  • AFR_AN
  • EUR_AN
  • SAS_AN
  • EAS_AC
  • AMR_AC
  • AFR_AC
  • EUR_AC
  • SAS_AC
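
As a rough illustration of how the fields listed above could be used, the following Python sketch parses an INFO string, takes the first pipe-separated AA value as the ancestral allele, and recomputes each allele frequency as AC / AN rounded to six digits. The function names and the `POPULATIONS` mapping are hypothetical and are not taken from the Illumina Connected Annotations code base. The output key names only mirror the JSON fields documented for this data source.

```python
# Hypothetical mapping from JSON key prefix to INFO key prefix.
POPULATIONS = {
    "all": "", "afr": "AFR_", "amr": "AMR_",
    "eas": "EAS_", "eur": "EUR_", "sas": "SAS_",
}


def parse_info(info: str) -> dict:
    """Turn 'K1=V1;K2=V2;FLAG' into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def ancestral_allele(fields: dict):
    """Return the first pipe-separated AA value, or None if absent or empty."""
    aa = fields.get("AA")
    if not isinstance(aa, str):
        return None
    return aa.split("|")[0] or None


def recompute_frequencies(fields: dict, alt_index: int) -> dict:
    """Recompute AF = AC / AN (six digits) for one alternate allele."""
    result = {}
    for name, prefix in POPULATIONS.items():
        allele_number = int(fields[prefix + "AN"])
        allele_count = int(fields[prefix + "AC"].split(",")[alt_index])
        result[name + "Ac"] = allele_count
        result[name + "An"] = allele_number
        if allele_number > 0:  # allele number is documented as non-zero
            result[name + "Af"] = round(allele_count / allele_number, 6)
    return result
```

For the rs62636497 example, `recompute_frequencies(parse_info(info), 0)` would divide AC=1739 by AN=5008 for the first alternate allele (G), which matches the AF=0.347244 shown in that VCF line.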

Conflict Resolution

We have observed conflicting allele frequency information in the source. Take the following example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 20505705 rs35377696 C CTCTG,CTG,CTGTG 100 PASS AC=46,1513,152;AF=0.0091853,0.302117,0.0303514;
1 20505705 rs35377696 C CTG 100 PASS AC=4;AF=0.000798722;

That is, the variant 1-20505705-C-CTG has conflicting entries. To get an idea of how frequently we observe this, here is a table summarizing ChrX and all chromosomes. Note that almost all such entries are found in ChrX.

Chromosome | # of alleles | # of conflicting alleles | percentage
chrX | 834800 | 2733 | 0.33%
Total | 21413098 | 2743 | 0.013%

Currently, we remove the allele frequency of the conflicting allele (i.e., the insertion TG in the example) but keep the allele frequencies of all other alleles in the VCF line.
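
The current behaviour can be sketched in Python as follows. This is only an illustration under simplifying assumptions: records are plain tuples, alleles are compared exactly as written in the VCF (no trimming or left-alignment), and an allele counts as conflicting when it appears with more than one distinct frequency. The function names are hypothetical.

```python
from collections import defaultdict


def find_conflicting_alleles(records):
    """Return the (chrom, pos, ref, alt) keys seen with more than one AF.

    records: list of (chrom, pos, ref, alts, afs) tuples, where alts and
    afs are parallel lists taken from one VCF line.
    """
    seen = defaultdict(set)
    for chrom, pos, ref, alts, afs in records:
        for alt, af in zip(alts, afs):
            seen[(chrom, pos, ref, alt)].add(af)
    return {key for key, values in seen.items() if len(values) > 1}


def drop_conflicting_frequencies(records):
    """Blank out (None) the AF of conflicting alleles; keep all others."""
    conflicts = find_conflicting_alleles(records)
    cleaned = []
    for chrom, pos, ref, alts, afs in records:
        kept = [
            None if (chrom, pos, ref, alt) in conflicts else af
            for alt, af in zip(alts, afs)
        ]
        cleaned.append((chrom, pos, ref, alts, kept))
    return cleaned
```

Applied to the two rs35377696 lines above, only the CTG allele would lose its frequency, while CTCTG and CTGTG keep theirs.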

Potential Alternate Solutions

  • Remove all alleles contained in VCF lines that have a conflicting allele. (Recommended by Holly Zheng-Bradley of the 1000 Genomes group, 7/29/2015)
  • Recalculate the allele frequency for the conflicting allele.
  • Pick the allele frequency that has the highest data support.

Download URL

GRCh37 GRCh38

JSON Output

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.

Structural Variants

VCF File Parsing

The VCF files contain entries like the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
22 16050654 esv3647175;esv3647176;esv3647177;esv3647178 A <CN0>,<CN2>,<CN3>,<CN4> 100 PASS AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV GT 3|0 0|0 0|0 0|0 0|0 0|0 0|4

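Please note that CNVs are allele-specific. For example, HG00096 (genotype 3|0) carries the <CN3> allele on one haplotype and the reference allele on the other, so it is effectively copy number 4, which would be a net gain on chr22.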

1000 Genomes contains 5 types of structural variants:

  • CNV
  • DEL
  • DUP
  • INS
  • INV

Since the 1000 Genomes data is provided in VCF format, we assume that the coordinates follow the VCF convention, i.e., there is a padding base for symbolic alleles. All intervals can therefore be interpreted as [BEGIN+1, END]. For all variant types except insertions, END is far larger than BEGIN. The distribution of BEGIN and END for insertions is summarized below.

Insertion issues

  • END = BEGIN for 6/165
  • END = BEGIN+2 for 93/165
  • END = BEGIN+3 for 11/165
  • END = BEGIN+4 for 11/165
  • END - BEGIN ranges from 5 to 1156 for the others.

Converting VCF svTypes to SO sequence alterations

The svType will be captured in our JSON file under the sequenceAlteration key. Here's the translation we'll use according to svType in 1000 Genomes.

svType | Alternative Alleles contain <CN*> | sequenceAlteration
ALU | FALSE | mobile_element_insertion
DUP | TRUE | copy_number_gain
CNV | TRUE | copy_number_gain (observed_gains > 0 and observed_losses = 0); copy_number_loss (observed_gains = 0 and observed_losses > 0); copy_number_variation (otherwise)
DEL | TRUE | copy_number_loss
LINE1 | FALSE | mobile_element_insertion
SVA | FALSE | mobile_element_insertion
INV | FALSE | inversion
INS | FALSE | insertion
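
The translation table can be sketched in Python as shown below. The function names are hypothetical. How observed gains and losses are counted is not spelled out in the table, so the helper `count_gains_and_losses` is an assumption: it sums the allele counts of `<CN*>` alleles above or below a per-haplotype reference copy number (1 by default).

```python
import re

MOBILE_ELEMENT_TYPES = {"ALU", "LINE1", "SVA"}


def count_gains_and_losses(alt_alleles, allele_counts, reference_copy_number=1):
    """Sum allele counts of <CN*> alleles above/below the reference copy number.

    Assumption: gains and losses are judged per allele against a
    per-haplotype reference copy number.
    """
    gains = losses = 0
    for alt, count in zip(alt_alleles, allele_counts):
        match = re.fullmatch(r"<CN(\d+)>", alt)
        if match is None:
            continue  # not a copy-number allele
        copy_number = int(match.group(1))
        if copy_number > reference_copy_number:
            gains += count
        elif copy_number < reference_copy_number:
            losses += count
    return gains, losses


def sequence_alteration(sv_type, observed_gains=0, observed_losses=0):
    """Translate a 1000 Genomes svType into a sequenceAlteration term."""
    if sv_type in MOBILE_ELEMENT_TYPES:
        return "mobile_element_insertion"
    if sv_type == "DUP":
        return "copy_number_gain"
    if sv_type == "DEL":
        return "copy_number_loss"
    if sv_type == "INV":
        return "inversion"
    if sv_type == "INS":
        return "insertion"
    if sv_type == "CNV":
        if observed_gains > 0 and observed_losses == 0:
            return "copy_number_gain"
        if observed_gains == 0 and observed_losses > 0:
            return "copy_number_loss"
        return "copy_number_variation"
    raise ValueError(f"unsupported svType: {sv_type!r}")
```

Under these assumptions, the chr22 entry shown earlier (alleles `<CN0>,<CN2>,<CN3>,<CN4>` with AC=9,87,599,20) has both gains and losses and would be reported as copy_number_variation.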

Exceptions

We discard structural variants without an END field, such as the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
21 9495848 esv3646347 A <INS:ME:LINE1> 100 PASS AC=1543;AF=0.308107;AN=5008;CS=L1_umary;MEINFO=LINE1,5669,6005,+;NS=2504;SVLEN=336;SVTYPE=LINE1;TSD=null;DP=20015;EAS_AF=0.3125;AMR_AF=0.2911;AFR_AF=0.3026;EUR_AF=0.2922;SAS_AF=0.3395;VT=SV GT 0|0 1|1 1|0 0|1 1|0 1|0 0|0

CNVs in chrY

  • No other types of structural variants exist in chrY
  • Since the copy number is provided in the genotype field, we parse it directly from the "CN" field.
  • For most CNVs in chrY, the reference copy number is 1, but the reference copy number for CNVs in segmental duplication sites is 2 (the reference is implicitly <CN2> in the 2nd example, whose alternate alleles are <CN1> and <CN3>). All segmental duplication calls have identifiers starting with GS_SD_M2.
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00101 HG00103 HG00105 HG00107 HG00108
Y 2888555 CNV_Y_2888555_3014661 T <CN2> 100 PASS AC=1;AF=0.000817661;AN=1223;END=3014661;NS=1233;SVTYPE=CNV;AMR_AF=0.0000;AFR_AF=0.0000;EUR_AF=0.0000;SAS_AF=0.0019;EAS_AF=0.0000;VT=SV GT:CN:CNL:CNP:CNQ:GP:GQ:PL 0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585 0:1:-296.36,0,-16.6:-300.46,0,-19.7:99:0,-19.7:99:0,166 0:1:-1000,0,-39.44:-1000,0,-42.54:99:0,-42.54:99:0,394
Y 6128381 GS_SD_M2_Y_6128381_6230094_Y_9650284_9752225 C <CN1>,<CN3> 100 PASS AC=4,2;AF=0.00327065,0.00163532;AN=1223;END=6230094;NS=1233;SVTYPE=CNV;AMR_AF=0.0029,0.0029;AFR_AF=0.0016,0.0016;EUR_AF=0.0000,0.0000;SAS_AF=0.0038,0.0000;EAS_AF=0.0000,0.0000;VT=SV;EX_TARGET GT:CN:CNL:CNP:CNQ:GP:GQ 0:2:-1000,-138.78,0,-38.53:-1000,-141.27,0,-41.33:99:0,-141.27,-41.33:99 0:2:-1000,-53.32,0,-17.85:-1000,-55.81,0,-20.64:99:0,-55.81,-20.64:99 0:2:-1000,-71.83,0,-32.5:-1000,-74.32,0,-35.29:99:0,-74.32,-35.29:99 0:2:-1000,-60.96,0,-20.29:-1000,-63.45,0,-23.08:99:0,-63.45,-23.08:99 0:2:-1000,-77.6,0,-31.45:-1000,-80.09,0,-34.24:99:0,-80.09,-34.24:99
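
A minimal Python sketch of the chrY handling described above: the copy number is read from the CN key of a sample column using the FORMAT column as the key order, and the reference copy number is chosen from the identifier prefix. Both function names are hypothetical.

```python
def sample_copy_number(format_field: str, sample_field: str) -> int:
    """Return the CN value of one sample column.

    format_field is the FORMAT column (e.g. 'GT:CN:CNL:...') and
    sample_field is the matching colon-separated sample column.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    return int(values[keys.index("CN")])


def reference_copy_number(variant_id: str) -> int:
    """Segmental duplication calls (GS_SD_M2 prefix) use 2, all others 1."""
    return 2 if variant_id.startswith("GS_SD_M2") else 1


if __name__ == "__main__":
    fmt = "GT:CN:CNL:CNP:CNQ:GP:GQ:PL"
    sample = "0:1:-1000,0,-58.45:-1000,0,-61.55:99:0,-61.55:99:0,585"
    print(sample_copy_number(fmt, sample), reference_copy_number("CNV_Y_2888555_3014661"))
```

Comparing the two values per sample then tells whether that sample shows a gain, a loss, or the reference state.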

JSON Output

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
- - + + \ No newline at end of file diff --git a/data-sources/amino-acid-conservation-json/index.html b/data-sources/amino-acid-conservation-json/index.html index a25f4f79..68dd576f 100644 --- a/data-sources/amino-acid-conservation-json/index.html +++ b/data-sources/amino-acid-conservation-json/index.html @@ -6,13 +6,13 @@ amino-acid-conservation-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00
- - +
Version: 3.25 (unreleased)

amino-acid-conservation-json

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00
+ + \ No newline at end of file diff --git a/data-sources/amino-acid-conservation/index.html b/data-sources/amino-acid-conservation/index.html index ca2bcbcb..fd946145 100644 --- a/data-sources/amino-acid-conservation/index.html +++ b/data-sources/amino-acid-conservation/index.html @@ -6,14 +6,14 @@ Amino Acid Conservation | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human ones. The score indicates how frequently the amino acid (AA) observed in humans is shared by the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. +

Version: 3.25 (unreleased)

Amino Acid Conservation

Overview

Amino acid conservation scores are obtained from multiple alignments of vertebrate exomes to the human ones. The score indicates how frequently the amino acid (AA) observed in humans is shared by the aligned species.

Publication

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

FASTA File

The exon alignments are provided in FASTA files as follows:

>ENST00000641515.2_hg38_1_2 3 0 0 chr1:65565-65573+
MKK
>ENST00000641515.2_panTro4_1_2 3 0 0 chrUn_GL393541:146907-146915+
MKK
>ENST00000641515.2_gorGor3_1_2 3 0 0
---
>ENST00000641515.2_ponAbe2_1_2 3 0 0 chr15:99141417-99141425-
MKK
>ENST00000641515.2_hg38_2_2 324 0 0 chr1:69037-70008+
VTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKFZ
>ENST00000641515.2_panTro4_2_2 324 0 0 chrUn_GL393541:151333-152303+

Parsing FASTA

For each Ensembl transcript, we will need to aggregate all the exons together for each of the 100 species. From there, we should get a full alignment that can be used to determine conservation. For example, for ENST00000641515.2 we have:

Human (hg38) MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Chimp MKKVTAEAISWNESTSETNNSMVTEFIFLGLSDSQELQTFL-MLFFVFYGGIVFGNLLIVRIVVSDSHLHSPMYFLLANLSLIDLSLCSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gorilla ----------------------------------------------------------------------------------------------------------------------
Orangutan MKKVTAEAISWNESTSKTNNSVVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVIIVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLL
Gibbon ----------------------------------------------------------------------------------------------------------------------
Rhesus MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVVDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL
Macaque MKKVTEAAISWNESTSETNNSIVTEFIFLGLSDSQELQIFLFVLFLVFYGGIVFGNLLIVITVVSDSHLHSPMYLLLANLSVIDLSLSSVTAPKMITDFFSQRKAISFKGCLVQIFLL

If we look at position 6, we see that humans have an Alanine (A) residue. This residue is shared by Chimp and Orangutan. However, Rhesus and Macaque have a Glutamic acid (E) residue at that position. Moreover, Gorilla and Gibbon don't even have data for that transcript. For position 6, we would say that we have 43% conservation (3/7), since three of the seven sequences (human, chimp, and orangutan) carry the human residue.
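
The aggregation and scoring steps can be sketched in Python as follows. This is an illustration, not the actual implementation. It assumes that record names have the form transcript_species_exonIndex_exonCount, that exons appear in order in the file, and that the denominator counts every aligned sequence, including the human one and species without data, as in the 3/7 example. The function names are hypothetical.

```python
from collections import defaultdict


def read_exon_alignments(lines):
    """Concatenate exon peptides into {transcript: {species: peptide}}.

    Assumes exons of a transcript are listed in order, as in the excerpt.
    """
    peptides = defaultdict(lambda: defaultdict(str))
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            # e.g. ENST00000641515.2_hg38_1_2 -> transcript, species, exon, count
            transcript, species, _exon_index, _exon_count = name.rsplit("_", 3)
            current = (transcript, species)
        elif current is not None:
            peptides[current[0]][current[1]] += line
    return peptides


def conservation_scores(human, others):
    """Per-position fraction of sequences carrying the human residue.

    The human sequence itself is included in both numerator and denominator.
    """
    sequences = [human] + list(others)
    scores = []
    for index, residue in enumerate(human):
        matching = sum(
            1 for seq in sequences if index < len(seq) and seq[index] == residue
        )
        scores.append(matching / len(sequences))
    return scores
```

With the first ten residues of the seven sequences shown above, the score at position 6 comes out as 3/7, which rounds to the 43% quoted in the text.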

Assigning scores to Illumina Connected Annotations transcripts

The source FASTA file comes with Ensembl/UCSC transcript ids of the transcripts used for alignments. The Illumina Connected Annotations cache has RefSeq and Ensembl transcripts and our first attempt was to map the given Ensembl/UCSC ids to their equivalent RefSeq/Ensembl ids. This attempt was unsuccessful since UCSC Table Browser provided mapping without version numbers. So we proceeded as follows:

  • Take proteins which have a unique mapping (and hence one set of conservation scores). For ones that mapped to both ChrX and ChrY, we accepted the one from ChrX.
  • An Illumina Connected Annotations transcript having an exact peptide sequence match with a uniquely aligned protein is assigned the corresponding conservation scores.

Unfortunately this left us with a very small number of transcripts having conservation scores.
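
The matching procedure described above can be sketched in Python as follows. The data shapes and function names are hypothetical, and the ChrX/ChrY preference is deliberately left out. A protein is treated as uniquely mapped when its peptide sequence is associated with exactly one set of scores.

```python
def unique_alignments(aligned):
    """Keep peptides that map to exactly one set of conservation scores.

    aligned: iterable of (peptide_sequence, scores) pairs from the source
    alignments. The ChrX-over-ChrY preference is not modelled here.
    """
    by_peptide = {}
    for peptide, scores in aligned:
        by_peptide.setdefault(peptide, set()).add(tuple(scores))
    return {
        peptide: list(next(iter(score_sets)))
        for peptide, score_sets in by_peptide.items()
        if len(score_sets) == 1
    }


def assign_scores(transcript_peptides, unique):
    """Give scores to transcripts whose peptide exactly matches a unique protein.

    transcript_peptides: {transcript_id: peptide_sequence} from the cache.
    """
    return {
        transcript_id: unique[peptide]
        for transcript_id, peptide in transcript_peptides.items()
        if peptide in unique
    }
```

Because the match is on the exact peptide string, several cache transcripts can receive the same scores, which is consistent with the transcript counts below being higher than the aligned protein counts.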

GRCh37

  • Source FASTA contained 41957 protein alignments.
  • 38165 proteins had unique scores.
  • 88 aligned proteins existed in Illumina Connected Annotations cache.
  • 118 transcripts had conservation scores.

GRCh38

  • Source FASTA contained 110024 protein alignments.
  • 88961 proteins had unique scores.
  • 11688 aligned proteins existed in Illumina Connected Annotations cache.
  • 12098 transcripts had conservation scores.

Download URL

GRCh37: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownGene.exonAA.fa.gz

GRCh38: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/alignments/knownGene.exonAA.fa.gz

JSON Output

Conservation scores are reported in the transcript section. One score is reported for each alternate allele.

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00
- - + + \ No newline at end of file diff --git a/data-sources/cancer-hotspots/index.html b/data-sources/cancer-hotspots/index.html index 31d1bf2a..8bd8c574 100644 --- a/data-sources/cancer-hotspots/index.html +++ b/data-sources/cancer-hotspots/index.html @@ -6,14 +6,14 @@ Cancer Hotspots | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs from the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid   Codon_Change     Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancanIs_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect     ct       Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we obtain the position and the reference and alternate amino acids from the source file and generate the substitution notation. For indels, we take the protein change notation from the Variant_Amino_Acid column.
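The notation step can be illustrated with a minimal sketch. It assumes each row has already been read into a dictionary keyed by the column names listed above, and that fields such as R:204 or K546del:5 pack a value and a sample count separated by a colon, as in the example rows. The helper names (split_count, snv_notation, indel_notation, parse_qvalue) are hypothetical and are not part of Illumina Connected Annotations. The one-letter output form (for example Q61R) is also an assumption about what "substitution notation" looks like.

```python
from typing import Dict, Optional, Tuple


def split_count(field: str) -> Tuple[str, int]:
    """Split a packed 'value:count' field such as 'R:204' or 'K546del:5'."""
    value, _, count = field.rpartition(":")
    return value, int(count)


def snv_notation(row: Dict[str, str]) -> str:
    """Build a one-letter substitution such as 'Q61R' (assumed output form)."""
    ref, _ = split_count(row["Reference_Amino_Acid"])
    alt, _ = split_count(row["Variant_Amino_Acid"])
    return f"{ref}{row['Amino_Acid_Position']}{alt}"


def indel_notation(row: Dict[str, str]) -> str:
    """Indel rows in the example already carry the protein change, e.g. 'K546del'."""
    change, _ = split_count(row["Variant_Amino_Acid"])
    return change


def parse_qvalue(text: str) -> Optional[float]:
    """Treat 'NA' as missing; otherwise parse the q-value as a float."""
    return None if text == "NA" else float(text)


# Values copied from the NRAS (SNV) and SMARCA4 (indel) example rows.
snv_row = {
    "Hugo_Symbol": "NRAS",
    "Amino_Acid_Position": "61",
    "Reference_Amino_Acid": "Q:422",
    "Variant_Amino_Acid": "R:204",
    "qvalue": "0",
}
indel_row = {
    "Hugo_Symbol": "SMARCA4",
    "Amino_Acid_Position": "546",
    "Reference_Amino_Acid": "QK:5",
    "Variant_Amino_Acid": "K546del:5",
    "qvalue": "0.000230672905611517",
}

print(snv_row["Hugo_Symbol"], snv_notation(snv_row), parse_qvalue(snv_row["qvalue"]))
print(indel_row["Hugo_Symbol"], indel_notation(indel_row), parse_qvalue(indel_row["qvalue"]))
```

Mapping the gene symbol to its canonical RefSeq and Ensembl transcripts is left out of the sketch because it depends on the transcript cache.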

Version: 3.25 (unreleased)

Cancer Hotspots

Overview

Cancer Hotspots is a resource for statistically significant mutations in cancer. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

Publication

Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, Chakravarty D, Phillips S, Kandoth C, Penson A, Gorelick A, Shamu T, Patel S, Harris C, Gao J, Sumer SO, Kundra R, Razavi P, Li BT, Reales DN, Socci ND, Jayakumaran G, Zehir A, Benayed R, Arcila ME, Chandarlapaty S, Ladanyi M, Schultz N, Baselga J, Berger MF, Rosen N, Solit DB, Hyman DM, Taylor BS. Accelerating Discovery of Functional Mutant Alleles in Cancer. Cancer Discov. 2018 Feb;8(2):174-183. doi: 10.1158/2159-8290.CD-17-0321. Epub 2017 Dec 15. PMID: 29247016; PMCID: PMC5809279.

Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, Schultz N, Taylor BS. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016 Feb;34(2):155-63. doi: 10.1038/nbt.3391. Epub 2015 Nov 30. PMID: 26619011; PMCID: PMC4744099.

Data extraction

Illumina Connected Annotations currently parses the SNV and indel tabs of the hotspots_v2.xls file to extract the relevant content.

Example

SNV

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        Variant_Amino_Acid   Codon_Change     Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      Total_Samples   Analysis_Type   qvalue  tm      qvalue_pancan   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        ref     qvaluect     ct       Samples
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 R:204 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:88|thyroid:54|blood:15|bowel:8|testis:5|biliarytract:4|bladder:4|lung:4|ovaryfallopiantube:4|softtissue:3|unk:3|uterus:3|cnsbrain:2|esophagusstomach:2|headandneck:2|bone:1|pancreas:1|thymus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 K:142 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:62|bowel:18|thyroid:17|blood:12|softtissue:6|lung:5|unk:5|bladder:3|cnsbrain:2|thymus:2|adrenalgland:1|biliarytract:1|esophagusstomach:1|headandneck:1|kidney:1|liver:1|ovaryfallopiantube:1|pancreas:1|testis:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 L:46 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:24|bowel:7|lung:6|blood:2|cnsbrain:2|unk:2|bladder:1|softtissue:1|uterus:1
NRAS 61 -1237.69143477067 422 Q:422 620 0.333333333333333 295|0.692307692307692:0.733333333333333:0.2:0.933333333333333:1:0.25:0.666666666666667:1:0.25:0.571428571428571:1:1:0.5:0.363636363636364:0.428571428571429:0.0833333333333333:1:1:1:1:0.5:1:0.125:0.363636363636364:0.173913043478261:0.25:1:0.8:0.153846153846154:0.857142857142857:0.5:0.5:0.5:1:0.272727272727273:0.214285714285714:1:0.5:1:1:0.2:0.333333333333333:0.6875:0.708333333333333:0.25:0.266666666666667:0.111111111111111:1:1:0.333333333333333:0.428571428571429:0.666666666666667:0.25:0.5:0.833333333333333:0.5:0.735294117647059:0.0476190476190476:0.1:0.133333333333333:0.230769230769231:0.25:1:0.5:0.294117647058824:0.217391304347826:0.46875:0.5:1:0.2:0.166666666666667:0.666666666666667:1:0.8:0.407407407407407:1:0.0212765957446809:0.285714285714286:0.0909090909090909:0.333333333333333:0.2:0.333333333333333:0.5:0.5:1:0.111111111111111:0.5:0.903846153846154:0.5:0.2:1:1:0.0909090909090909:0.4:0.428571428571429:0.0625:0.25:0.833333333333333:1:0.956521739130435:0.111111111111111:0.6:0.212765957446809:0.5:0.207547169811321:1:0.75:0.294117647058824:0.666666666666667:1:0.333333333333333:0.714285714285714:0.142857142857143:1:0.3:0.416666666666667:0.272727272727273:0.25:0.333333333333333:0.345454545454545:0.0952380952380952:0.166666666666667:0.111111111111111:0.454545454545455:0.0666666666666667:1:0.636363636363636:0.636363636363636:0.25:0.272727272727273:0.824324324324324:1:0.75:0.545454545454545:1:1:0.0769230769230769:0.363636363636364:0.290322580645161:0.333333333333333:0.179487179487179:1:0.0666666666666667:0.333333333333333:1:0.478260869565217:0.166666666666667:1:1:0.0276497695852535:0.0716845878136201:0.0263736263736264:0.933333333333333:1:0.5:1:1:0.8125:0.361788617886179:0.113761467889908:0.113761467889908:0.157894736842105:0.333333333333333:0.0555555555555556:0.0357142857142857:0.375:0.111111111111111:0.584415584415584:0.0350877192982456:0.751111111111111:0.761245674740484:0.164989939637827:0.196652719665272:0.135
549872122762:0.172113289760349:0.0240963855421687:0.0620767494356659:0.142268041237113:0.147441457068517:0.147959183673469:0.038961038961039:0.686274509803922:0.0929054054054054:0.364787111622555:0.331306990881459:0.691449814126394:0.691449814126394:0.0769230769230769:0.347826086956522:0.117647058823529:0.148148148148148:0.05:0.290030211480363:0.680272108843537:0.188679245283019:0.0701754385964912:0.801526717557252:0.236842105263158:0.1953125:0.0539906103286385:0.015625:0.0390492359932088:0.00790513833992095:0.0597826086956522:0.136783733826248:0.362359550561798:0.0713719270420301:0.328621908127208:0.0657672849915683:0.320099255583127:0.075:0.433021806853583:0.524818401937046:0.524818401937046:0.259259259259259:0.483695652173913:0.0269360269360269:0.100486223662885:0.785507246376812:0.137870855148342:0.472340425531915:0.194331983805668:0.0830769230769231:0.418055555555556:0.546296296296296:0.247596153846154:0.52:0.39832285115304:0.601866251944012:0.234016887816647:0.214007782101167:0.153153153153153:0.137180700094607:0.0666666666666667:0.037037037037037:0.1:0.2:0.458333333333333:0.0588235294117647:0.111111111111111:0.333333333333333:0.181818181818182:0.473684210526316:0.5:0.2:0.136363636363636:0.0769230769230769:0.142857142857143:0.285714285714286:0.25:0.445714285714286:0.149377593360996:0.0227790432801822:0.182278481012658:0.540123456790123:0.021505376344086:0.541666666666667:0.00429184549356223:0.473684210526316:0.103508771929825:0.0930232558139535:0.391304347826087:0.072:0.0113636363636364:0.148837209302326:0.448051948051948:0.761038961038961:0.530373831775701:0.222857142857143:0.433862433862434:0.0810810810810811:0.0723327305605787:0.410714285714286:0.247910863509749:0.384615384615385:0.125:0.24:0.783582089552239:0.0646651270207852:0.445569620253165:0.754777070063694:0.165137614678899:0.10732538330494:0.0375:0.538461538461538:0.0981387478849408:0.029126213592233:0.0833333333333333:0.443514644351464:0.0917431192660551:0.03125:0.674418604651163:0.3125:0.375:0.3142
85714285714 H:27 cAa/cGa:203|Caa/Aaa:140|cAa/cTa:46|caA/caT:14|caA/caC:13|ggACaa/ggCAaa:2|cAa/cCa:2|Caa/Taa:1|CAa/AGa:1 1:115256529_252|1:115256530_143|1:115256528_27 skcm:787:186|thpa:486:43|mm:275:27|thpd:58:18|coadread:683:16|luad:2057:15|coad:712:13|mup:42:7|aml:198:6|blca:852:5|thap:33:5|read:149:5|rms:50:5|uec:339:5|nsgct:152:5|cll:283:4|ihch:104:4|lgsoc:17:3|sem:59:3|thhc:21:3|erms:8:3|lggnos:544:3|utuc:76:2|cup:135:2|thfo:5:2|sarcl:13:2|mfh:53:2|gbm:688:2|soc:468:2|stad:748:2|thym:125:2|es:229:1|npc:66:1|unk:146:1|panet:86:1|hnsc:643:1|armm:21:1|tmt:3:1|acrm:23:1|thyc:9:1|odg:36:1|paasc:8:1|hnmucm:11:1|blad:7:1|esca:556:1|mixed:3:1|chol:152:1|hcc:620:1|sarc:280:1|chrcc:88:1|aca:93:1 skin:974:187|thyroid:618:71|blood:890:37|bowel:1782:35|lung:2761:17|unk:357:11|softtissue:739:11|testis:217:9|bladder:958:8|cnsbrain:2270:6|ovaryfallopiantube:699:5|biliarytract:358:5|uterus:618:5|headandneck:988:3|thymus:162:3|esophagusstomach:1407:3|pancreas:1059:2|bone:297:1|liver:636:1|kidney:1304:1|adrenalgland:291:1 TTG|ACA|CTT|TCG|CCC|CCA 0.0120300464273379 0.0267810594223141 24592 "pancan,skin,thyroid,bowel,blood,lung,softtissue,testis,bladder,cnsbrain,biliarytract,ovaryfallopiantube,uterus,thymus,headandneck,esophagusstomach" 0 NRAS 61 0 FALSE NA 1 1.16795714944678 1.26187131041539 1.29838371117394 TRUE 165 257 RETAIN TRUE TRUE Q 0 skin skin:12|blood:7|bowel:2|lung:2|testis:2|softtissue:1|unk:1

Indel

Hugo_Symbol     Amino_Acid_Position     log10_pvalue    Mutation_Count  Reference_Amino_Acid    Total_Mutations_in_Gene Median_Allele_Freq_Rank Allele_Freq_Rank        SNP_ID  Variant_Amino_Acid    Codon_Change    Genomic_Position        Detailed_Cancer_Types   Organ_Types     Tri-nucleotides Mutability      mu_protein      ccf     Total_Samples   indel_size      qvalue  tm   Is_repeat        seq     length  align100        pad12entropy    pad24entropy    pad36entropy    TP      reason  n_MSK   n_Retro judgement       inNBT   inOncokb        Samples
SMARCA4 546 -7.75235638169585 5 QK:5 101 NA NA :NA K546del:5 cAGAag/cag:5 19:11106926_5 lgg:536:4|dlbcl:246:1 cnsbrain:2283:4|lymph:366:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 1 0.000230672905611517 SMARCA4 546 FALSE NA NA 1 0.91489630957268 1.2950060272429 1.33965330506364 FALSE LOCAL_ENTROPY 1 4 RETAIN FALSE FALSE cnsbrain:4|lymph:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA V28_E33del:4 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE cervix:1|esophagusstomach:1|lung:1|pancreas:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA L32_L37del:3 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 1 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE skin:2|esophagusstomach:1
CDKN2A 27-42 -6.82111516846557 12 VRALLEA:4|LEAGALP:3|ALPN:1|EV:1|GA:1|PNAPN:1|RALLEA:1 219 NA NA :NA A36_N39delinsD:1 gTGCGGGCGCTGCTGGAGGcg/gcg:4|cTGGAGGCGGGGGCGCTGCcc/ccc:3|GGGGCG/-:1|gCGCTGCCCAac/gac:1|gAGGtg/gtg:1|CGGGCGCTGCTGGAGGCG/-:1|ccCAACGCACCGAAt/cct:1 9:21974727_4|9:21974715_3|9:21974745_1|9:21974725_1|9:21974719_1|9:21974712_1|9:21974702_1 luad:2071:3|esca:556:2|blca:852:1|skcm:192:1|icemu:1:1|paad:932:1|mel:595:1|stad:748:1|hnsc:650:1 esophagusstomach:1413:3|lung:2767:3|skin:974:2|bladder:955:1|cervix:234:1|pancreas:1059:1|headandneck:988:1 NA 0.0573226243518208 0.0473351872460284 NA 24592 15 8.77193090544841e-05 CDKN2A 27-42 FALSE NA NA 0.857780912379927 1.13008762297022 1.1577633500238 FALSE LOCAL_ENTROPY 6 6 RETAIN FALSE FALSE lung:1

Parsing

From the file, we're mainly interested in the following columns:

  • Hugo_Symbol
  • Amino_Acid_Position
  • Mutation_Count
  • Reference_Amino_Acid
  • Variant_Amino_Acid
  • qvalue

We map the gene symbol onto the canonical transcripts (RefSeq and Ensembl) for that gene. For SNVs, we take the position and the reference and alternate amino acids from the source file and generate substitution notation. For indels, we take the protein change notation from the Variant_Amino_Acid column (e.g. K546del in the sample rows above). We then match each entry using these notations.
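
As a rough illustration of the notation step, the sketch below builds a substitution string for an SNV and extracts the protein-change string for an indel. It assumes each amino-acid field has the form value:count, as in the sample rows (Q:422, K:142, K546del:5). The function names are hypothetical and are not taken from the Illumina Connected Annotations code base.

```python
def split_count(field):
    """Split a 'value:count' field such as 'K:142' into ('K', 142)."""
    value, count = field.rsplit(":", 1)
    return value, int(count)


def snv_notation(position, ref_field, alt_field):
    """Build substitution notation (e.g. 'Q61K') plus the alternate sample count."""
    ref, _ = split_count(ref_field)
    alt, alt_count = split_count(alt_field)
    return f"{ref}{position}{alt}", alt_count


def indel_notation(change_field):
    """Return the protein-change notation (e.g. 'K546del') plus its sample count."""
    return split_count(change_field)


# Values copied from the NRAS and SMARCA4 sample rows.
print(snv_notation("61", "Q:422", "K:142"))
print(indel_notation("K546del:5"))
```

The resulting strings can then serve as lookup keys when comparing against the amino-acid change of an annotated transcript.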

caution

We currently skip all variants labeled as splice in the source.

JSON Output

The data source will be captured under the cancerHotspots key in the transcript section.

{
"transcript":"NM_002524.5",
"source":"RefSeq",
"bioType":"mRNA",
"aminoAcids":"Q/K",
"proteinPos":"61",
"geneId":"4893",
"hgnc":"NRAS",
"hgvsc":"NM_002524.5:c.181C>A",
"hgvsp":"NP_002515.1:p.(Gln61Lys)",
"isCanonical":true,
"proteinId":"NP_002515.1",
"cancerHotspots":{
"residue":"Q61",
"numSamples":422,
"numAltAminoAcidSamples":142,
"qValue":0
}
}
Field                     Type      Notes
residue                   string
numSamples                int       how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples    int       how many samples are associated with a variant with the same position and alternate amino acid
qValue                    double
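
A minimal sketch of how a consumer might read these fields, assuming the transcript object has already been extracted from the annotator's JSON output. The helper name hotspot_summary is hypothetical, and the embedded JSON is an abridged copy of the example above.

```python
import json

# Abridged copy of the transcript example shown above.
transcript_json = """
{
  "transcript": "NM_002524.5",
  "hgnc": "NRAS",
  "aminoAcids": "Q/K",
  "proteinPos": "61",
  "cancerHotspots": {
    "residue": "Q61",
    "numSamples": 422,
    "numAltAminoAcidSamples": 142,
    "qValue": 0
  }
}
"""


def hotspot_summary(transcript):
    """Return (residue, fraction of samples with this alternate amino acid).

    Returns None when the transcript carries no cancerHotspots annotation.
    """
    hotspot = transcript.get("cancerHotspots")
    if hotspot is None:
        return None
    fraction = hotspot["numAltAminoAcidSamples"] / hotspot["numSamples"]
    return hotspot["residue"], fraction


summary = hotspot_summary(json.loads(transcript_json))
print(summary)
```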
- - + + \ No newline at end of file diff --git a/data-sources/clingen-dosage-json/index.html b/data-sources/clingen-dosage-json/index.html index bc6f2dbd..b3dae215 100644 --- a/data-sources/clingen-dosage-json/index.html +++ b/data-sources/clingen-dosage-json/index.html @@ -6,13 +6,13 @@ clingen-dosage-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field                          Type              Notes
clingenDosageSensitivityMap    object array
chromosome                     string            Ensembl-style chromosome names
begin                          integer           1-based position
end                            integer           1-based position
haploinsufficiency             string            see possible values below
triplosensitivity              string            (same as haploinsufficiency)
reciprocalOverlap              floating point    Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap              floating point    Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
- - +
Version: 3.25 (unreleased)

clingen-dosage-json

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field                          Type              Notes
clingenDosageSensitivityMap    object array
chromosome                     string            Ensembl-style chromosome names
begin                          integer           1-based position
end                            integer           1-based position
haploinsufficiency             string            see possible values below
triplosensitivity              string            (same as haploinsufficiency)
reciprocalOverlap              floating point    Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap              floating point    Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).
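
The documentation does not spell out how the two overlap values are computed. The sketch below assumes the conventional definitions: reciprocal overlap is the shared length divided by the longer of the two intervals, and annotation overlap is the shared length divided by the annotation length. Both the formulas and the function name are assumptions made for illustration, and the example coordinates are invented.

```python
def overlap_fractions(var_begin, var_end, ann_begin, ann_end):
    """Return (reciprocal_overlap, annotation_overlap).

    All coordinates are 1-based and inclusive. The formulas are the
    conventional ones and are assumed, not taken from the annotator.
    """
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return 0.0, 0.0
    var_len = var_end - var_begin + 1
    ann_len = ann_end - ann_begin + 1
    reciprocal = overlap / max(var_len, ann_len)
    annotation = overlap / ann_len
    # The field notes say values are specified up to 5 decimal places.
    return round(reciprocal, 5), round(annotation, 5)


# Invented example: a 1000 bp variant fully covering a 200 bp annotation.
print(overlap_fractions(1, 1000, 501, 700))
```

Under these assumed definitions, a large variant that fully contains a small annotated region yields an annotation overlap of 1 and a small reciprocal overlap.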

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely
+ + \ No newline at end of file diff --git a/data-sources/clingen-gene-validity-json/index.html b/data-sources/clingen-gene-validity-json/index.html index b79b1150..50a41544 100644 --- a/data-sources/clingen-gene-validity-json/index.html +++ b/data-sources/clingen-gene-validity-json/index.html @@ -6,13 +6,13 @@ clingen-gene-validity-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                  Type      Notes
clingenGeneValidity    object
diseaseId              string    Monarch Disease Ontology ID (MONDO)
disease                string    disease label
classification         string    see below for possible values
classificationDate     string    yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
- - +
Version: 3.25 (unreleased)

clingen-gene-validity-json

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                  Type      Notes
clingenGeneValidity    object
diseaseId              string    Monarch Disease Ontology ID (MONDO)
disease                string    disease label
classification         string    see below for possible values
classificationDate     string    yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
+ + \ No newline at end of file diff --git a/data-sources/clingen-json/index.html b/data-sources/clingen-json/index.html index 1b1d1fd8..bd56130d 100644 --- a/data-sources/clingen-json/index.html +++ b/data-sources/clingen-json/index.html @@ -6,13 +6,13 @@ clingen-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                     Type              Notes
clingen                   object array
chromosome                string            Ensembl-style chromosome names
begin                     integer           1-based position
end                       integer           1-based position
variantType               string            Any of the sequence alterations defined here.
id                        string            Identifier from the data source. Alternatively a VID
clinicalInterpretation    string            see possible values below
observedGains             integer           Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses            integer           Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated                 boolean
phenotypes                string array      Description of the phenotype.
phenotypeIds              string array      Description of the phenotype IDs.
reciprocalOverlap         floating point    Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
- - +
Version: 3.25 (unreleased)

clingen-json

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field                     Type              Notes
clingen                   object array
chromosome                string            Ensembl-style chromosome names
begin                     integer           1-based position
end                       integer           1-based position
variantType               string            Any of the sequence alterations defined here.
id                        string            Identifier from the data source. Alternatively a VID
clinicalInterpretation    string            see possible values below
observedGains             integer           Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses            integer           Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated                 boolean
phenotypes                string array      Description of the phenotype.
phenotypeIds              string array      Description of the phenotype IDs.
reciprocalOverlap         floating point    Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
+ + \ No newline at end of file diff --git a/data-sources/clingen/index.html b/data-sources/clingen/index.html index 7b7d19d8..bb54ebc5 100644 --- a/data-sources/clingen/index.html +++ b/data-sources/clingen/index.html @@ -6,13 +6,13 @@ ClinGen | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), they had to be adjusted to [BEGIN+1, END].

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482
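
The coordinate adjustment described above can be sketched as follows. BED intervals use a 0-based start and an exclusive end, so only the start needs to shift to obtain a 1-based inclusive interval. The function name is hypothetical and the numbers are purely illustrative.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED interval (0-based start, exclusive end) to [BEGIN+1, END]."""
    return chrom_start + 1, chrom_end


begin, end = bed_to_one_based(0, 100)
print(begin, end)

# The number of covered bases is unchanged by the conversion.
assert end - begin + 1 == 100 - 0
```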

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen tsv file and extract the following:

  • chrom
  • chromStart (note that this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma separated lists. attrTags contains the field keys and attrVals contains the field values. We will parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)
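
One way to pair the two fields is sketched below. It assumes that attrTags and attrVals line up one-to-one and that a repeated tag (for example, several phenotype entries) contributes several values. That encoding, the var_type tag, and the function name are assumptions made for illustration, so the raw strings in the example are invented rather than copied from the source file.

```python
from collections import defaultdict

# Keys listed in the text above.
WANTED = ("parent", "clinical_int", "validated", "phenotype", "phenotype_id")


def parse_attributes(attr_tags, attr_vals):
    """Pair comma-separated tags with values, keeping only the wanted keys.

    Every key maps to a list so that repeated tags (an assumption about
    the encoding) naturally become string arrays.
    """
    tags = attr_tags.split(",")
    vals = attr_vals.split(",")
    if len(tags) != len(vals):
        raise ValueError("attrTags and attrVals have different lengths")
    parsed = defaultdict(list)
    for tag, val in zip(tags, vals):
        if tag in WANTED:
            parsed[tag].append(val)
    return dict(parsed)


# Invented raw strings; 'var_type' is a hypothetical extra tag.
tags = "parent,var_type,clinical_int,validated,phenotype,phenotype_id"
vals = "nsv530706,copy_number_loss,pathogenic,False,Muscular hypotonia,HP:0001252"
print(parse_attributes(tags, vals))
```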

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped
    • Observed losses and observed gains are calculated for each group
    • Clinical significance and validation status are collapsed using the priority strategy described below
  • Variants with the same parent ID can have different coordinates (mapped to hg38)
    • nsv491508 : chr14:105583663-106881350 and chr14:105605043-106766076 (only one example)
    • We keep both variants
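
The grouping rule can be sketched as below, assuming each parsed entry is a dictionary with parent, chrom, begin, end and variant_type keys. Those key names, the function name, and the sample records are hypothetical. Entries that share a parent ID but differ in coordinates end up in separate groups, so both are kept.

```python
from collections import defaultdict


def group_observations(records):
    """Count observed gains and losses per (parent ID, coordinates) group."""
    groups = defaultdict(lambda: {"observedGains": 0, "observedLosses": 0})
    for rec in records:
        key = (rec["parent"], rec["chrom"], rec["begin"], rec["end"])
        if rec["variant_type"] == "copy_number_gain":
            groups[key]["observedGains"] += 1
        elif rec["variant_type"] == "copy_number_loss":
            groups[key]["observedLosses"] += 1
    return dict(groups)


# Hypothetical records: three share coordinates, one differs.
records = [
    {"parent": "nsvA", "chrom": "1", "begin": 100, "end": 200, "variant_type": "copy_number_gain"},
    {"parent": "nsvA", "chrom": "1", "begin": 100, "end": 200, "variant_type": "copy_number_loss"},
    {"parent": "nsvA", "chrom": "1", "begin": 100, "end": 200, "variant_type": "copy_number_loss"},
    {"parent": "nsvA", "chrom": "1", "begin": 150, "end": 300, "variant_type": "copy_number_gain"},
]
for key, counts in group_observations(records).items():
    print(key, counts)
```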

Conflict Resolution

Clinical significance priority

When there is a mixture of variants belonging to the same parent ID, we choose the most pathogenic clinical significance from the available values. For example, if 3 samples were deemed pathogenic and 2 samples were likely pathogenic, we would list the variant as pathogenic.

Priority (high to low)

  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance
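
A minimal sketch of this priority rule, using the five clinical significance labels listed above. Labels outside that list are ranked lowest here, which is an assumption because the text does not say how they are handled. The function name is hypothetical.

```python
# Highest priority first, following the list above.
PRIORITY = [
    "pathogenic",
    "likely pathogenic",
    "benign",
    "likely benign",
    "uncertain significance",
]
RANK = {label: index for index, label in enumerate(PRIORITY)}


def collapse_significance(values):
    """Pick the highest-priority label among the values of one group.

    Unlisted labels get the lowest rank (an assumption, not documented).
    """
    normalized = [value.strip().lower() for value in values]
    return min(normalized, key=lambda label: RANK.get(label, len(PRIORITY)))


# The example from the text: 3 pathogenic and 2 likely pathogenic samples.
print(collapse_significance(["pathogenic"] * 3 + ["likely pathogenic"] * 2))
```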

Validation Priority

When there is a mixture of variants belonging to the same parent ID, we set the validation status to true if any of the variants were validated.
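
The validation rule amounts to a logical OR over the group. The sketch below assumes the raw flags arrive as the strings "True" and "False", as they appear in the TSV sample. The function name is hypothetical.

```python
def collapse_validated(raw_flags):
    """Return True if any entry in the group was validated."""
    return any(flag.strip().lower() == "true" for flag in raw_flags)


print(collapse_validated(["False", "True", "False"]))
print(collapse_validated(["False", "False"]))
```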

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
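
The page does not spell out how reciprocalOverlap is computed. The sketch below assumes the common definition (shared bases divided by the length of the longer interval, on 1-based inclusive coordinates) and rounds to 5 decimal places as described in the table. `reciprocal_overlap` is a hypothetical helper name.

```python
def reciprocal_overlap(begin_a, end_a, begin_b, end_b):
    """Reciprocal overlap of two 1-based, inclusive intervals.

    Assumption: overlap length divided by the longer interval's length,
    rounded to 5 decimal places.
    """
    overlap = min(end_a, end_b) - max(begin_a, begin_b) + 1
    if overlap <= 0:
        return 0.0
    len_a = end_a - begin_a + 1
    len_b = end_b - begin_b + 1
    return round(overlap / max(len_a, len_b), 5)


# A 100-base variant sharing 57 bases with a shorter annotation.
print(reciprocal_overlap(1, 100, 44, 100))
```

Dividing by the longer interval means the value is high only when both intervals cover most of each other.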

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801
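
The Genomic Location column in the sample above appears in several spellings: `chr15:31727418-32153204`, `chrX: 48447780-52444264` (with a space), and `tbd`. A tolerant parser could look like the sketch below. Returning `None` for unplaced regions and stripping the `chr` prefix to obtain Ensembl-style names are assumptions, and `parse_location` is a hypothetical helper.

```python
import re

# "chr" prefix, chromosome name, colon, optional whitespace, begin-end.
LOCATION = re.compile(r"^chr(\w+):\s*(\d+)-(\d+)$")


def parse_location(text):
    """Return (chromosome, begin, end), or None for values such as 'tbd'."""
    match = LOCATION.match(text.strip())
    if match is None:
        return None
    chromosome, begin, end = match.groups()
    return chromosome, int(begin), int(end)


for value in ("chr15:31727418-32153204", "chrX: 48447780-52444264", "tbd"):
    print(value, "->", parse_location(value))
```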

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

Rating | Possible Clinical Interpretation
0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype
1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype
2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
30 | Gene associated with autosomal recessive phenotype
40 | Dosage sensitivity unlikely
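
As an illustration, the rating table can be held in a lookup that turns a score column from the TSV into the lowercase description used in the JSON output. `describe_rating` is a hypothetical helper, and returning `None` for blank or unlisted scores is an assumption.

```python
DOSAGE_RATINGS = {
    0: "no evidence to suggest that dosage sensitivity is associated with clinical phenotype",
    1: "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
    2: "emerging evidence suggesting dosage sensitivity is associated with clinical phenotype",
    3: "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
    30: "gene associated with autosomal recessive phenotype",
    40: "dosage sensitivity unlikely",
}


def describe_rating(score):
    """Map a score column value (a string) to its description.

    Assumption: blank or unlisted scores yield None.
    """
    score = score.strip()
    if not score.isdigit():
        return None
    return DOSAGE_RATINGS.get(int(score))


print(describe_rating("40"))
```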

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)
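
If the .version file is produced by a script, a small helper such as the following could emit the four fields shown above. It assumes that VERSION is simply the release date without dashes, as in this sample; `version_file_text` is a hypothetical name.

```python
from datetime import date


def version_file_text(name, release, description):
    """Build the contents of a .version file.

    Assumption: VERSION is the release date formatted as yyyyMMdd.
    """
    lines = [
        f"NAME={name}",
        f"VERSION={release:%Y%m%d}",
        f"DATE={release:%Y-%m-%d}",
        f"DESCRIPTION={description}",
    ]
    return "\n".join(lines) + "\n"


text = version_file_text(
    "ClinGen Dosage Sensitivity Map",
    date(2021, 12, 1),
    "Dosage sensitivity map from ClinGen (dbVar)",
)
print(text, end="")
```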

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

You can also generate the ClinGen files with the SAUtils AutoDownloadGenerate subcommand. See the SAUtils section for details.

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we select the entry with the latest classification date.
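
Both rules can be combined in one small function. The page only confirms that Moderate outranks Limited; the full severity order below is an assumption made for illustration, and `resolve` is a hypothetical name.

```python
# ASSUMPTION: full ordering, most severe first. Only "moderate" ranking
# above "limited" is confirmed by the example on this page.
SEVERITY = [
    "definitive",
    "strong",
    "moderate",
    "limited",
    "disputed",
    "refuted",
    "no reported evidence",
    "no known disease relationship",
]
RANK = {label: i for i, label in enumerate(SEVERITY)}


def resolve(rows):
    """Pick one (classification, date) pair for a gene/disease combination.

    The most severe classification wins; ties go to the latest date.
    """
    best = min(RANK[classification.lower()] for classification, _ in rows)
    candidates = [r for r in rows if RANK[r[0].lower()] == best]
    # The first 10 characters of an ISO-8601 timestamp are yyyy-MM-dd,
    # which sorts chronologically as plain text.
    return max(candidates, key=lambda r: r[1][:10])


print(resolve([("Moderate", "2018-05-08T04:00:00.000Z"),
               ("Limited", "2018-05-08T04:00:00.000Z")]))
print(resolve([("No Reported Evidence", "2017-05-24T00:00:00"),
               ("No Reported Evidence", "2017-05-25T00:00:00")]))
```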

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
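
Comparing the source rows with the JSON output above suggests a direct mapping: the classification is lowercased and the timestamp is truncated to yyyy-MM-dd. The sketch below reproduces that mapping for one source row. `to_json_entry` is a hypothetical name, and the column order is taken from the CSV header shown earlier.

```python
def to_json_entry(row):
    """Convert the eight source columns of one curation into a JSON-style dict."""
    _symbol, _hgnc, label, mondo, _sop, classification, _report, timestamp = row
    return {
        "diseaseId": mondo,
        "disease": label,
        "classification": classification.lower(),
        "classificationDate": timestamp[:10],  # keep only yyyy-MM-dd
    }


row = [
    "A2ML1",
    "HGNC:23336",
    "Noonan syndrome with multiple lentigines",
    "MONDO_0007893",
    "SOP5",
    "No Reported Evidence",
    "https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47",
    "2018-06-07T14:37:47.175Z",
]
print(to_json_entry(row))
```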

Building the supplementary files

The gene disease validity .nga for Illumina Connected Annotations can be built using the SAUtils command's DiseaseValidity subcommand. The only required data file is Clingen-Gene-Disease-Summary-2021-12-01.tsv (url provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory> input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2

You can also generate the ClinGen files with the SAUtils AutoDownloadGenerate subcommand. See the SAUtils section for details.

Version: 3.25 (unreleased)

ClinGen

Overview

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Publication

Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.

ISCA Regions

TSV Extraction

ClinGen contains only copy number variants. Because the coordinates in the original ClinGen file follow the BED convention (0-based start, exclusive end), they had to be adjusted to [BEGIN+1, END].

#bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        attrCount       attrTags        attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482
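
The coordinate adjustment described above amounts to shifting only the start position. A minimal sketch, with `bed_to_one_based` as a hypothetical helper name:

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a 0-based, end-exclusive BED interval to 1-based, inclusive."""
    return chrom_start + 1, chrom_end


# First sample row above: nsv530705 with chromStart 564405, chromEnd 8597804.
begin, end = bed_to_one_based(564405, 8597804)
print(begin, end)

# The interval length is unchanged by the conversion.
assert end - begin + 1 == 8597804 - 564405
```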

Status levels

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain

Parsing

We parse the ClinGen tsv file and extract the following:

  • chrom
  • chromStart (note that this is a 0-based coordinate)
  • chromEnd
  • attrTags
  • attrVals

attrTags and attrVals are comma-separated lists. attrTags contains the field keys and attrVals contains the field values. We parse the following keys from the two fields:

  • parent (this will be used as the ID in our JSON output)
  • clinical_int
  • validated
  • phenotype (this should be a string array)
  • phenotype_id (this should be a string array)
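
Pairing the two lists could look like the sketch below. It assumes that both lists have the same length and that no individual value contains a comma; how multi-valued fields such as phenotype are delimited is not covered here. `parse_attributes` and the `other_tag` key in the sample are hypothetical.

```python
WANTED = ("parent", "clinical_int", "validated", "phenotype", "phenotype_id")


def parse_attributes(attr_tags, attr_vals):
    """Pair comma-separated tags with their values and keep the wanted keys.

    Assumption: equal list lengths and no commas inside a single value.
    """
    tags = attr_tags.split(",")
    vals = attr_vals.split(",")
    if len(tags) != len(vals):
        raise ValueError("attrTags and attrVals differ in length")
    pairs = dict(zip(tags, vals))
    return {key: pairs[key] for key in WANTED if key in pairs}


# Illustrative input; "other_tag" stands in for any key we do not extract.
parsed = parse_attributes(
    "parent,other_tag,clinical_int,validated",
    "nsv530705,ignored,pathogenic,False",
)
print(parsed)
```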

Observed losses and observed gains will be calculated from entries that share a common parent ID.

  • Variants with a common parent ID and the same coordinates are grouped.
    • Observed losses and observed gains are calculated for each group.
    • Clinical significance and validation status are collapsed using the priority strategy described below.
  • Variants with the same parent ID can have different coordinates (mapped to hg38).
    • nsv491508 : chr14:105583663-106881350 and chr14:105605043-106766076 (only one example)
    • Both variants are kept.

Conflict Resolution

Clinical significance priority

When variants belonging to the same parent ID have a mixture of clinical significance values, we choose the most pathogenic one. For example, if 3 samples were deemed pathogenic and 2 samples likely pathogenic, we list the variant as pathogenic.

Priority (high to low)

  • Priority
  • Pathogenic
  • Likely pathogenic
  • Benign
  • Likely benign
  • Uncertain significance

Validation Priority

When variants belonging to the same parent ID have a mixture of validation statuses, we set the validation status to true if any of the variants was validated.

Download URL

https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite

JSON Output

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
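
A consumer of this output might filter the clingen entries by interpretation and overlap. The sketch below wraps the sample fragment in an enclosing object so it parses as JSON. Which labels count as pathogenic is an assumption made for the example, and `pathogenic_ids` is a hypothetical helper.

```python
import json

# The sample entries from above, trimmed to the fields used here.
fragment = """
{"clingen": [
  {"id": "nsv996083", "variantType": "copy_number_gain",
   "clinicalInterpretation": "pathogenic", "reciprocalOverlap": 0.00131},
  {"id": "nsv869419", "variantType": "copy_number_loss",
   "clinicalInterpretation": "pathogenic", "reciprocalOverlap": 0.00254}
]}
"""

# ASSUMPTION: these labels are treated as pathogenic for filtering purposes.
PATHOGENIC = {"pathogenic", "likely pathogenic", "curated pathogenic",
              "path gain", "path loss"}


def pathogenic_ids(annotation, min_overlap=0.0):
    """Return IDs of pathogenic entries with at least the given overlap."""
    hits = []
    for entry in annotation.get("clingen", []):
        # reciprocalOverlap is not reported for insertions, so default to 0.
        overlap = entry.get("reciprocalOverlap", 0.0)
        if entry["clinicalInterpretation"] in PATHOGENIC and overlap >= min_overlap:
            hits.append(entry["id"])
    return hits


annotation = json.loads(fragment)
print(pathogenic_ids(annotation, min_overlap=0.002))
```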

Dosage Sensitivity Map

The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.

Publication

Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.

TSV Source files

Regions

#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801

Genes

#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600

Dosage Rating System

Rating | Possible Clinical Interpretation
0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype
1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype
2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
30 | Gene associated with autosomal recessive phenotype
40 | Dosage sensitivity unlikely

Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml

Download URL

ftp://ftp.clinicalgenome.org/

JSON Output

"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate that 57% of the annotation is overlapped by the variant. Specified up to 5 decimal places (Not reported for Insertions).
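
The annotationOverlap values in the sample output are consistent with dividing the overlapping bases by the length of the annotated region. The sketch below assumes that definition and a hypothetical variant spanning chr15:31727418-32153204; `annotation_overlap` is a hypothetical helper name.

```python
def annotation_overlap(var_begin, var_end, ann_begin, ann_end):
    """Fraction of the annotation covered by the variant (1-based, inclusive).

    Assumption: overlap length divided by annotation length, rounded to
    5 decimal places.
    """
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return 0.0
    return round(overlap / (ann_end - ann_begin + 1), 5)


# Hypothetical variant against the two regions in the sample output.
print(annotation_overlap(31727418, 32153204, 30900686, 32153204))
print(annotation_overlap(31727418, 32153204, 31727418, 32153204))
```

Unlike reciprocalOverlap, this value reaches 1 whenever the variant covers the whole annotation, regardless of how large the variant is.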

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

Building the supplementary files

The gene dosage sensitivity .nga for Illumina Connected Annotations can be built using the SAUtils command's DosageSensitivity subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:00.1

For building the .nsi files, we use the SAUtils command's DosageMapRegions subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv (url provided above) and its associated .version file.

NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)

Here is a sample run:

dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data

OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Writing 505 intervals to database...

Time: 00:00:00.1

You can also generate the ClinGen files with the SAUtils AutoDownloadGenerate subcommand. See the SAUtils section for details.

Gene-Disease Validity

The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.

Publication

Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015

Source TSV

The source data comes in a CSV file that we convert to a TSV.

CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z
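
The CSV-to-TSV conversion mentioned above can be sketched with the standard csv module. Dropping the title lines and the `+++` separator rows while keeping the column header is an assumption about what the downstream tool expects; `csv_to_tsv` is a hypothetical helper.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Rewrite comma-separated curation rows as tab-separated rows.

    Assumption: title lines (fewer than eight columns) and "+++"
    separator rows are dropped; the column header row is kept.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 8 or row[0].startswith("+"):
            continue
        writer.writerow(row)
    return out.getvalue()


sample = (
    "CLINGEN GENE VALIDITY CURATIONS\n"
    "+++,+++,+++,+++,+++,+++,+++,+++\n"
    "GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,"
    "CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE\n"
    "A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,"
    "https://search.clinicalgenome.org/kb/gene-validity/"
    "ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,"
    "2018-06-07T14:34:05.324Z\n"
)
print(csv_to_tsv(sample), end="")
```

Using csv.reader rather than a plain split keeps quoted fields intact if a disease label ever contains a comma.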

Download URL

https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity

Conflict Resolution

Multiple Classifications

Here is an example of multiple classifications.

$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z

In such cases, we select the more severe classification.

Multiple Dates

$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv  | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00

If the classifications are the same, we should select the latest classification date.
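
The two conflict-resolution rules can be sketched together as below. The ranking in `ASSUMED_RANK` is an assumption made for illustration only: the documentation does not state which ordering is used to decide that one classification is "more severe" than another, and `resolve_curations` is a hypothetical name.

```python
# Hypothetical ordering, NOT taken from the source documentation.
ASSUMED_RANK = {
    "no known disease relationship": 0,
    "no reported evidence": 1,
    "refuted": 2,
    "disputed": 3,
    "limited": 4,
    "moderate": 5,
    "strong": 6,
    "definitive": 7,
}


def resolve_curations(rows):
    """rows: (gene, disease_id, classification, date) tuples.

    Keeps one row per (gene, disease_id): the highest assumed rank wins,
    and ties are broken by the latest classification date.
    """
    best = {}
    for gene, disease_id, classification, date in rows:
        label = classification.lower()
        day = date[:10]  # yyyy-MM-dd prefix of the ISO timestamp
        score = (ASSUMED_RANK[label], day)
        key = (gene, disease_id)
        if key not in best or score > best[key][0]:
            best[key] = (score, (gene, disease_id, label, day))
    return [row for _, row in best.values()]


rows = [
    ("EDNRB", "MONDO_0010192", "Moderate", "2018-05-08T04:00:00.000Z"),
    ("EDNRB", "MONDO_0010192", "Limited", "2018-05-08T04:00:00.000Z"),
    ("MUTYH", "MONDO_0016419", "No Reported Evidence", "2017-05-24T00:00:00"),
    ("MUTYH", "MONDO_0016419", "No Reported Evidence", "2017-05-25T00:00:00"),
]
resolved = resolve_curations(rows)
```

ISO dates in yyyy-MM-dd form sort correctly as strings, so no date parsing is needed for the tie-break.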

JSON Output

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
| Field | Type | Notes |
|---|---|---|
| clingenGeneValidity | object | |
| diseaseId | string | Monarch Disease Ontology ID (MONDO) |
| disease | string | disease label |
| classification | string | see below for possible values |
| classificationDate | string | yyyy-MM-dd |

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

Building the supplementary files

The gene-disease validity .nga file for Illumina Connected Annotations can be built using the DiseaseValidity subcommand of SAUtils. The only required data files are Clingen-Gene-Disease-Summary-2021-12-01.tsv (download URL provided above) and its associated .version file.

NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)
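
As an illustration of the .version file layout shown above, the following sketch reads the KEY=VALUE lines into a dictionary and checks that VERSION matches DATE. The helper name `parse_version_file` is hypothetical, and the assumption that VERSION is the DATE with the dashes removed is inferred from this single example only.

```python
def parse_version_file(text):
    """Parse KEY=VALUE lines into a dict, ignoring blank or malformed lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key] = value
    return entries


version_text = """NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)
"""
version_info = parse_version_file(version_text)
# Assumed convention: VERSION is DATE without the dashes.
version_matches_date = version_info["VERSION"] == version_info["DATE"].replace("-", "")
```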

Here is a sample run:

 dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data

OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

Number of geneIds missing from the cache:0 (0%)

Time: 00:00:00.2

You can also use the AutoDownloadGenerate subcommand of SAUtils to generate the ClinGen files. For details, see the SAUtils section.

+ + \ No newline at end of file diff --git a/data-sources/clinvar-json/index.html b/data-sources/clinvar-json/index.html index dc6010c6..a9a28c3c 100644 --- a/data-sources/clinvar-json/index.html +++ b/data-sources/clinvar-json/index.html @@ -6,13 +6,13 @@ clinvar-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
| Field | Type | Notes |
|---|---|---|
| id | string | ClinVar ID |
| variationId | string | ClinVar VCV ID |
| variantType | string | variant type |
| reviewStatus | string | see possible values below |
| alleleOrigins | string array | see possible values below |
| refAllele | string | |
| altAllele | string | |
| phenotypes | string array | |
| medGenIds | string array | MedGen IDs |
| omimIds | string array | OMIM IDs |
| orphanetIds | string array | Orphanet IDs |
| significance | string array | see possible values below |
| lastUpdatedDate | string | yyyy-MM-dd |
| pubMedIds | string array | PubMed IDs |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
- - +
Version: 3.25 (unreleased)

clinvar-json

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
| Field | Type | Notes |
|---|---|---|
| id | string | ClinVar ID |
| variationId | string | ClinVar VCV ID |
| variantType | string | variant type |
| reviewStatus | string | see possible values below |
| alleleOrigins | string array | see possible values below |
| refAllele | string | |
| altAllele | string | |
| phenotypes | string array | |
| medGenIds | string array | MedGen IDs |
| omimIds | string array | OMIM IDs |
| orphanetIds | string array | Orphanet IDs |
| significance | string array | see possible values below |
| lastUpdatedDate | string | yyyy-MM-dd |
| pubMedIds | string array | PubMed IDs |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
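
A consumer of this JSON might filter the clinvar entries on isAlleleSpecific and significance. The sketch below is one way to do that with the standard library; the function name `allele_specific_ids` is hypothetical and the sample is a trimmed copy of the small-variant example above.

```python
import json


def allele_specific_ids(clinvar_entries, wanted_significance):
    """Return the IDs of allele-specific entries reporting the wanted significance."""
    return [
        entry["id"]
        for entry in clinvar_entries
        if entry.get("isAlleleSpecific")
        and wanted_significance in entry.get("significance", [])
    ]


clinvar_sample = json.loads("""
[
  {"id": "VCV000036581.3", "reviewStatus": "reviewed by expert panel",
   "significance": ["benign"], "refAllele": "G", "altAllele": "A",
   "lastUpdatedDate": "2020-03-01", "isAlleleSpecific": true},
  {"id": "RCV000030258.4", "variationId": "VCV000036581.3",
   "reviewStatus": "reviewed by expert panel", "significance": ["benign"],
   "refAllele": "G", "altAllele": "A",
   "lastUpdatedDate": "2017-05-01", "isAlleleSpecific": true}
]
""")
benign_ids = allele_specific_ids(clinvar_sample, "benign")
pathogenic_ids = allele_specific_ids(clinvar_sample, "pathogenic")
```

Large-variant entries carry no isAlleleSpecific field in the example above, so `entry.get` is used to treat a missing field as false.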
+ + \ No newline at end of file diff --git a/data-sources/clinvar-preview-json/index.html b/data-sources/clinvar-preview-json/index.html index 18e3278b..23c85404 100644 --- a/data-sources/clinvar-preview-json/index.html +++ b/data-sources/clinvar-preview-json/index.html @@ -6,13 +6,13 @@ clinvar-preview-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

clinvar-preview-json

small variants:

{
"clinvar-preview": [
{
"altAllele": "A",
"refAllele": "G",
"variantType": "SNV",
"accession": "VCV000437934",
"version": "1",
"recordType": "classified",
"dateLastUpdated": "2023-08-06",
"rcvs": [
{
"accession": "RCV000505090",
"version": "1",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2016-08-31",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "Cleidocranial dysostosis",
"db": "MedGen",
"id": "C0008928"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2016-08-31",
"mostRecentSubmission": "2017-09-09",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "820",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Cleidocranial+Dysplasia/1683"
},
{
"db": "SNOMED CT",
"id": "65976001"
}
],
"value": "Cleidocranial dysostosis"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000598565"
}
]
}
]
}

large variants:

{
"clinvar-preview": [
{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
"variantType": "copy_number_gain",
"accession": "VCV000154089",
"version": "2",
"recordType": "classified",
"dateLastUpdated": "2023-10-15",
"rcvs": [
{
"accession": "RCV000142236",
"version": "6",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2014-03-10",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "See cases"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2014-03-10",
"mostRecentSubmission": "2015-07-13",
"conditions": [
{
"type": "PhenotypeInstruction",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "18728",
"name": {
"value": "See cases"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000183512"
}
]
}
]
}
| Field | Type | Notes |
|---|---|---|
| chromosome | string | chromosome |
| begin | integer | start position of the variant |
| end | integer | end position of the variant |
| refAllele | string | |
| altAllele | string | |
| accession | string | ClinVar ID |
| version | string | ClinVar version |
| variantType | string | variant type |
| recordType | string | record type |
| dateLastUpdated | string | yyyy-MM-dd |
| rcvs | array | RCV objects associated with this VCV |
| classifications | array | classifications for this VCV |
| clinicalAssertions | array | SCV objects associated with this VCV |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |

Variant Types

  • copy_number_gain
  • copy_number_loss
  • deletion
  • delins
  • duplication
  • insertion
  • inversion
  • SNV
  • tandem_duplication

Review Statuses

  • criteria provided, conflicting classifications
  • criteria provided, multiple submitters, no conflicts
  • criteria provided, single submitter
  • no assertion criteria provided
  • no classification provided
  • practice guideline
  • reviewed by expert panel

classification

  • Benign
  • Likely benign
  • Pathogenic
  • Uncertain significance
  • Likely pathogenic
  • Benign/Likely benign
  • not provided
  • conflicting data from submitters
  • Pathogenic/Likely pathogenic
  • association
  • Conflicting classifications of pathogenicity
  • Pathogenic; risk factor
  • risk factor
  • other
  • drug response
  • Uncertain significance; Pathogenic/Likely pathogenic
  • Likely pathogenic, low penetrance
  • Pathogenic; Affects
  • Pathogenic, low penetrance
  • protective
  • Affects
  • Benign; other
  • Conflicting classifications of pathogenicity; other
  • Conflicting classifications of pathogenicity; association
  • Uncertain risk allele
  • Uncertain significance; risk factor
  • Likely pathogenic; risk factor
  • Likely benign; association
  • Likely risk allele
  • Pathogenic/Likely pathogenic; other
  • Pathogenic; other
  • Pathogenic/Likely pathogenic/Pathogenic, low penetrance
  • Pathogenic/Likely pathogenic; risk factor
  • Benign/Likely benign; risk factor
  • Uncertain significance/Uncertain risk allele
  • Pathogenic; association; protective
  • protective; risk factor
  • Benign/Likely benign; other; risk factor
  • Benign/Likely benign; association
  • Benign; association
  • Affects; association; other
  • Pathogenic; protective
  • Conflicting classifications of pathogenicity; drug response; other
  • Conflicting classifications of pathogenicity; drug response
  • Benign; drug response
  • Likely pathogenic; other
  • Conflicting classifications of pathogenicity; protective
  • Pathogenic/Likely pathogenic; drug response
  • Benign/Likely benign; other
  • Likely pathogenic/Likely risk allele
  • Uncertain risk allele; protective
  • association not found
  • Affects; association
  • Uncertain significance; association
  • Likely benign; other
  • Uncertain significance; other
  • Conflicting classifications of pathogenicity; association; risk factor
  • Pathogenic; association
  • Benign; risk factor
  • Conflicting classifications of pathogenicity; other; risk factor
  • Pathogenic/Likely risk allele; risk factor
  • Uncertain significance; drug response
  • Conflicting classifications of pathogenicity; risk factor
  • other; risk factor
  • Pathogenic/Likely pathogenic/Likely risk allele
  • Likely pathogenic; drug response
  • Conflicting classifications of pathogenicity; Affects
  • association; drug response; risk factor
  • Pathogenic; drug response
  • Affects; risk factor
  • Pathogenic; drug response; other
  • Likely pathogenic; protective
  • confers sensitivity
  • Likely pathogenic; association
  • Benign; Affects
  • Likely pathogenic; Affects
  • Uncertain risk allele; risk factor
  • drug response; risk factor
  • Pathogenic/Likely risk allele
  • Likely benign; drug response; other
  • Benign/Likely benign; drug response
  • Benign/Likely benign; drug response; other
  • drug response; other
  • association; drug response
  • Pathogenic; confers sensitivity
  • association; risk factor
  • Pathogenic/Pathogenic, low penetrance; other
  • Benign; confers sensitivity
  • confers sensitivity; other
  • Likely pathogenic/Pathogenic, low penetrance
  • Likely benign; risk factor
- - +
Version: 3.25 (unreleased)

clinvar-preview-json

small variants:

{
"clinvar-preview": [
{
"altAllele": "A",
"refAllele": "G",
"variantType": "SNV",
"accession": "VCV000437934",
"version": "1",
"recordType": "classified",
"dateLastUpdated": "2023-08-06",
"rcvs": [
{
"accession": "RCV000505090",
"version": "1",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2016-08-31",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "Cleidocranial dysostosis",
"db": "MedGen",
"id": "C0008928"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2016-08-31",
"mostRecentSubmission": "2017-09-09",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "820",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Cleidocranial+Dysplasia/1683"
},
{
"db": "SNOMED CT",
"id": "65976001"
}
],
"value": "Cleidocranial dysostosis"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000598565"
}
]
}
]
}

large variants:

{
"clinvar-preview": [
{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
"variantType": "copy_number_gain",
"accession": "VCV000154089",
"version": "2",
"recordType": "classified",
"dateLastUpdated": "2023-10-15",
"rcvs": [
{
"accession": "RCV000142236",
"version": "6",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2014-03-10",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "See cases"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2014-03-10",
"mostRecentSubmission": "2015-07-13",
"conditions": [
{
"type": "PhenotypeInstruction",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "18728",
"name": {
"value": "See cases"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000183512"
}
]
}
]
}
| Field | Type | Notes |
|---|---|---|
| chromosome | string | chromosome |
| begin | integer | start position of the variant |
| end | integer | end position of the variant |
| refAllele | string | |
| altAllele | string | |
| accession | string | ClinVar ID |
| version | string | ClinVar version |
| variantType | string | variant type |
| recordType | string | record type |
| dateLastUpdated | string | yyyy-MM-dd |
| rcvs | array | RCV objects associated with this VCV |
| classifications | array | classifications for this VCV |
| clinicalAssertions | array | SCV objects associated with this VCV |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |

Variant Types

  • copy_number_gain
  • copy_number_loss
  • deletion
  • delins
  • duplication
  • insertion
  • inversion
  • SNV
  • tandem_duplication

Review Statuses

  • criteria provided, conflicting classifications
  • criteria provided, multiple submitters, no conflicts
  • criteria provided, single submitter
  • no assertion criteria provided
  • no classification provided
  • practice guideline
  • reviewed by expert panel

classification

  • Benign
  • Likely benign
  • Pathogenic
  • Uncertain significance
  • Likely pathogenic
  • Benign/Likely benign
  • not provided
  • conflicting data from submitters
  • Pathogenic/Likely pathogenic
  • association
  • Conflicting classifications of pathogenicity
  • Pathogenic; risk factor
  • risk factor
  • other
  • drug response
  • Uncertain significance; Pathogenic/Likely pathogenic
  • Likely pathogenic, low penetrance
  • Pathogenic; Affects
  • Pathogenic, low penetrance
  • protective
  • Affects
  • Benign; other
  • Conflicting classifications of pathogenicity; other
  • Conflicting classifications of pathogenicity; association
  • Uncertain risk allele
  • Uncertain significance; risk factor
  • Likely pathogenic; risk factor
  • Likely benign; association
  • Likely risk allele
  • Pathogenic/Likely pathogenic; other
  • Pathogenic; other
  • Pathogenic/Likely pathogenic/Pathogenic, low penetrance
  • Pathogenic/Likely pathogenic; risk factor
  • Benign/Likely benign; risk factor
  • Uncertain significance/Uncertain risk allele
  • Pathogenic; association; protective
  • protective; risk factor
  • Benign/Likely benign; other; risk factor
  • Benign/Likely benign; association
  • Benign; association
  • Affects; association; other
  • Pathogenic; protective
  • Conflicting classifications of pathogenicity; drug response; other
  • Conflicting classifications of pathogenicity; drug response
  • Benign; drug response
  • Likely pathogenic; other
  • Conflicting classifications of pathogenicity; protective
  • Pathogenic/Likely pathogenic; drug response
  • Benign/Likely benign; other
  • Likely pathogenic/Likely risk allele
  • Uncertain risk allele; protective
  • association not found
  • Affects; association
  • Uncertain significance; association
  • Likely benign; other
  • Uncertain significance; other
  • Conflicting classifications of pathogenicity; association; risk factor
  • Pathogenic; association
  • Benign; risk factor
  • Conflicting classifications of pathogenicity; other; risk factor
  • Pathogenic/Likely risk allele; risk factor
  • Uncertain significance; drug response
  • Conflicting classifications of pathogenicity; risk factor
  • other; risk factor
  • Pathogenic/Likely pathogenic/Likely risk allele
  • Likely pathogenic; drug response
  • Conflicting classifications of pathogenicity; Affects
  • association; drug response; risk factor
  • Pathogenic; drug response
  • Affects; risk factor
  • Pathogenic; drug response; other
  • Likely pathogenic; protective
  • confers sensitivity
  • Likely pathogenic; association
  • Benign; Affects
  • Likely pathogenic; Affects
  • Uncertain risk allele; risk factor
  • drug response; risk factor
  • Pathogenic/Likely risk allele
  • Likely benign; drug response; other
  • Benign/Likely benign; drug response
  • Benign/Likely benign; drug response; other
  • drug response; other
  • association; drug response
  • Pathogenic; confers sensitivity
  • association; risk factor
  • Pathogenic/Pathogenic, low penetrance; other
  • Benign; confers sensitivity
  • confers sensitivity; other
  • Likely pathogenic/Pathogenic, low penetrance
  • Likely benign; risk factor
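
To show how the nested clinvar-preview structure can be navigated, here is a small sketch that pulls the aggregate germline classification and the RCV accessions out of one entry. The function name `germline_summary` and the "accession.version" join are illustrative choices, not part of the documented output; the sample is a trimmed copy of the small-variant example above.

```python
def germline_summary(entry):
    """Summarize the aggregate germline classification of one clinvar-preview entry."""
    germline = entry.get("classifications", {}).get("germlineClassification")
    if germline is None:
        return None  # entry has no germline classification
    return {
        # Joining accession and version with a dot is an illustrative choice.
        "vcv": entry["accession"] + "." + entry["version"],
        "classification": germline.get("classification"),
        "reviewStatus": germline.get("reviewStatus"),
        "rcvAccessions": [rcv["accession"] for rcv in entry.get("rcvs", [])],
    }


preview_entry = {
    "altAllele": "A",
    "refAllele": "G",
    "variantType": "SNV",
    "accession": "VCV000437934",
    "version": "1",
    "rcvs": [{"accession": "RCV000505090", "version": "1"}],
    "classifications": {
        "germlineClassification": {
            "reviewStatus": "no assertion criteria provided",
            "classification": "Pathogenic",
            "dateLastEvaluated": "2016-08-31",
        }
    },
}
summary = germline_summary(preview_entry)
```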
+ + \ No newline at end of file diff --git a/data-sources/clinvar-preview/index.html b/data-sources/clinvar-preview/index.html index 98203c6e..0d130855 100644 --- a/data-sources/clinvar-preview/index.html +++ b/data-sources/clinvar-preview/index.html @@ -6,18 +6,18 @@ ClinVar Preview | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

ClinVar Preview

Overview

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

ClinVar Preview relates to the new ClinVar XML format introduced in 2024. +

Version: 3.25 (unreleased)

ClinVar Preview

Overview

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

ClinVar Preview relates to the new ClinVar XML format introduced in 2024. The following sections describe the parsing and the resulting JSON format provided by Illumina Connected Annotations.

Parsing

ClinVar recommends using the VCV XML file because it contains comprehensive information.

Parsing is simplified by generating classes from the XSD schema file. The command for generating the classes is:

xsd ClinVar_VCV.xsd /n:VariationArchive /c

Overall XML to JSON mapping

| key | type | description | XML path |
|---|---|---|---|
| variantType | string | sequence ontology | VariationArchive.VariationType |
| accession | string | VCV Id from ClinVar | VariationArchive.Accession |
| version | string | VCV Id version | VariationArchive.Version |
| recordType | string | classified | VariationArchive.RecordType |
| dateLastUpdated | date time | date VCV was last updated | VariationArchive.DateLastUpdated |
| chromosome | string | chromosome (large variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.Chr |
| begin | number | start position of the variant (large variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.positionVCF |
| end | number | end position of the variant (large variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.displayStop or calculated |
| refAllele | string | reference alleles (small variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.referenceAlleleVCF |
| altAllele | string | alternate alleles (small variants only) | VariationArchive.ClassifiedRecord.SimpleAllele.Location.SequenceLocation.alternateAlleleVCF |
| rcvs | list | list of RCV objects | VariationArchive.ClassifiedRecord.RCVList |
| classifications | list | list of classification objects | VariationArchive.ClassifiedRecord.Classifications |
| clinicalAssertions | list | list of clinicalAssertion objects | VariationArchive.ClassifiedRecord.ClinicalAssertionList |

Variation fields

XML

<VariationArchive
VariationID="1381081"
VariationName="NM_003000.3(SDHB):c.19_41dup (p.Pro14_Ala15insSerProTer)"
VariationType="Indel"
Accession="VCV001381081"
Version="3"
RecordType="classified"
DateLastUpdated="2024-01-26"
NumberOfSubmissions="1"
NumberOfSubmitters="1"
DateCreated="2022-03-28"
MostRecentSubmission="2023-02-07"
>
...

JSON

{
"variantType": "delins",
"accession": "VCV001381081",
"version": "3",
"recordType": "classified",
"dateLastUpdated": "2024-01-26",
...
}
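
The attribute-to-key mapping for the variation fields can be sketched with Python's standard XML parser. This is an illustration rather than the actual parser (which is built from classes generated from the XSD); `variation_fields` and `VARIANT_TYPE_MAP` are hypothetical names, and only the Indel to delins translation is taken from the example above. Other variation types pass through unchanged here, which is an assumption.

```python
import xml.etree.ElementTree as ET

# Only "Indel" -> "delins" is shown in the example; other mappings are unknown here.
VARIANT_TYPE_MAP = {"Indel": "delins"}


def variation_fields(xml_text):
    """Map VariationArchive attributes to the top-level JSON keys."""
    root = ET.fromstring(xml_text)
    raw_type = root.get("VariationType")
    return {
        "variantType": VARIANT_TYPE_MAP.get(raw_type, raw_type),
        "accession": root.get("Accession"),
        "version": root.get("Version"),
        "recordType": root.get("RecordType"),
        "dateLastUpdated": root.get("DateLastUpdated"),
    }


variation_xml = (
    '<VariationArchive VariationID="1381081" VariationType="Indel" '
    'Accession="VCV001381081" Version="3" RecordType="classified" '
    'DateLastUpdated="2024-01-26" NumberOfSubmissions="1"/>'
)
variation_json = variation_fields(variation_xml)
```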

Location fields

<SimpleAllele
AlleleID="196495"
VariationID="1381081"
>
<Location>
<CytogeneticLocation>1p36.13</CytogeneticLocation>
<SequenceLocation
Accession="NC_000001.11"
Chr="1"
Assembly="GRCh38"
positionVCF="17053978"
referenceAlleleVCF="C"
alternateAlleleVCF="CGGCAACCGGCGCCTCAAGGAGAG"
display_start="17053978"
display_stop="17053979"
AssemblyAccessionVersion="GCF_000001405.38"
forDisplay="true"
AssemblyStatus="current"
start="17053978"
stop="17053979"
variantLength="23"
/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="17380473"
stop="17380474" display_start="17380473" display_stop="17380474"
variantLength="23" positionVCF="17380473" referenceAlleleVCF="C"
alternateAlleleVCF="CGGCAACCGGCGCCTCAAGGAGAG"/>
</Location>
...
</SimpleAllele>

JSON Small Variant

Note that the alleles are trimmed.

{
"altAllele": "GGCAACCGGCGCCTCAAGGAGAG",
"refAllele": "-",
...
}
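
The trimming that turns the VCF-style alleles (C and CGGCAACCGGCGCCTCAAGGAGAG) into the JSON alleles above can be sketched as follows. This is a generic common-prefix and common-suffix trim, offered as an assumption about how the trimmed alleles are obtained; the function name `trim_alleles` is hypothetical, and shifting the position by the prefix length is also an assumption, since the example does not show the resulting position.

```python
def trim_alleles(position, ref, alt):
    """Strip bases shared by ref and alt; an empty allele is written as '-'."""
    # Shared leading bases move the start position to the right.
    prefix = 0
    while prefix < min(len(ref), len(alt)) and ref[prefix] == alt[prefix]:
        prefix += 1
    ref, alt = ref[prefix:], alt[prefix:]

    # Shared trailing bases are removed without changing the start position.
    suffix = 0
    while suffix < min(len(ref), len(alt)) and ref[-1 - suffix] == alt[-1 - suffix]:
        suffix += 1
    if suffix:
        ref, alt = ref[:-suffix], alt[:-suffix]

    return position + prefix, ref or "-", alt or "-"


trimmed = trim_alleles(17053978, "C", "CGGCAACCGGCGCCTCAAGGAGAG")
snv = trim_alleles(100, "G", "A")
```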

JSON Large Variant

{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
...
}

RCVs

RCV Object from XML path VariationArchive.ClassifiedRecord.RCVList

| key | type | description | XML sub-path |
|---|---|---|---|
| accession | string | RCV Id from ClinVar | RCVList.RCVAccession.Accession |
| version | string | RCV Id version | RCVList.RCVAccession.Version |
| classifications | list | list of classification objects | RCVList.RCVAccession.RCVClassifications |
| classifiedConditions | list | list of classified conditions | RCVList.RCVAccession.ClassifiedConditionList |

XML

<RCVList>
<RCVAccession
Title="NM_003000.3(SDHB):c.19_41dup (p.Pro14_Ala15insSerProTer) AND multiple conditions"
Accession="RCV001921860"
Version="3">
<ClassifiedConditionList TraitSetID="23696">
...
</ClassifiedConditionList>
<RCVClassifications>
...
</RCVClassifications>
</RCVAccession>
</RCVList>
...

JSON

{
"rcvs": [
{
"accession": "RCV001921860",
"version": "3",
"classifications": {
...
},
"classifiedConditions": [
...
]
}
]
}

Classifications

Classification object from XML path VariationArchive.ClassifiedRecord.RCVList.RCVAccession.RCVClassifications. A classification can be of the following types:

  1. germlineClassification
  2. somaticClinicalImpact
  3. oncogenicityClassification
Germline Classification

Classification object from XML path VariationArchive.ClassifiedRecord.RCVList.RCVAccession.RCVClassifications.GermlineClassification

| key | type | description | XML sub-path |
|---|---|---|---|
| reviewStatus | string | review status | GermlineClassification.ReviewStatus |
| descriptions | list | list of classification objects | GermlineClassification.Description |
| descriptions[].classification | string | classification | GermlineClassification.Description.Value |
| descriptions[].dateLastEvaluated | date | date last evaluated | GermlineClassification.Description.DateLastEvaluated |

XML

<RCVClassifications>
<GermlineClassification>
<ReviewStatus>criteria provided, single submitter</ReviewStatus>
<Description DateLastEvaluated="2021-08-04" SubmissionCount="1">Pathogenic</Description>
</GermlineClassification>
</RCVClassifications>

JSON

{
"classifications": {
"germlineClassification": {
"reviewStatus": "criteria provided, single submitter",
"descriptions": [
{
"dateLastEvaluated": "2021-08-04",
"classification": "Pathogenic"
}
]
}
}
}
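
The RCV-level mapping above can be sketched with the standard library XML parser. This is only an illustration of the table, not the actual parser; `rcv_germline_classification` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def rcv_germline_classification(xml_text):
    """Convert an RCVClassifications element into the 'classifications' JSON object."""
    root = ET.fromstring(xml_text)
    germline = root.find("GermlineClassification")
    if germline is None:
        return {}
    return {
        "germlineClassification": {
            "reviewStatus": germline.findtext("ReviewStatus"),
            "descriptions": [
                {
                    "dateLastEvaluated": description.get("DateLastEvaluated"),
                    "classification": description.text,
                }
                for description in germline.findall("Description")
            ],
        }
    }


rcv_xml = """
<RCVClassifications>
  <GermlineClassification>
    <ReviewStatus>criteria provided, single submitter</ReviewStatus>
    <Description DateLastEvaluated="2021-08-04" SubmissionCount="1">Pathogenic</Description>
  </GermlineClassification>
</RCVClassifications>
"""
rcv_classifications = rcv_germline_classification(rcv_xml)
```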
Classified Conditions

Classified conditions object from XML path VariationArchive.ClassifiedRecord.RCVList.RCVAccession.ClassifiedConditionList

| key | type | description | XML sub-path |
|---|---|---|---|
| condition | string | condition name | ClassifiedConditionList.ClassifiedCondition.Value |
| db | string | cross reference database | ClassifiedConditionList.ClassifiedCondition.DB |
| id | string | identifier in the cross reference database | ClassifiedConditionList.ClassifiedCondition.ID |

XML

<ClassifiedConditionList TraitSetID="23696">
<ClassifiedCondition DB="MedGen" ID="C0238198">Gastrointestinal stromal tumor</ClassifiedCondition>
<ClassifiedCondition DB="MedGen" ID="C1861848">Paragangliomas 4</ClassifiedCondition>
<ClassifiedCondition DB="MedGen" ID="C0031511">Pheochromocytoma</ClassifiedCondition>
</ClassifiedConditionList>

JSON

{
"classifiedConditions": [
{
"condition": "Gastrointestinal stromal tumor",
"db": "MedGen",
"id": "C0238198"
},
{
"condition": "Paragangliomas 4",
"db": "MedGen",
"id": "C1861848"
},
{
"condition": "Pheochromocytoma",
"db": "MedGen",
"id": "C0031511"
}
]
}
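
A minimal sketch of the classified-conditions mapping, again using the standard library rather than the generated classes. The name `classified_conditions` is hypothetical. Omitting db and id when the XML attributes are absent is an assumption based on the large-variant example, where the "See cases" condition has neither key.

```python
import xml.etree.ElementTree as ET


def classified_conditions(xml_text):
    """Convert a ClassifiedConditionList element into a list of condition objects."""
    root = ET.fromstring(xml_text)
    conditions = []
    for element in root.findall("ClassifiedCondition"):
        condition = {"condition": element.text}
        # Assumed behavior: keys are left out when the attribute is missing.
        if element.get("DB") is not None:
            condition["db"] = element.get("DB")
        if element.get("ID") is not None:
            condition["id"] = element.get("ID")
        conditions.append(condition)
    return conditions


conditions_xml = """
<ClassifiedConditionList TraitSetID="23696">
  <ClassifiedCondition DB="MedGen" ID="C0238198">Gastrointestinal stromal tumor</ClassifiedCondition>
  <ClassifiedCondition DB="MedGen" ID="C0031511">Pheochromocytoma</ClassifiedCondition>
</ClassifiedConditionList>
"""
parsed_conditions = classified_conditions(conditions_xml)
```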

Classifications

Classification object from XML path VariationArchive.ClassifiedRecord.Classifications. A classification can be of the following types:

  1. germlineClassification
  2. somaticClinicalImpact
  3. oncogenicityClassification

XML

<Classifications>
<GermlineClassification DateLastEvaluated="2021-08-04" NumberOfSubmissions="1" NumberOfSubmitters="1"
DateCreated="2022-03-28" MostRecentSubmission="2023-02-07">
...
</GermlineClassification>
</Classifications>

JSON

"classifications": {
"germlineClassification": {...}
}
Germline Classification

Classification object from XML path VariationArchive.ClassifiedRecord.Classifications.GermlineClassification

| key | type | description | XML sub-path |
|---|---|---|---|
| classification | string | classification | GermlineClassification.Description |
| reviewStatus | string | review status | GermlineClassification.ReviewStatus |
| dateLastEvaluated | date | date last evaluated | GermlineClassification.DateLastEvaluated |
| mostRecentSubmission | date | date of the most recent submission | GermlineClassification.MostRecentSubmission |
| pubMedIds | list | list of PubMedIds | GermlineClassification.Citation.ID.Value |
| conditions | list | list of conditions | GermlineClassification.ConditionList |

XML

<Classifications>
<GermlineClassification DateLastEvaluated="2021-08-04" NumberOfSubmissions="1" NumberOfSubmitters="1"
DateCreated="2022-03-28" MostRecentSubmission="2023-02-07">
<ReviewStatus>criteria provided, single submitter</ReviewStatus>
<Description>Pathogenic</Description>
<Citation Type="general">
<ID Source="PubMed">19454582</ID>
</Citation>
<Citation Type="general">
<ID Source="PubMed">19802898</ID>
</Citation>
<ConditionList>
...
</ConditionList>
</GermlineClassification>
</Classifications>

JSON

{
"classifications": {
"germlineClassification": {
"classification": "Pathogenic",
"reviewStatus": "criteria provided, single submitter",
"dateLastEvaluated": "2021-08-04",
"mostRecentSubmission": "2023-02-07",
"conditions": [...],
"pubMedIds": [
"19454582",
"19802898"
]
}
}
}
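
The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of how the XML attributes and child elements line up with the JSON keys, not the parser SAUtils uses; the function name `parse_germline_classification` is hypothetical, and restricting citations to `Source="PubMed"` is an assumption based on the example.

```python
import xml.etree.ElementTree as ET


def parse_germline_classification(xml_text):
    """Map a <GermlineClassification> element to the JSON keys listed above."""
    root = ET.fromstring(xml_text)
    if root.tag == "GermlineClassification":
        germline = root
    else:
        germline = root.find(".//GermlineClassification")
    if germline is None:
        return None

    result = {
        # Child elements
        "classification": germline.findtext("Description"),
        "reviewStatus": germline.findtext("ReviewStatus"),
        # Attributes on the element itself
        "dateLastEvaluated": germline.get("DateLastEvaluated"),
        "mostRecentSubmission": germline.get("MostRecentSubmission"),
    }
    # Assumption: only citations whose ID has Source="PubMed" are collected.
    pubmed_ids = [
        id_node.text
        for id_node in germline.findall("Citation/ID")
        if id_node.get("Source") == "PubMed"
    ]
    if pubmed_ids:
        result["pubMedIds"] = pubmed_ids
    return result
```

The `conditions` key is left out here because it has its own mapping, described in the next section.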
Conditions

Conditions object from XML path VariationArchive.ClassifiedRecord.Classifications.GermlineClassification.ConditionList

| key | type | description | XML sub-path |
| --- | --- | --- | --- |
| type | string | trait set type | ConditionList.TraitSet.Type |
| contributesToAggregateClassification | True or blank | contributes to aggregate classification | ConditionList.TraitSet.ContributesToAggregateClassification |
| traits | list | trait objects | ConditionList.TraitSet.Trait |
| traits[].id | string | trait ID | ConditionList.TraitSet.Trait.ID |
| traits[].name | object | trait name object | ConditionList.TraitSet.Trait.Name |
| traits[].name.value | string | preferred trait name (ElementValue with Type="Preferred") | ConditionList.TraitSet.Trait.Name.ElementValue |
| traits[].name.xRefs | list | list of cross references | ConditionList.TraitSet.Trait.Name.XRef |
| traits[].name.xRefs[].db | string | preferred name cross reference database | ConditionList.TraitSet.Trait.Name.XRef.DB |
| traits[].name.xRefs[].id | string | preferred name cross reference identifier | ConditionList.TraitSet.Trait.Name.XRef.ID |

XML

<Classifications>
<GermlineClassification DateLastEvaluated="2021-08-04" NumberOfSubmissions="1" NumberOfSubmitters="1"
DateCreated="2022-03-28" MostRecentSubmission="2023-02-07">
<ConditionList>
<TraitSet ID="23696" Type="Disease" ContributesToAggregateClassification="true">
<Trait ID="3796" Type="Disease">
<Name>
<ElementValue Type="Preferred">Pheochromocytoma</ElementValue>
<XRef ID="Pheochromocytoma/5718" DB="Genetic Alliance"/>
<XRef ID="HP:0002666" DB="Human Phenotype Ontology"/>
<XRef ID="MONDO:0008233" DB="MONDO"/>
</Name>
<Name>
<ElementValue Type="Alternate">Chromaffinoma</ElementValue>
</Name>
...
</Trait>
</TraitSet>
</ConditionList>
</GermlineClassification>
</Classifications>

JSON

{
"classifications": {
"germlineClassification": {
"classification": "Pathogenic",
"reviewStatus": "criteria provided, single submitter",
"dateLastEvaluated": "2021-08-04",
"mostRecentSubmission": "2023-02-07",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "3796",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Pheochromocytoma/5718"
},
{
"db": "Human Phenotype Ontology",
"id": "HP:0002666"
},
{
"db": "MONDO",
"id": "MONDO:0008233"
}
],
"value": "Pheochromocytoma"
}
}
]
}
],
"pubMedIds": [
"19454582",
"19802898"
]
}
}
}
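
As a rough sketch of the conditions mapping, the following Python walks a ConditionList element and keeps only the Name whose ElementValue has Type="Preferred", as in the example above where the alternate name "Chromaffinoma" does not appear in the JSON. The function name `parse_conditions` is hypothetical, and dropping empty `xRefs` lists is an assumption drawn from the sample output.

```python
import xml.etree.ElementTree as ET


def parse_conditions(condition_list):
    """Convert a <ConditionList> element into the 'conditions' JSON list."""
    conditions = []
    for trait_set in condition_list.findall("TraitSet"):
        traits = []
        for trait in trait_set.findall("Trait"):
            for name in trait.findall("Name"):
                value = name.find("ElementValue")
                # Only the preferred name is reported; alternates are skipped.
                if value is None or value.get("Type") != "Preferred":
                    continue
                name_obj = {"value": value.text}
                xrefs = [
                    {"db": xref.get("DB"), "id": xref.get("ID")}
                    for xref in name.findall("XRef")
                ]
                if xrefs:  # assumption: omit the key when there are no XRefs
                    name_obj["xRefs"] = xrefs
                traits.append({"id": trait.get("ID"), "name": name_obj})
                break  # one preferred name per trait
        condition = {"type": trait_set.get("Type"), "traits": traits}
        if trait_set.get("ContributesToAggregateClassification") == "true":
            condition["contributesToAggregateClassification"] = True
        conditions.append(condition)
    return conditions
```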

Clinical Assertions

Clinical assertions object from XML path VariationArchive.ClassifiedRecord.ClinicalAssertionList

| key | type | description | XML sub-path |
| --- | --- | --- | --- |
| accession | string | SCV Id from ClinVar | ClinicalAssertionList.ClinicalAssertion.ClinVarAccession.Accession |
| pubMedIds | list | list of PubMedIds | ClinicalAssertionList.ClinicalAssertion.AttributeSet.Citation.ID.Value |

XML

<ClinicalAssertionList>
<ClinicalAssertion ID="4172562" SubmissionDate="2023-01-11" DateLastUpdated="2023-02-07"
DateCreated="2022-03-28">
<ClinVarSubmissionID localKey="12475853|MedGen:C0238198;C1861848;C0031511"
submittedAssembly="GRCh37"/>
<ClinVarAccession
Accession="SCV002152762"
DateUpdated="2023-02-07"
DateCreated="2022-03-28"
Type="SCV"
Version="2"
SubmitterName="Invitae"
OrgID="500031"
OrganizationCategory="laboratory"
OrgAbbreviation="Invitae"
/>
<RecordStatus>current</RecordStatus>
<Classification DateLastEvaluated="2021-08-04">
...
</Classification>
<Assertion>variation to disease</Assertion>
<AttributeSet>
<Attribute Type="AssertionMethod">Invitae Variant Classification Sherloc (09022015)</Attribute>
<Citation>
<ID Source="PubMed">28492532</ID>
</Citation>
</AttributeSet>
<ObservedInList>
...
</ObservedInList>
<SimpleAllele>
...
</SimpleAllele>
<TraitSet Type="Disease">
...
</TraitSet>
<SubmissionNameList>
...
</SubmissionNameList>
</ClinicalAssertion>
</ClinicalAssertionList>

JSON

{
"clinicalAssertions": [
{
"accession": "SCV002152762",
"pubMedIds": [
"28492532"
]
}
]
}
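
A minimal sketch of the same extraction in Python, assuming the two sub-paths above are the only ones consulted; `parse_clinical_assertions` is a hypothetical name, and omitting `pubMedIds` when no citation is present is an assumption.

```python
import xml.etree.ElementTree as ET


def parse_clinical_assertions(assertion_list):
    """Convert a <ClinicalAssertionList> element into the 'clinicalAssertions' list."""
    assertions = []
    for assertion in assertion_list.findall("ClinicalAssertion"):
        accession_node = assertion.find("ClinVarAccession")
        if accession_node is None:
            continue
        entry = {"accession": accession_node.get("Accession")}
        # PubMed IDs come from citations inside AttributeSet.
        pubmed_ids = [
            id_node.text
            for id_node in assertion.findall("AttributeSet/Citation/ID")
            if id_node.get("Source") == "PubMed"
        ]
        if pubmed_ids:
            entry["pubMedIds"] = pubmed_ids
        assertions.append(entry)
    return assertions
```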

Known Issues

Entries with the following missing or incorrect information are skipped:

  1. Invalid Ref Allele (example VCV000437934)
  2. Invalid Alt Allele (example VCV000006637)
  3. The following variant types are not supported:
    1. Variation (example VCV000001101)
    2. fusion (example VCV000015269)
    3. unknown (example VCV000017564)
    4. protein only (example VCV000132152)
    5. Complex (example VCV000221337)
    6. Translocation (example VCV000267801)
    7. no_sequence_alteration (example VCV000010504)
  4. Only records of type classified are included; VCVs with record type included are skipped (example VCV000431749)
  5. Records with missing genomic location are skipped (example VCV000000254)
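
The skip rules above could be expressed as a single filter. The sketch below is illustrative only: the record is a hypothetical dictionary whose keys mirror the JSON output fields, and treating an allele as invalid when it contains characters other than A, C, G, T or N is an assumption, since the exact validity check is not stated here.

```python
# Variant types listed as unsupported in the known issues above.
UNSUPPORTED_VARIANT_TYPES = {
    "Variation",
    "fusion",
    "unknown",
    "protein only",
    "Complex",
    "Translocation",
    "no_sequence_alteration",
}
VALID_ALLELE_BASES = set("ACGTN")  # assumption: anything else is invalid


def is_valid_allele(allele):
    """Return True when the allele is non-empty and uses only valid bases."""
    return bool(allele) and set(allele.upper()) <= VALID_ALLELE_BASES


def skip_reason(record):
    """Return why a record would be skipped, or None if it would be kept."""
    if record.get("recordType") != "classified":
        return "record type is not classified"
    if record.get("variantType") in UNSUPPORTED_VARIANT_TYPES:
        return "unsupported variant type"
    if record.get("chromosome") is None or record.get("begin") is None:
        return "missing genomic location"
    ref, alt = record.get("refAllele"), record.get("altAllele")
    # Alleles are only checked when present (large variants carry none).
    if ref is not None and not is_valid_allele(ref):
        return "invalid ref allele"
    if alt is not None and not is_valid_allele(alt):
        return "invalid alt allele"
    return None
```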

Download URLs

https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarVCVRelease_00-latest.xml.gz

JSON Output

small variants:

{
"clinvar-preview": [
{
"altAllele": "A",
"refAllele": "G",
"variantType": "SNV",
"accession": "VCV000437934",
"version": "1",
"recordType": "classified",
"dateLastUpdated": "2023-08-06",
"rcvs": [
{
"accession": "RCV000505090",
"version": "1",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2016-08-31",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "Cleidocranial dysostosis",
"db": "MedGen",
"id": "C0008928"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2016-08-31",
"mostRecentSubmission": "2017-09-09",
"conditions": [
{
"type": "Disease",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "820",
"name": {
"xRefs": [
{
"db": "Genetic Alliance",
"id": "Cleidocranial+Dysplasia/1683"
},
{
"db": "SNOMED CT",
"id": "65976001"
}
],
"value": "Cleidocranial dysostosis"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000598565"
}
]
}
]
}

large variants:

{
"clinvar-preview": [
{
"chromosome": "17",
"begin": 150732,
"end": 14764202,
"variantType": "copy_number_gain",
"accession": "VCV000154089",
"version": "2",
"recordType": "classified",
"dateLastUpdated": "2023-10-15",
"rcvs": [
{
"accession": "RCV000142236",
"version": "6",
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"descriptions": [
{
"dateLastEvaluated": "2014-03-10",
"classification": "Pathogenic"
}
]
}
},
"classifiedConditions": [
{
"condition": "See cases"
}
]
}
],
"classifications": {
"germlineClassification": {
"reviewStatus": "no assertion criteria provided",
"classification": "Pathogenic",
"dateLastEvaluated": "2014-03-10",
"mostRecentSubmission": "2015-07-13",
"conditions": [
{
"type": "PhenotypeInstruction",
"contributesToAggregateClassification": true,
"traits": [
{
"id": "18728",
"name": {
"value": "See cases"
}
}
]
}
]
}
},
"clinicalAssertions": [
{
"accession": "SCV000183512"
}
]
}
]
}
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | Chromosome |
| begin | integer | start position of variant |
| end | integer | end position of variant |
| refAllele | string | |
| altAllele | string | |
| accession | string | ClinVar ID |
| version | string | ClinVar version |
| variantType | string | variant type |
| recordType | string | record type |
| dateLastUpdated | string | yyyy-MM-dd |
| rcvs | array | RCV objects associated with this VCV |
| classifications | object | classifications for this VCV |
| clinicalAssertions | array | SCV objects associated with this VCV |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |
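
For readers consuming this output, here is a small sketch that pulls the germline classification, review status and SCV accessions out of a `clinvar-preview` block. It assumes the block has already been loaded into a Python dictionary shaped like the examples above; `summarize_clinvar_preview` is a hypothetical helper, not part of any shipped tool.

```python
def summarize_clinvar_preview(annotation):
    """Return one summary row per VCV entry in a 'clinvar-preview' block."""
    rows = []
    for entry in annotation.get("clinvar-preview", []):
        # Only the germline classification is read in this sketch.
        classifications = entry.get("classifications", {})
        germline = classifications.get("germlineClassification", {})
        rows.append({
            "accession": entry.get("accession"),
            "variantType": entry.get("variantType"),
            "classification": germline.get("classification"),
            "reviewStatus": germline.get("reviewStatus"),
            "scvAccessions": [
                scv["accession"] for scv in entry.get("clinicalAssertions", [])
            ],
        })
    return rows
```

Using `.get` with defaults keeps the helper tolerant of entries that lack a germline classification, such as records that might carry only a somatic or oncogenicity classification.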

Variant Types

  • copy_number_gain
  • copy_number_loss
  • deletion
  • delins
  • duplication
  • insertion
  • inversion
  • SNV
  • tandem_duplication

Review Statuses

  • criteria provided, conflicting classifications
  • criteria provided, multiple submitters, no conflicts
  • criteria provided, single submitter
  • no assertion criteria provided
  • no classification provided
  • practice guideline
  • reviewed by expert panel

classification

  • Benign
  • Likely benign
  • Pathogenic
  • Uncertain significance
  • Likely pathogenic
  • Benign/Likely benign
  • not provided
  • conflicting data from submitters
  • Pathogenic/Likely pathogenic
  • association
  • Conflicting classifications of pathogenicity
  • Pathogenic; risk factor
  • risk factor
  • other
  • drug response
  • Uncertain significance; Pathogenic/Likely pathogenic
  • Likely pathogenic, low penetrance
  • Pathogenic; Affects
  • Pathogenic, low penetrance
  • protective
  • Affects
  • Benign; other
  • Conflicting classifications of pathogenicity; other
  • Conflicting classifications of pathogenicity; association
  • Uncertain risk allele
  • Uncertain significance; risk factor
  • Likely pathogenic; risk factor
  • Likely benign; association
  • Likely risk allele
  • Pathogenic/Likely pathogenic; other
  • Pathogenic; other
  • Pathogenic/Likely pathogenic/Pathogenic, low penetrance
  • Pathogenic/Likely pathogenic; risk factor
  • Benign/Likely benign; risk factor
  • Uncertain significance/Uncertain risk allele
  • Pathogenic; association; protective
  • protective; risk factor
  • Benign/Likely benign; other; risk factor
  • Benign/Likely benign; association
  • Benign; association
  • Affects; association; other
  • Pathogenic; protective
  • Conflicting classifications of pathogenicity; drug response; other
  • Conflicting classifications of pathogenicity; drug response
  • Benign; drug response
  • Likely pathogenic; other
  • Conflicting classifications of pathogenicity; protective
  • Pathogenic/Likely pathogenic; drug response
  • Benign/Likely benign; other
  • Likely pathogenic/Likely risk allele
  • Uncertain risk allele; protective
  • association not found
  • Affects; association
  • Uncertain significance; association
  • Likely benign; other
  • Uncertain significance; other
  • Conflicting classifications of pathogenicity; association; risk factor
  • Pathogenic; association
  • Benign; risk factor
  • Conflicting classifications of pathogenicity; other; risk factor
  • Pathogenic/Likely risk allele; risk factor
  • Uncertain significance; drug response
  • Conflicting classifications of pathogenicity; risk factor
  • other; risk factor
  • Pathogenic/Likely pathogenic/Likely risk allele
  • Likely pathogenic; drug response
  • Conflicting classifications of pathogenicity; Affects
  • association; drug response; risk factor
  • Pathogenic; drug response
  • Affects; risk factor
  • Pathogenic; drug response; other
  • Likely pathogenic; protective
  • confers sensitivity
  • Likely pathogenic; association
  • Benign; Affects
  • Likely pathogenic; Affects
  • Uncertain risk allele; risk factor
  • drug response; risk factor
  • Pathogenic/Likely risk allele
  • Likely benign; drug response; other
  • Benign/Likely benign; drug response
  • Benign/Likely benign; drug response; other
  • drug response; other
  • association; drug response
  • Pathogenic; confers sensitivity
  • association; risk factor
  • Pathogenic/Pathogenic, low penetrance; other
  • Benign; confers sensitivity
  • confers sensitivity; other
  • Likely pathogenic/Pathogenic, low penetrance
  • Likely benign; risk factor

Building the supplementary files

There are two ways of building your own ClinVar supplementary files using SAUtils.

The first way is to use the SAUtils ClinVarPreview subcommand, which builds the ClinVar .nsa and .nsi files for Illumina Connected Annotations from the source data files.

The second way is to use the SAUtils AutoDownloadGenerate subcommand. To use AutoDownloadGenerate, read more in the SAUtils section.

Using clinvar subcommands and source data files

One input .xml file and a .version file are required in order to build the .nsa and .nsi files. You should have the following files:

ClinVarVCVRelease_00-latest.xml.gz
ClinVarVCVRelease_00-latest.xml.gz.version

The version file is a json file with the following format.

{
"name": "ClinVar",
"version": "20240501",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate": "2024-05-01"
}

Adjust the version and release date to match the ClinVar release you downloaded.
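
If you prefer to generate the version file rather than edit it by hand, a sketch along these lines would work. The helper name `write_version_file` is hypothetical, and deriving `version` from the release date in yyyyMMdd form is an assumption based on the example above; pass an explicit `version` if your convention differs.

```python
import json
from datetime import date
from pathlib import Path

DESCRIPTION = (
    "A freely accessible, public archive of reports of the relationships "
    "among human variations and phenotypes, with supporting evidence"
)


def write_version_file(data_file, release_date, version=None):
    """Write '<data_file>.version' next to the ClinVar XML file."""
    metadata = {
        "name": "ClinVar",
        # Assumption: default the version to the release date as yyyyMMdd.
        "version": version or release_date.strftime("%Y%m%d"),
        "description": DESCRIPTION,
        "releaseDate": release_date.isoformat(),
    }
    version_path = Path(str(data_file) + ".version")
    version_path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return version_path
```

For example, `write_version_file("ClinVarVCVRelease_00-latest.xml.gz", date(2024, 5, 1))` writes a file with the same four keys as the example above.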

Here is a sample execution:

dotnet SAUtils ClinVarPreview \
--r ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat \
--vcv ClinVarVCVRelease_00-latest.xml.gz \
--o output
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

Parsing XML completed in 14.7 mins.
Sorting and adjusting completed in 4.7 mins.
Writing 2351609 Small Varaints
Chromosome 1 completed in 00:00:57.1
Chromosome 2 completed in 00:01:30.8
Chromosome 3 completed in 00:00:32.9
Chromosome 4 completed in 00:00:21.2
Chromosome 5 completed in 00:00:31.7
Chromosome 6 completed in 00:00:34.6
Chromosome 7 completed in 00:00:27.9
Chromosome 8 completed in 00:00:17.9
Chromosome 9 completed in 00:00:34.0
Chromosome 10 completed in 00:00:26.6
Chromosome 11 completed in 00:00:35.4
Chromosome 12 completed in 00:00:31.5
Chromosome 13 completed in 00:00:22.7
Chromosome 14 completed in 00:00:22.7
Chromosome 15 completed in 00:00:23.7
Chromosome 16 completed in 00:00:39.6
Chromosome 17 completed in 00:00:46.7
Chromosome 18 completed in 00:00:10.2
Chromosome 19 completed in 00:00:32.9
Chromosome 20 completed in 00:00:10.7
Chromosome 21 completed in 00:00:05.3
Chromosome 22 completed in 00:00:11.0
Chromosome X completed in 00:00:19.6
Chromosome Y completed in 00:00:00.1
Chromosome MT completed in 00:00:00.3
Maximum bp shifted for any variant:1
NSA writing completed in 11.5 mins.
Writing 76122 Large Varaints
Writing 76122 intervals to database...
NSI writing completed in 1.1 mins.

Time: 00:32:10.9
Process finished with exit code 0.


- - + + \ No newline at end of file diff --git a/data-sources/clinvar/index.html b/data-sources/clinvar/index.html index 303aab00..b39000e2 100644 --- a/data-sources/clinvar/index.html +++ b/data-sources/clinvar/index.html @@ -6,16 +6,16 @@ ClinVar | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

ClinVar

Overview

Deprecated

ClinVar has changed to a new XML format +

Version: 3.25 (unreleased)

ClinVar

Overview

Deprecated

ClinVar has changed to a new XML format. Use ClinVarPreview for the latest ClinVar entries.

ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.

Publication

Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J Bradley Holmes, Brandi L Kattman, Donna R Maglott, ClinVar: improving access to variant interpretations and supporting evidence, Nucleic Acids Research, 46, Issue D1, 4 January 2018, Pages D1062–D1067, https://doi.org/10.1093/nar/gkx1153

RCV File

Example

Here's a full RCV entry.

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

ID

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinVarAccession Acc="RCV000000001" Version="2">
</ClinVarSet>

The Acc and Version fields are merged to form the ID (RCV000000001.2)

LastUpdatedDate

<ClinVarSet>
<ReferenceClinVarAssertion DateCreated="2012-08-13" DateLastUpdated="2016-02-17" ID="57604" >
</ClinVarSet>

Significance

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

ReviewStatus

<ClinVarSet>
<ReferenceClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>
</ClinVarSet>

Phenotypes

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="62">
<Trait Type="Disease">
<Name>
<ElementValue Type="Preferred">Joubert syndrome 9</ElementValue>
</Name>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

We only use the field with Type="Preferred". Multiple phenotypes may be reported.

Location, Variant Type and Variant Id

<ReferenceClinVarAssertion>
<GenotypeSet Type="CompoundHeterozygote" ID="424709">
<MeasureSet Type="Variant" ID="81">
<Measure Type="single nucleotide variant" ID="15120">
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38"
AssemblyStatus="current" Chr="10" Accession="NC_000010.11" start="89222510"
stop="89222510" display_start="89222510" display_stop="89222510" variantLength="1"
positionVCF="89222510" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25"
AssemblyStatus="previous" Chr="10" Accession="NC_000010.10" start="90982267"
stop="90982267" display_start="90982267" display_stop="90982267" variantLength="1"
positionVCF="90982267" referenceAlleleVCF="C" alternateAlleleVCF="T"/>
</Measure>
</MeasureSet>
</GenotypeSet>
</ReferenceClinVarAssertion>
  • The variant position is extracted from the fields for their respective assemblies.
  • Updated records contain positionVCF, referenceAlleleVCF and alternateAlleleVCF fields and when present, we use them to create the variant.
  • For older records, since the "start" and "stop" fields are not always available, we use the "display_start" and "display_stop" fields.
  • If a required allele is not available, we extract it from the reference sequence.
  • Only variants having a dbSNP id are extracted.
  • Note that a ClinVar accession may have multiple variants associated with it (possibly at different locations).
  • VariantId is extracted from the MeasureSet attributes.
  • VariantType is extracted from the Measure attributes.
    unsupported variant types

    We currently don't support the following variant types:

    • Microsatellite
    • protein only
    • fusion
    • Complex
    • Variation
    • Translocation

MedGen, OMIM, Orphanet IDs

<ReferenceClinVarAssertion>
<TraitSet Type="Disease" ID="175">
<Trait ID="3036" Type="Disease">
<XRef ID="C0086651" DB="MedGen"/>
<XRef ID="309297" DB="Orphanet"/>
<XRef ID="582" DB="Orphanet"/>
<XRef Type="MIM" ID="253000" DB="OMIM"/>
</Trait>
</TraitSet>
</ReferenceClinVarAssertion>

AlleleOrigins

<ClinVarAssertion>
<Origin>germline</Origin>
</ClinVarAssertion>

We extract allele origins only from submission (SCV) entries.

PubMedIds

<ClinVarAssertion>
<ClinicalSignificance DateLastEvaluated="1996-04-01">
<Citation Type="general">
<ID Source="PubMed">12114475</ID>
</Citation>
</ClinicalSignificance>
<AttributeSet>
<Attribute Type="AssertionMethod">LMM Criteria</Attribute>
<Citation>
<ID Source="PubMed">24033266</ID>
</Citation>
</AttributeSet>
<ObservedIn>
<ObservedData ID="9727445">
<Citation Type="general">
<ID Source="PubMed">9113933</ID>
</Citation>
</ObservedData>
</ObservedIn>
<Citation Type="general">
<ID Source="PubMed">23757202</ID>
</Citation>
</ClinVarAssertion>

We extract PubMed IDs only from submission (SCV) entries.

Parsing Significance

Extracting significance(s) may involve parsing multiple fields. Take the following snippets into consideration.

<ClinicalSignificance DateLastEvaluated="1996-04-01">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2016-10-13">
<ReviewStatus>criteria provided, multiple submitters, no conflicts</ReviewStatus>
<Description>Pathogenic/Likely pathogenic</Description>
</ClinicalSignificance>

<ClinicalSignificance DateLastEvaluated="2012-06-07">
<ReviewStatus>no assertion criteria provided</ReviewStatus>
<Description>Conflicting interpretations of pathogenicity</Description>
<Explanation DataSource="ClinVar" Type="public">Pathogenic(1);Uncertain significance(1)</Explanation>
</ClinicalSignificance>

Given this variety, we convert the significance field into an array of strings, which may be parsed out of the Description or Explanation fields.

Varying Delimiters

The delimiters in each field may vary. Currently, the delimiters for Description are , and /. The delimiters for Explanation are ; and /.
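
To make the delimiter handling concrete, here is a hedged Python sketch. Preferring Explanation over Description when both are present, stripping the submitter counts such as "(1)", and lowercasing the values (as seen in the JSON output) are assumptions about the behaviour rather than a statement of the actual implementation; `parse_significances` is a hypothetical name.

```python
import re

DESCRIPTION_DELIMITERS = re.compile(r"[,/]")   # delimiters used in Description
EXPLANATION_DELIMITERS = re.compile(r"[;/]")   # delimiters used in Explanation
SUBMITTER_COUNT = re.compile(r"\(\d+\)\s*$")   # trailing "(1)" in Explanation


def parse_significances(description, explanation=None):
    """Split a ClinicalSignificance into a list of significance strings."""
    if explanation:
        tokens = EXPLANATION_DELIMITERS.split(explanation)
    else:
        tokens = DESCRIPTION_DELIMITERS.split(description or "")

    significances = []
    for token in tokens:
        value = SUBMITTER_COUNT.sub("", token.strip()).strip().lower()
        if value and value not in significances:
            significances.append(value)
    return significances
```

Applied to the three snippets above, this yields one value for the first, two for the second (split on `/`), and the two submitted values from the Explanation for the third.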

VCV File

Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ClinVarVariationRelease xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public/clinvar_variation/variation_archive_1.4.xsd" ReleaseDate="2019-12-31">
<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">
<RecordStatus>current</RecordStatus>
<Species>Homo sapiens</Species>
<IncludedRecord>
<SimpleAllele AlleleID="425239" VariationID="431749">
<GeneList>
<Gene Symbol="KCNAB2" FullName="potassium voltage-gated channel subfamily A regulatory beta subunit 2" GeneID="8514" HGNC_ID="HGNC:6229" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5992639" stop="6101186" display_start="5992639" display_stop="6101186" Strand="+"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6052357" stop="6161252" display_start="6052357" display_stop="6161252" Strand="+"/>
</Location>
<OMIM>601142</OMIM>
</Gene>
<Gene Symbol="NPHP4" FullName="nephrocystin 4" GeneID="261734" HGNC_ID="HGNC:19104" Source="calculated" RelationshipType="genes overlapped by variant">
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.38" AssemblyStatus="current" Chr="1" Accession="NC_000001.11" start="5862810" stop="5992425" display_start="5862810" display_stop="5992425" Strand="-"/>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="5922869" stop="6052532" display_start="5922869" display_stop="6052532" Strand="-"/>
</Location>
<OMIM>607215</OMIM>
</Gene>
</GeneList>
<Name>GRCh37/hg19 1p36.31(chr1:6051187-6158763)</Name>
<VariantType>copy number gain</VariantType>
<Location>
<CytogeneticLocation>1p36.31</CytogeneticLocation>
<SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" forDisplay="true" AssemblyStatus="previous" Chr="1" Accession="NC_000001.10" start="6051187" stop="6158763" display_start="6051187" display_stop="6158763"/> </Location>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<XRefList>
<XRef Type="Interpreted" ID="431733" DB="ClinVar"/>
</XRefList>
</SimpleAllele>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
<SubmittedInterpretationList>
<SCV Title="SUB1895145" Accession="SCV000296057" Version="1"/>
</SubmittedInterpretationList>
<InterpretedVariationList>
<InterpretedVariation VariationID="431733" Accession="VCV000431733" Version="1"/>
</InterpretedVariationList>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Parsing

In the following section, we discuss which field of the XML was used to extract information that is presented in the JSON output.

id

<VariationArchive VariationID="431749" VariationName="GRCh37/hg19 1p36.31(chr1:6051187-6158763)" VariationType="copy number gain" DateCreated="2017-08-12" DateLastUpdated="2019-09-10" Accession="VCV000431749" Version="1" RecordType="included" NumberOfSubmissions="0" NumberOfSubmitters="0">

The Accession and Version fields are merged to form the ID (VCV000431749.1)

significance

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<SimpleAllele>
<Interpretations>
<Interpretation NumberOfSubmissions="0" NumberOfSubmitters="0" Type="Clinical significance">
<Description>no interpretation for the single variant</Description>
</Interpretation>
</Interpretations>
</SimpleAllele>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

May have multiple significances listed.

reviewStatus

<ClinVarVariationRelease>
<VariationArchive>
<IncludedRecord>
<ReviewStatus>no interpretation for the single variant</ReviewStatus>
</IncludedRecord>
</VariationArchive>
</ClinVarVariationRelease>

Known Issues

  • The XML file contains ~1k more entries (out of 162K) than the VCF file
  • The XML file does not have a field indicating that a record is associated with the reference base - something that was present in VCF
  • The XML file contains entries (e.g. RCV000016645 version=1) which have IUPAC ambiguous bases ("R", "Y", "H", etc.) as their alternate allele

Download URLs

ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz

https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz

JSON Output

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
| Field | Type | Notes |
| --- | --- | --- |
| id | string | ClinVar ID |
| variationId | string | ClinVar VCV ID |
| variantType | string | variant type |
| reviewStatus | string | see possible values below |
| alleleOrigins | string array | see possible values below |
| refAllele | string | |
| altAllele | string | |
| phenotypes | string array | |
| medGenIds | string array | MedGen IDs |
| omimIds | string array | OMIM IDs |
| orphanetIds | string array | Orphanet IDs |
| significance | string array | see possible values below |
| lastUpdatedDate | string | yyyy-MM-dd |
| pubMedIds | string array | PubMed IDs |
| isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
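
Because the `clinvar` array mixes VCV and RCV entries, a consumer may want to regroup them. The sketch below assumes entries are told apart by the `VCV`/`RCV` prefix of `id` and that RCV entries point at their VCV through `variationId`, as in the examples above; `group_rcvs_by_vcv` is a hypothetical helper.

```python
def group_rcvs_by_vcv(clinvar_entries):
    """Group RCV entries under the VCV ID they reference."""
    grouped = {}
    for entry in clinvar_entries:
        entry_id = entry.get("id", "")
        if entry_id.startswith("VCV"):
            slot = grouped.setdefault(entry_id, {"vcv": None, "rcvs": []})
            slot["vcv"] = entry
        elif entry_id.startswith("RCV"):
            vcv_id = entry.get("variationId")
            slot = grouped.setdefault(vcv_id, {"vcv": None, "rcvs": []})
            slot["rcvs"].append(entry)
    return grouped
```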

Building the supplementary files

There are two ways of building your own ClinVar supplementary files using SAUtils.

The first way is to use the SAUtils clinvar subcommand, which builds the ClinVar .nsa and .nsi files for Illumina Connected Annotations from the source data files.

The second way is to use the SAUtils AutoDownloadGenerate subcommand. To use AutoDownloadGenerate, read more in the SAUtils section.

Using clinvar subcommands and source data files

Two input .xml files and a .version file are required in order to build the .nsa and .nsi file. You should have the following files:

ClinVarFullRelease_00-latest.xml.gz
ClinVarVariationRelease_00-latest.xml.gz
ClinVarFullRelease_00-latest.xml.gz.version

The version file is a json file with the following format.

{
"name": "ClinVar",
"version": "20231230",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate": "2024-01-10"
}

Adjust the version and release date to match the ClinVar release you downloaded.

The help menu for the utility is as follows:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2022 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll clinvar

Here is a sample execution:

dotnet SAUtils.dll clinvar \
--ref ~/development/References/7/Homo_sapiens.GRCh38.Nirvana.dat --rcv ClinVarFullRelease_00-latest.xml.gz \
--vcv ClinVarVariationRelease_00-latest.xml.gz --out ~/development/SupplementaryDatabase/63/GRCh38
---------------------------------------------------------------------------
SAUtils (c) 2022 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.18.1
---------------------------------------------------------------------------

Found 1535677 VCV records
Unknown vcv id:225946 found in RCV000211201.2
Unknown vcv id:225946 found in RCV000211253.2
Unknown vcv id:225946 found in RCV000211375.2
Unknown vcv id:976117 found in RCV001253316.1
Unknown vcv id:1321016 found in RCV001776995.2
3 unknown VCVs found in RCVs.
225946,976117,1321016
0 unknown VCVs found in RCVs.
Chromosome 1 completed in 00:00:15.1
Chromosome 2 completed in 00:00:20.0
Chromosome 3 completed in 00:00:09.7
Chromosome 4 completed in 00:00:05.9
Chromosome 5 completed in 00:00:09.8
Chromosome 6 completed in 00:00:08.3
Chromosome 7 completed in 00:00:08.7
Chromosome 8 completed in 00:00:06.2
Chromosome 9 completed in 00:00:08.6
Chromosome 10 completed in 00:00:07.0
Chromosome 11 completed in 00:00:11.7
Chromosome 12 completed in 00:00:08.0
Chromosome 13 completed in 00:00:06.3
Chromosome 14 completed in 00:00:06.0
Chromosome 15 completed in 00:00:06.6
Chromosome 16 completed in 00:00:10.8
Chromosome 17 completed in 00:00:13.8
Chromosome 18 completed in 00:00:02.9
Chromosome 19 completed in 00:00:08.7
Chromosome 20 completed in 00:00:03.6
Chromosome 21 completed in 00:00:02.4
Chromosome 22 completed in 00:00:03.6
Chromosome MT completed in 00:00:00.2
Chromosome X completed in 00:00:07.5
Chromosome Y completed in 00:00:00.0
Maximum bp shifted for any variant:2
Writing 37097 intervals to database...

Time: 00:13:26.9

- - + + \ No newline at end of file diff --git a/data-sources/cosmic-cancer-gene-census/index.html b/data-sources/cosmic-cancer-gene-census/index.html index 0076cacb..6fb9b041 100644 --- a/data-sources/cosmic-cancer-gene-census/index.html +++ b/data-sources/cosmic-cancer-gene-census/index.html @@ -6,13 +6,13 @@ cosmic-cancer-gene-census | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]
- - +
Version: 3.25 (unreleased)

cosmic-cancer-gene-census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]
+ + \ No newline at end of file diff --git a/data-sources/cosmic-gene-fusion-json/index.html b/data-sources/cosmic-gene-fusion-json/index.html index 8603ff7f..ab532129 100644 --- a/data-sources/cosmic-gene-fusion-json/index.html +++ b/data-sources/cosmic-gene-fusion-json/index.html @@ -6,13 +6,13 @@ cosmic-gene-fusion-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
- - +
Version: 3.25 (unreleased)

cosmic-gene-fusion-json

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + \ No newline at end of file diff --git a/data-sources/cosmic-json/index.html b/data-sources/cosmic-json/index.html index 9b79a8ed..09c7b8e1 100644 --- a/data-sources/cosmic-json/index.html +++ b/data-sources/cosmic-json/index.html @@ -6,13 +6,13 @@ cosmic-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type         Notes
id                string       COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array  phenotypic descriptions
sites             count array  tissue types
pubMedIds         int array    PubMed IDs
confirmedSomatic  bool         true when the variant is a confirmed somatic variant
drugResistance    bool         true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
- - +
Version: 3.25 (unreleased)

cosmic-json

{
"id":"COSV58272668",
"numSamples":8,
"refAllele":"-",
"altAllele":"CCT",
"histologies":[
{
"name":"carcinoma (serous carcinoma)",
"numSamples":2
},
{
"name":"meningioma (fibroblastic)",
"numSamples":1
},
{
"name":"carcinoma",
"numSamples":1
},
{
"name":"carcinoma (squamous cell carcinoma)",
"numSamples":1
},
{
"name":"meningioma (transitional)",
"numSamples":1
},
{
"name":"carcinoma (adenocarcinoma)",
"numSamples":1
},
{
"name":"other (neoplasm)",
"numSamples":1
}
],
"sites":[
{
"name":"ovary",
"numSamples":2
},
{
"name":"meninges",
"numSamples":2
},
{
"name":"thyroid",
"numSamples":2
},
{
"name":"cervix",
"numSamples":1
},
{
"name":"large intestine (colon)",
"numSamples":1
}
],
"pubMedIds":[
25738363,
27548314
],
"confirmedSomatic":true,
"drugResistance":true, /* not in this particular COSMIC variant */
"isAlleleSpecific":true
}
Field             Type         Notes
id                string       COSMIC Genomic Mutation ID
numSamples        int
refAllele         string
altAllele         string
histologies       count array  phenotypic descriptions
sites             count array  tissue types
pubMedIds         int array    PubMed IDs
confirmedSomatic  bool         true when the variant is a confirmed somatic variant
drugResistance    bool         true when the variant has been associated with drug resistance

Count

Field       Type    Notes
name        string  description
numSamples  int
+ + \ No newline at end of file diff --git a/data-sources/cosmic/index.html b/data-sources/cosmic/index.html index efc6aa32..7b2e4090 100644 --- a/data-sources/cosmic/index.html +++ b/data-sources/cosmic/index.html @@ -6,12 +6,12 @@ COSMIC | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human +

Version: 3.25 (unreleased)

COSMIC

Overview

COSMIC, the Catalogue of Somatic Mutations in Cancer, is the world's largest source of expert manually curated somatic mutation information relating to human cancers.

Publication

John G Tate, Sally Bamford, Harry C Jubb, Zbyslaw Sondka, David M Beare, Nidhi Bindal, Harry Boutselakis, Charlotte G Cole, Celestino Creatore, Elisabeth Dawson, Peter Fish, Bhavana Harsha, Charlie Hathaway, Steve C Jupe, Chai Yin Kok, Kate Noble, Laura Ponting, Christopher C Ramshaw, Claire E Rye, Helen E Speedy, Ray Stefancsik, Sam L Thompson, Shicai Wang, Sari Ward, Peter J Campbell, Simon A Forbes. (2019) COSMIC: the Catalogue Of Somatic Mutations In @@ -22,7 +22,7 @@ pair when it is released in the database. Currently COSMIC includes information on fusions involved in solid tumours and leukaemias.

TSV extraction

Example

SAMPLE_ID SAMPLE_NAME PRIMARY_SITE  SITE_SUBTYPE_1  SITE_SUBTYPE_2  SITE_SUBTYPE_3  PRIMARY_HISTOLOGY HISTOLOGY_SUBTYPE_1 HISTOLOGY_SUBTYPE_2 HISTOLOGY_SUBTYPE_3 FUSION_ID TRANSLOCATION_NAME  5'_CHROMOSOME 5'_STRAND 5'_GENE_ID  5'_GENE_NAME  5'_LAST_OBSERVED_EXON 5'_GENOME_START_FROM  5'_GENOME_START_TO  5'_GENOME_STOP_FROM 5'_GENOME_STOP_TO 3'_CHROMOSOME 3'_STRAND 3'_GENE_ID  3'_GENE_NAME  3'_FIRST_OBSERVED_EXON  3'_GENOME_START_FROM  3'_GENOME_START_TO  3'_GENOME_STOP_FROM 3'_GENOME_STOP_TO FUSION_TYPE PUBMED_PMID
749711 HCC1187 breast NS NS NS carcinoma ductal_carcinoma NS NS 665 ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452 8 - 197199 RGS22 22 99981937 99981937 100106116 100106116 1 + 212470 SYCP1_ENST00000369518 24 114944339 114944339 114995367 114995367 Inferred Breakpoint 20033038

Parsing

From the TSV file, we're mainly interested in the following columns:

  • SAMPLE_ID
  • PRIMARY_SITE
  • PRIMARY_HISTOLOGY
  • HISTOLOGY_SUBTYPE_1
  • FUSION_ID
  • TRANSLOCATION_NAME
  • PUBMED_PMID

For all the histologies and sites, we replace all the underlines with spaces. salivary_gland would become salivary gland.

Parsing

To create the gene fusion entries in Illumina Connected Annotations, we perform the following on each row in the TSV file:

  • Group all entries by FUSION_ID
  • Using all the entries related to this FUSION_ID:
    • Collect all the PubMed IDs
    • Tally the number of observed sample IDs
    • Grab the HGVS r. notation (should not change throughout the FUSION_ID)
    • Tally the number of samples observed for each histology
    • Tally the number of samples observed for each site
  • Extract the transcript IDs from the HGVS notation and look up the associated gene symbols
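
The grouping steps above can be sketched in Python as follows. This is a minimal illustration, not the SAUtils implementation. The helper names, the `COSF` prefix on the ID, the use of only PRIMARY_SITE for sites, and the rule of appending a non-`NS` histology subtype in parentheses are all assumptions. `gene_symbols` is a hypothetical transcript-to-symbol lookup table, and the second data row is made up to show the tallies.

```python
import csv
import io
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r"ENST\d+(?:\.\d+)?")


def clean(label):
    """Replace underscores with spaces, e.g. salivary_gland -> salivary gland."""
    return label.replace("_", " ")


def histology_name(row):
    # Assumption: a subtype other than "NS" is appended in parentheses.
    primary = clean(row["PRIMARY_HISTOLOGY"])
    subtype = clean(row["HISTOLOGY_SUBTYPE_1"])
    return primary if subtype == "NS" else f"{primary} ({subtype})"


def tally(samples_by_name):
    """Turn {name: set of sample IDs} into a list of count objects."""
    return [{"name": n, "numSamples": len(s)} for n, s in sorted(samples_by_name.items())]


def aggregate_fusions(tsv_text, gene_symbols):
    """Group TSV rows by FUSION_ID and build one entry per fusion."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row["FUSION_ID"]].append(row)

    fusions = []
    for fusion_id, rows in groups.items():
        histologies, sites = defaultdict(set), defaultdict(set)
        for row in rows:
            histologies[histology_name(row)].add(row["SAMPLE_ID"])
            sites[clean(row["PRIMARY_SITE"])].add(row["SAMPLE_ID"])
        hgvsr = rows[0]["TRANSLOCATION_NAME"]  # expected to be constant per fusion
        fusions.append({
            "id": f"COSF{fusion_id}",  # assumption: the prefix is added here
            "numSamples": len({r["SAMPLE_ID"] for r in rows}),
            "geneSymbols": [gene_symbols[t] for t in TRANSCRIPT_ID.findall(hgvsr)],
            "hgvsr": hgvsr,
            "histologies": tally(histologies),
            "sites": tally(sites),
            "pubMedIds": sorted({int(r["PUBMED_PMID"]) for r in rows if r["PUBMED_PMID"]}),
        })
    return fusions


HGVS = "ENST00000360863.10(RGS22):r.1_3555::ENST00000369518.1(SYCP1):r.2100_3452"
SAMPLE_TSV = "\n".join([
    "\t".join(["SAMPLE_ID", "PRIMARY_SITE", "PRIMARY_HISTOLOGY", "HISTOLOGY_SUBTYPE_1",
               "FUSION_ID", "TRANSLOCATION_NAME", "PUBMED_PMID"]),
    "\t".join(["749711", "breast", "carcinoma", "ductal_carcinoma", "665", HGVS, "20033038"]),
    # Hypothetical second sample, added only to show the tallies.
    "\t".join(["900001", "salivary_gland", "carcinoma", "NS", "665", HGVS, "20033038"]),
])
LOOKUP = {"ENST00000360863.10": "RGS22", "ENST00000369518.1": "SYCP1"}

fusions = aggregate_fusions(SAMPLE_TSV, LOOKUP)
print(fusions[0])
```

Sets of sample IDs are used for every tally so that a sample reported on several rows is counted once.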

Aggregating Histologies & Sites

Aggregating Histologies & Sites was previously described in the small variants section.

Known Issues

There are some issues with the HGVS RNA notation:

  • For coding transcripts, HGVS numbering should use CDS coordinates. Right now COSMIC is using cDNA coordinates for all their fusions.

Download URL

GRCh37

GRCh38

JSON Output

   "cosmicGeneFusions":[
{
"id":"COSF881",
"numSamples":6,
"geneSymbols":[
"MYB",
"NFIB"
],
"hgvsr":"ENST00000341911.5(MYB):r.1_2368::ENST00000397581.2(NFIB):r.2592_3318",
"histologies":[
{
"name":"adenoid cystic carcinoma",
"numSamples":6
}
],
"sites":[
{
"name":"salivary gland (submandibular)",
"numSamples":1
},
{
"name":"salivary gland (parotid)",
"numSamples":1
},
{
"name":"salivary gland (nasal cavity)",
"numSamples":1
},
{
"name":"breast",
"numSamples":3
}
],
"pubMedIds":[
19841262
]
}
]
Field        Type          Notes
id           string        COSMIC fusion ID
numSamples   int
geneSymbols  string array  5' gene & 3' gene
hgvsr        string        HGVS RNA translocation fusion notation
histologies  count array   phenotypic descriptions
sites        count array   tissue types
pubMedIds    int array     PubMed IDs

Count

Field       Type    Notes
name        string  description
numSamples  int

Cancer Gene Census

TSV Extraction

Example

GENE_NAME       CELL_TYPE       PUBMED_PMID     HALLMARK        IMPACT  DESCRIPTION     CELL_LINE
PRDM16 18496560 role in cancer oncogene oncogene
PRDM16 16015645 role in cancer fusion fusion

Parsing

To extract information about tumor suppressor genes (TSGs) and oncogenes, we filter the data on the "role in cancer" attribute: rows with the value "TSG" identify tumor suppressor genes, and rows with the value "oncogene" identify oncogenes. Some genes have "TSG/oncogene" as their role, which indicates that they can act as both.

Columns

Only the following columns are needed to gather the required roles in cancer:

  • GENE_NAME
  • IMPACT
  • HALLMARK
Possible Roles in Cancer

The file contained the following number of instances for each role type:

Role in cancer  Total Instances
fusion          149
TSG             195
oncogene        181
Total           525

CSV Extraction

COSMIC tiers are extracted from the cancer_gene_census.csv file:

Gene Symbol,Name,Entrez GeneId,Genome Location,Tier,Hallmark,Chr Band,Somatic,Germline,Tumour Types(Somatic),Tumour Types(Germline),Cancer Syndrome,Tissue Type,Molecular Genetics,Role in Cancer,Mutation Types,Translocation Partner,Other Germline Mut,Other Syndrome,COSMIC ID,cosmic gene name,Synonyms
"AR","Androgen Receptor ","367","X:67544036-67730619","1","Yes","Xq12","yes","yes","prostate","","","E","Dom","oncogene","Mis","","yes ","Androgen insensitivity, Hypospadias 1, X-linked, Spinal and bulbar muscular atrophy of Kennedy ","COSG292497","AR","367,AIS,AR,DHTR,ENSG00000169083.16,HUMARA,NR3C4,P10275,SBMA,SMAX1"
"FH","fumarate hydratase","2271","1:241497603-241519761","1","","1q43","","yes","","leiomyomatosis, renal","hereditary leiomyomatosis and renal cell cancer","E, M","Rec","TSG","Mis, N, F","","","","COSG255037","FH","2271,ENSG00000091483.6,FH,P07954"
"ALK","anaplastic lymphoma kinase (Ki-1)","238","2:29192774-29921566","1","Yes","2p23.2","yes","yes","ALCL, NSCLC, neuroblastoma, inflammatory myofibroblastic tumour, Spitzoid tumour","neuroblastoma","familial neuroblastoma","L, E, M","Dom","oncogene, fusion","T, Mis, A","NPM1, TPM3, TFG, TPM4, ATIC, CLTC, MSN, RNF213, CARS, EML4, KIF5B, C2orf22, DCTN1, HIP1, TPR, RANBP2, PPFIBP1, SEC31A, STRN, VCL, C2orf44, KLC1","","","COSG383409","ALK","238,ALK,CD246,ENSG00000171094.17,Q9UM73"
"APC","adenomatous polyposis of the colon gene","324","5:112737888-112846239","1","Yes","5q22.2","yes","yes","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","colorectal, pancreatic, desmoid, hepatoblastoma, glioma, other CNS","adenomatous polyposis coli; Turcot syndrome","E, M, O","Rec","TSG","D, Mis, N, F, S","","","","COSG208824","APC","324,APC,DP2,DP2.5,DP3,ENSG00000134982.16,P25054,PPP1R46"
Columns

Only the following columns are needed to gather the tiers:

  • Gene Symbol
  • Tier

First, the tiers are read from the CSV file. Then, while parsing the TSV file, each gene's tier is added based on its gene symbol.
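
As a rough sketch of this two-pass approach (not the actual SAUtils code), the Python below first builds a gene-symbol-to-tier table from the CSV and then attaches the tier while collecting roles from the TSV. The function names and the trimmed sample inputs are hypothetical. The PRDM16 tier value is taken from the JSON output shown further down, and combined role values such as "TSG/oncogene" are not handled here.

```python
import csv
import io

ROLES = {"oncogene", "TSG", "fusion"}


def read_tiers(csv_text):
    """First pass: map each gene symbol to its COSMIC tier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Gene Symbol"]: int(row["Tier"]) for row in reader}


def read_census(tsv_text, tiers):
    """Second pass: collect roles in cancer per gene and attach the tier."""
    genes = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Keep only "role in cancer" rows that carry one of the known roles.
        if row["HALLMARK"] != "role in cancer" or row["IMPACT"] not in ROLES:
            continue
        entry = genes.setdefault(row["GENE_NAME"], {"roleInCancer": []})
        if row["IMPACT"] not in entry["roleInCancer"]:
            entry["roleInCancer"].append(row["IMPACT"])
    for name, entry in genes.items():
        if name in tiers:
            entry["tier"] = tiers[name]
    return genes


# Hypothetical, trimmed inputs: only the columns used above carry real meaning.
SAMPLE_CSV = 'Gene Symbol,Name,Tier\n"PRDM16","PR domain containing 16","1"\n'
HEADER = ["GENE_NAME", "CELL_TYPE", "PUBMED_PMID", "HALLMARK", "IMPACT", "DESCRIPTION", "CELL_LINE"]
SAMPLE_TSV = "\n".join([
    "\t".join(HEADER),
    "\t".join(["PRDM16", "", "18496560", "role in cancer", "oncogene", "oncogene", ""]),
    "\t".join(["PRDM16", "", "16015645", "role in cancer", "fusion", "fusion", ""]),
    # Made-up row with a different hallmark; it is filtered out.
    "\t".join(["PRDM16", "", "0", "some other hallmark", "promotes", "", ""]),
])

census = read_census(SAMPLE_TSV, read_tiers(SAMPLE_CSV))
print(census)
```

Reading the CSV first keeps the TSV pass to a single dictionary lookup per gene.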

Known Issues

None

Download URL

JSON output

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]

Building the supplementary files

You can generate COSMIC supplementary annotation files if you have COSMIC account credentials. Please refer to the SAUtils section for more details.

- - + + \ No newline at end of file diff --git a/data-sources/dann-json/index.html b/data-sources/dann-json/index.html index a2bf7daa..b7a2bc59 100644 --- a/data-sources/dann-json/index.html +++ b/data-sources/dann-json/index.html @@ -6,13 +6,13 @@ dann-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
- - +
Version: 3.25 (unreleased)

dann-json

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/data-sources/dann/index.html b/data-sources/dann/index.html index e087dfad..df5b4077 100644 --- a/data-sources/dann/index.html +++ b/data-sources/dann/index.html @@ -6,16 +6,16 @@ DANN | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). +

Version: 3.25 (unreleased)

DANN

Overview

DANN uses the same feature set and training data as CADD (Combined Annotation-Dependent Depletion) to train a deep neural network (DNN). CADD is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. DANN improves on CADD (which uses Support Vector Machines (SVMs)) by capturing non-linear relationships by using a deep neural network instead of SVMs. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD’s SVM methodology.

Publication

Quang, Daniel, Yifei Chen, and Xiaohui Xie. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31.5 761-763 (2015). https://doi.org/10.1093/bioinformatics/btu703

TSV File

Example

chr     grch37_pos  ref     alt     DANN
1 10001 T A 0.16461391399220135
1 10001 T C 0.4396994049749739
1 10001 T G 0.38108629377072734
1 10002 A C 0.36182020272810128
1 10002 A G 0.44413258111779291
1 10002 A T 0.16812846819989813

Parsing

From the TSV file, we are interested in all columns:

  • chr
  • grch37_pos
  • ref
  • alt
  • DANN
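
A minimal Python sketch of reading one data line of this file is shown below. Apart from `dannScore`, the record keys are hypothetical names chosen for illustration, and rounding to two decimals is an assumption inferred from the `0.27` value in the JSON example, not a documented rule.

```python
def parse_dann_line(line):
    """Parse one data line with the columns chr, grch37_pos, ref, alt, DANN."""
    chrom, pos, ref, alt, score = line.split()
    return {
        "chromosome": chrom,     # hypothetical key name
        "position": int(pos),    # hypothetical key name
        "refAllele": ref,        # hypothetical key name
        "altAllele": alt,        # hypothetical key name
        # Assumption: scores are rounded to two decimals for output.
        "dannScore": round(float(score), 2),
    }


record = parse_dann_line("1\t10001\tT\tA\t0.16461391399220135")
print(record)
```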

GRCh38 liftover

The data is not available for GRCh38 on the DANN website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Known Issues

None

Download URL

https://cbcl.ics.uci.edu/public_data/DANN/

JSON Output

"dannScore": 0.27
Field      Type   Notes
dannScore  float  Range: 0 - 1.0
- - + + \ No newline at end of file diff --git a/data-sources/dbsnp-json/index.html b/data-sources/dbsnp-json/index.html index 787ebdf8..1f5ffe34 100644 --- a/data-sources/dbsnp-json/index.html +++ b/data-sources/dbsnp-json/index.html @@ -6,13 +6,13 @@ dbsnp-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
- - +
Version: 3.25 (unreleased)

dbsnp-json

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs
+ + \ No newline at end of file diff --git a/data-sources/dbsnp/index.html b/data-sources/dbsnp/index.html index 9c75aef9..db32aeb9 100644 --- a/data-sources/dbsnp/index.html +++ b/data-sources/dbsnp/index.html @@ -6,13 +6,13 @@ dbSNP | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the frequency of the alleles provided in the CAF field. The global minor allele frequency is the second highest value of the CAF comma delimited field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele, and either C or T (chosen arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T are tied for the highest frequency, so one of them is arbitrarily chosen as the global major allele and the other becomes the global minor allele.

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.
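
The tie-breaking rules and the four examples above can be expressed as a short Python sketch. The function name and return shape are hypothetical, and "arbitrary" ties are resolved here by taking the first candidate in CAF order, which is only one possible choice.

```python
def global_alleles(ref, alts, caf):
    """Return (major allele, minor allele, minor allele frequency) or None.

    ref:  reference allele
    alts: list of alternate alleles, in VCF order
    caf:  the comma-delimited CAF value (reference frequency first)
    """
    alleles = [ref] + list(alts)
    # Ignore '.' entries, which carry no frequency.
    freqs = [(a, float(f)) for a, f in zip(alleles, caf.split(",")) if f != "."]
    if len(freqs) < 2:
        return None

    # Major allele: highest frequency; on a tie, the reference allele wins.
    major = max(freqs, key=lambda p: (p[1], p[0] == ref))
    rest = [p for p in freqs if p is not major]
    # Minor allele: next highest frequency; on a tie, a non-reference allele wins.
    minor = max(rest, key=lambda p: (p[1], p[0] != ref))
    return major[0], minor[0], minor[1]


print(global_alleles("A", ["C"], "0.5,0.5"))
print(global_alleles("A", ["C", "T"], "0.2,0.2,0.6"))
```

Encoding each preference as the second element of the sort key keeps both tie-breaks in one `max` call, and any remaining ties fall to whichever allele appears first.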

Known Issues

If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs

Building the supplementary files

You can generate dbSNP supplementary annotation files yourself. Please refer to the SAUtils section for more details.

- - +
Version: 3.25 (unreleased)

dbSNP

Overview

dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.

Publication

Sherry, S.T., Ward, M. and Sirotkin, K. (1999) dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res., 9, 677–679.

VCF File

Example

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138; \
SSR=0;SAO=0;VP=0x050000020005130026000200;GENEINFO=DDX11L1:100287102;WGT=1; \
VC=DIV;R5;ASP;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1; \
TOPMED=0.76728147298674821,0.23271852701325178

Parsing

From the VCF file, we're mainly interested in the following:

  • rsID from the ID field
  • CAF from the INFO field

Global allele extraction

The global major and minor alleles are extracted based on the frequency of the alleles provided in the CAF field. The global minor allele frequency is the second highest value of the CAF comma delimited field (ignoring '.' values).

Tie Breaking: Global Major Allele

If there are two candidates for global major and the reference allele is one of them, we prefer the reference allele.

Tie Breaking: Global Minor Allele

If there are two candidates for global minor and the reference allele is one of them, we prefer the other allele. If the reference allele is not involved, they are chosen arbitrarily.

Equal Allele Frequency Example (2 alleles)

chr1    100 A   C   CAF=0.5,0.5

We will select A to be the global major allele and C to be the global minor allele.

Equal Allele Frequency Example (3 alleles)

chr1    100 A   C,T CAF=0.33,0.33,0.33

We will select A to be the global major allele, and either C or T (chosen arbitrarily) to be the global minor allele.

Equal Allele Frequency in Alternate Alleles

chr1    100 A   C,T CAF=0.2,0.4,0.4

C and T are tied for the highest frequency, so one of them is arbitrarily chosen as the global major allele and the other becomes the global minor allele.

Equal Allele Frequency Between Reference & Alternate Allele

chr1    100 A   C,T CAF=0.2,0.2,0.6

We will select T to be the global major allele and C to be the global minor allele.

Known Issues

If there are multiple entries with different CAF values for the same allele, we use the first CAF value.

Download URL

https://ftp.ncbi.nih.gov/snp/organisms/

JSON Output

"dbsnp":[
"rs1042821"
]
Field  Type          Notes
dbsnp  string array  dbSNP rsIDs

Building the supplementary files

You can generate dbSNP supplementary annotation files yourself. Please refer to the SAUtils section for more details.

+ + \ No newline at end of file diff --git a/data-sources/decipher-json/index.html b/data-sources/decipher-json/index.html index 991b4cd2..e932eba0 100644 --- a/data-sources/decipher-json/index.html +++ b/data-sources/decipher-json/index.html @@ -6,13 +6,13 @@ decipher-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
- - +
Version: 3.25 (unreleased)

decipher-json

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
+ + \ No newline at end of file diff --git a/data-sources/decipher/index.html b/data-sources/decipher/index.html index d94b88c8..96adf146 100644 --- a/data-sources/decipher/index.html +++ b/data-sources/decipher/index.html @@ -6,14 +6,14 @@ DECIPHER | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER TSV file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size
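
A minimal Python sketch of this extraction follows. The mapping from TSV column names to JSON field names is an assumption based on the similarity between the column list above and the `decipher` JSON object, and the function name is hypothetical. The data row is the second example row from the TSV excerpt.

```python
import csv
import io

# Assumed mapping: TSV column -> (JSON field name, type conversion).
COLUMN_TO_FIELD = {
    "chr": ("chromosome", str),
    "start": ("begin", int),
    "end": ("end", int),
    "deletion_observations": ("numDeletions", int),
    "deletion_frequency": ("deletionFrequency", float),
    "duplication_observations": ("numDuplications", int),
    "duplication_frequency": ("duplicationFrequency", float),
    "sample_size": ("sampleSize", int),
}


def parse_decipher(tsv_text):
    """Keep only the columns of interest from each row, converted to typed fields."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {field: cast(row[column]) for column, (field, cast) in COLUMN_TO_FIELD.items()}
        for row in reader
    ]


HEADER = [
    "#population_cnv_id", "chr", "start", "end",
    "deletion_observations", "deletion_frequency", "deletion_standard_error",
    "duplication_observations", "duplication_frequency", "duplication_standard_error",
    "observations", "frequency", "standard_error", "type", "sample_size", "study",
]
ROW = ["2", "1", "13516", "91073", "0", "0", "1", "27", "0.675", "0.109713431",
       "27", "0.675", "0.109713431", "1", "40", "42M calls"]

intervals = parse_decipher("\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n")
print(intervals[0])
```

Keeping the chromosome as a string lets names such as X, Y and MT pass through unchanged.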

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz +

Version: 3.25 (unreleased)

DECIPHER

Overview

DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants.

DECIPHER enhances clinical diagnosis by retrieving information from a variety of bioinformatics resources relevant to the variant found in the patient. The patient's variant is displayed in the context of both normal variation and pathogenic variation reported at that locus thereby facilitating interpretation.

Publication

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al., 2009. Am.J.Hum.Genet 84, 524-533 (DOI: dx.doi.org/10.1016/j.ajhg.2009.03.010)

TSV Extraction

#population_cnv_id  chr start   end deletion_observations   deletion_frequency  deletion_standard_error duplication_observations    duplication_frequency   duplication_standard_error  observations    frequency   standard_error  type    sample_size study
1 1 10529 177368 0 0 1 3 0.075 0.555277708 3 0.075 0.555277708 1 40 42M calls
2 1 13516 91073 0 0 1 27 0.675 0.109713431 27 0.675 0.109713431 1 40 42M calls
3 1 18888 35451 0 0 1 2 0.002366864 0.706269473 2 0.002366864 0.706269473 1 845 DDD

Parsing

We parse the DECIPHER TSV file and extract the following columns:

  • chr
  • start
  • end
  • deletion_observations
  • deletion_frequency
  • duplication_observations
  • duplication_frequency
  • sample_size

Download URL

https://www.deciphergenomics.org/files/downloads/population_cnv_grch38.txt.gz https://www.deciphergenomics.org/files/downloads/population_cnv_grch37.txt.gz

JSON output

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field                 Type    Notes
chromosome            string  Ensembl-style chromosome names
begin                 int     1-based position
end                   int     1-based position
numDeletions          int     # of observed deletions
deletionFrequency     float   deletion frequency
numDuplications       int     # of observed duplications
duplicationFrequency  float   duplication frequency
sampleSize            int     total # of samples
reciprocalOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap     float   Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap
- - + + \ No newline at end of file diff --git a/data-sources/fusioncatcher-json/index.html b/data-sources/fusioncatcher-json/index.html index 354976ce..92415457 100644 --- a/data-sources/fusioncatcher-json/index.html +++ b/data-sources/fusioncatcher-json/index.html @@ -6,13 +6,13 @@ fusioncatcher-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field            Type          Notes
genes            genes object  5' gene & 3' gene
germlineSources  string array  matches in known germline data sources
somaticSources   string array  matches in known somatic data sources

genes

Field             Type         Notes
first             gene object  5' gene
second            gene object  3' gene
isParalogPair     bool         true when both genes are paralogs for each other
isPseudogenePair  bool         true when both genes are pseudogenes for each other
isReadthrough     bool         true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field       Type    Notes
hgnc        string  gene symbol. e.g. MSH6
isOncogene  bool    true when this gene is an oncogene
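
As an illustration of how a consumer might read this object, the Python sketch below summarizes one `fusionCatcher` entry. The function name and the summary keys are hypothetical. Treating an absent boolean (such as `isOncogene` on the second gene in the example) as false is an assumption based on the example, not a documented guarantee.

```python
import json


def summarize_fusion(entry):
    """Condense one fusionCatcher entry into a few flags."""
    genes = entry["genes"]
    first, second = genes["first"], genes["second"]
    return {
        "pair": f'{first["hgnc"]}-{second["hgnc"]}',
        # Assumption: a missing boolean field means false.
        "oncogenes": [g["hgnc"] for g in (first, second) if g.get("isOncogene", False)],
        "isReadthrough": genes.get("isReadthrough", False),
        "seenInGermline": bool(entry.get("germlineSources")),
        "seenInSomatic": bool(entry.get("somaticSources")),
    }


EXAMPLE = json.loads("""
{
  "fusionCatcher": [
    {
      "genes": {
        "first": {"hgnc": "ETV6", "isOncogene": true},
        "second": {"hgnc": "RUNX1"},
        "isParalogPair": true,
        "isPseudogenePair": true,
        "isReadthrough": true
      },
      "germlineSources": ["1000 Genomes Project"],
      "somaticSources": ["COSMIC", "TCGA oesophageal carcinomas"]
    }
  ]
}
""")

summary = summarize_fusion(EXAMPLE["fusionCatcher"][0])
print(summary)
```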
- - +
Version: 3.25 (unreleased)

fusioncatcher-json

   "fusionCatcher":[
{
"genes":{
"first":{
"hgnc":"ETV6",
"isOncogene":true
},
"second":{
"hgnc":"RUNX1"
},
"isParalogPair":true,
"isPseudogenePair":true,
"isReadthrough":true
},
"germlineSources":[
"1000 Genomes Project"
],
"somaticSources":[
"COSMIC",
"TCGA oesophageal carcinomas"
]
}
]
Field            Type          Notes
genes            genes object  5' gene & 3' gene
germlineSources  string array  matches in known germline data sources
somaticSources   string array  matches in known somatic data sources

genes

Field             Type         Notes
first             gene object  5' gene
second            gene object  3' gene
isParalogPair     bool         true when both genes are paralogs for each other
isPseudogenePair  bool         true when both genes are pseudogenes for each other
isReadthrough     bool         true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field       Type    Notes
hgnc        string  gene symbol. e.g. MSH6
isOncogene  bool    true when this gene is an oncogene
+ + \ No newline at end of file diff --git a/data-sources/fusioncatcher/index.html b/data-sources/fusioncatcher/index.html index cfdfd6b9..7bc0e370 100644 --- a/data-sources/fusioncatcher/index.html +++ b/data-sources/fusioncatcher/index.html @@ -6,13 +6,13 @@ FusionCatcher | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of their genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

DescriptionReferenceDataFusionCatcher filename
Bushmanbushmanlab.orgcancer_genes.txt
ONGENEJGGbioinfo-minzhao.orgoncogenes_more.txt
UniProt tumor genesNARuniprot.orgtumor_genes.txt

Germline

Illumina Connected Annotations labelReferenceDataFusionCatcher filename
1000 Genomes ProjectPLOS ONE1000genomes.txt
Healthy (strong support)banned.txt
Illumina Body Map 2.0EBIbodymap2.txt
CACGGenomicscacg.txt
ConjoinGPLOS ONEconjoing.txt
Healthy prefrontal cortexBMC Medical GenomicsNCBI GEOcortex.txt
Duplicated Genes DatabasePLOS ONEgenouest.orgdgd.txt
GTEx healthy tissuesgtexportal.orggtex.txt
Healthyhealthy.txt
Human Protein AtlasMCPEBIhpa.txt
Babiceanu non-cancer tissuesNARNARnon-cancer_tissues.txt
non-tumor cell linesnon-tumor_cells.txt
TumorFusions normalNARNARtcga-normal.txt

Somatic

Illumina Connected Annotations labelReferenceDataFusionCatcher filename
Alaei-Mahabadi 18 cancersPNAS18cancers.txt
DepMap CCLEdepmap.orgccle.txt
CCLE KlijnNature BiotechnologyNature Biotechnologyccle2.txt
CCLE VellichirammalMolecular Therapy Nucleic Acidsccle3.txt
Cancer Genome ProjectCOSMICcgp.txt
ChimerKB 4.0NARkobic.re.krchimerdb4kb.txt
ChimerPub 4.0NARkobic.re.krchimerdb4pub.txt
ChimerSeq 4.0NARkobic.re.krchimerdb4seq.txt
COSMICNARCOSMICcosmic.txt
Bao gliomasGenome Researchgliomas.txt
Knownknown.txt
Mitelman DBISB-CGCGoogle Cloudmitelman.txt
TCGA oesophageal carcinomasNatureoesophagus.txt
Bailey pancreatic cancersNatureNaturepancreases.txt
PCAWGCellICGCpcawg.txt
Robinson prostate cancersCellCellprostate_cancer.txt
TCGAcancer.govtcga.txt
TumorFusions tumorNARNARtcga-cancer.txt
TCGA GaoCellCelltcga2.txt
TCGA VellichirammalMolecular Therapy Nucleic Acidstcga3.txt
TICdbBMC Genomicsunav.eduticdb.txt

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.
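
The filtering rule above can be illustrated with a minimal Python sketch. It is not the Illumina Connected Annotations implementation: the function name `load_gene_pairs` and the `recognized` set (a stand-in for the gene IDs found in the GRCh37/GRCh38 cache files) are hypothetical, and the third input row uses a custom FusionCatcher gene ID to show a pair being dropped.

```python
import csv
import io


def load_gene_pairs(handle, recognized_ids):
    """Return (first, second) gene ID tuples where both IDs are recognized."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        first, second = row[0].strip(), row[1].strip()
        if first in recognized_ids and second in recognized_ids:
            pairs.append((first, second))
    return pairs


# Toy input: two rows from 1000genomes.txt plus one pair with a custom gene ID.
sample = io.StringIO(
    "ENSG00000006210\tENSG00000102962\n"
    "ENSG00000006652\tENSG00000181016\n"
    "ENSG09000000002\tENSG00000102962\n"
)
# Hypothetical stand-in for the IDs known from the cache files.
recognized = {"ENSG00000006210", "ENSG00000102962", "ENSG00000006652"}

pairs = load_gene_pairs(sample, recognized)
print(pairs)
```

Only the first pair survives in this toy run, because the second row contains an ID missing from `recognized` and the third row contains a custom gene ID.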

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This is commonly used in the data files representing oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl genes (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations ignores these entries because it only includes gene IDs that it currently recognizes.

We suspect that these were originally RefSeq genes; if so, we can support them directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

"fusionCatcher":[
   {
      "genes":{
         "first":{
            "hgnc":"ETV6",
            "isOncogene":true
         },
         "second":{
            "hgnc":"RUNX1"
         },
         "isParalogPair":true,
         "isPseudogenePair":true,
         "isReadthrough":true
      },
      "germlineSources":[
         "1000 Genomes Project"
      ],
      "somaticSources":[
         "COSMIC",
         "TCGA oesophageal carcinomas"
      ]
   }
]

Field             Type           Notes
genes             genes object   5' gene & 3' gene
germlineSources   string array   matches in known germline data sources
somaticSources    string array   matches in known somatic data sources

genes

Field              Type          Notes
first              gene object   5' gene
second             gene object   3' gene
isParalogPair      bool          true when both genes are paralogs for each other
isPseudogenePair   bool          true when both genes are pseudogenes for each other
isReadthrough      bool          true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field        Type     Notes
hgnc         string   gene symbol. e.g. MSH6
isOncogene   bool     true when this gene is an oncogene
- - +
Version: 3.25 (unreleased)

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for somatic novel/known fusion genes, translocations, and/or chimeras in RNA-seq data. While FusionCatcher itself is not part of Illumina Connected Annotations, we have included a subset of their genomic databases in Illumina Connected Annotations.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

DescriptionReferenceDataFusionCatcher filename
Bushmanbushmanlab.orgcancer_genes.txt
ONGENEJGGbioinfo-minzhao.orgoncogenes_more.txt
UniProt tumor genesNARuniprot.orgtumor_genes.txt

Germline

Illumina Connected Annotations labelReferenceDataFusionCatcher filename
1000 Genomes ProjectPLOS ONE1000genomes.txt
Healthy (strong support)banned.txt
Illumina Body Map 2.0EBIbodymap2.txt
CACGGenomicscacg.txt
ConjoinGPLOS ONEconjoing.txt
Healthy prefrontal cortexBMC Medical GenomicsNCBI GEOcortex.txt
Duplicated Genes DatabasePLOS ONEgenouest.orgdgd.txt
GTEx healthy tissuesgtexportal.orggtex.txt
Healthyhealthy.txt
Human Protein AtlasMCPEBIhpa.txt
Babiceanu non-cancer tissuesNARNARnon-cancer_tissues.txt
non-tumor cell linesnon-tumor_cells.txt
TumorFusions normalNARNARtcga-normal.txt

Somatic

Illumina Connected Annotations labelReferenceDataFusionCatcher filename
Alaei-Mahabadi 18 cancersPNAS18cancers.txt
DepMap CCLEdepmap.orgccle.txt
CCLE KlijnNature BiotechnologyNature Biotechnologyccle2.txt
CCLE VellichirammalMolecular Therapy Nucleic Acidsccle3.txt
Cancer Genome ProjectCOSMICcgp.txt
ChimerKB 4.0NARkobic.re.krchimerdb4kb.txt
ChimerPub 4.0NARkobic.re.krchimerdb4pub.txt
ChimerSeq 4.0NARkobic.re.krchimerdb4seq.txt
COSMICNARCOSMICcosmic.txt
Bao gliomasGenome Researchgliomas.txt
Knownknown.txt
Mitelman DBISB-CGCGoogle Cloudmitelman.txt
TCGA oesophageal carcinomasNatureoesophagus.txt
Bailey pancreatic cancersNatureNaturepancreases.txt
PCAWGCellICGCpcawg.txt
Robinson prostate cancersCellCellprostate_cancer.txt
TCGAcancer.govtcga.txt
TumorFusions tumorNARNARtcga-cancer.txt
TCGA GaoCellCelltcga2.txt
TCGA VellichirammalMolecular Therapy Nucleic Acidstcga3.txt
TICdbBMC Genomicsunav.eduticdb.txt

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we will only import a gene pair if both Ensembl gene IDs are recognized from either our GRCh37 or GRCh38 cache files.

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This is commonly used in the data files representing oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952
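
As a hedged illustration of how several single-column files could be combined into one oncogene lookup, here is a short Python sketch. It assumes that "aggregated" means a simple set union, which the documentation does not spell out; the function names and the toy inputs are hypothetical.

```python
import io


def load_gene_ids(handle):
    """Read a single-column gene file into a set of Ensembl gene IDs."""
    return {line.strip() for line in handle if line.strip()}


def aggregate_oncogenes(handles):
    """Union the gene IDs from several single-column files (assumed rule)."""
    oncogenes = set()
    for handle in handles:
        oncogenes |= load_gene_ids(handle)
    return oncogenes


# Toy stand-ins for files such as cancer_genes.txt and oncogenes_more.txt.
file_a = io.StringIO("ENSG00000000938\nENSG00000003402\n")
file_b = io.StringIO("ENSG00000003402\nENSG00000005469\n\n")

oncogenes = aggregate_oncogenes([file_a, file_b])
is_oncogene = "ENSG00000005469" in oncogenes
print(sorted(oncogenes), is_oncogene)
```

A set keeps the membership test cheap when the isOncogene flag has to be looked up for every gene in a fusion pair.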

Parsing

Known Issues

FusionCatcher also creates custom Ensembl genes (e.g. ENSG09000000002) to handle missing Ensembl genes. Illumina Connected Annotations ignores these entries because it only includes gene IDs that it currently recognizes.

We suspect that these were originally RefSeq genes; if so, we can support them directly in Illumina Connected Annotations in the future.

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

"fusionCatcher":[
   {
      "genes":{
         "first":{
            "hgnc":"ETV6",
            "isOncogene":true
         },
         "second":{
            "hgnc":"RUNX1"
         },
         "isParalogPair":true,
         "isPseudogenePair":true,
         "isReadthrough":true
      },
      "germlineSources":[
         "1000 Genomes Project"
      ],
      "somaticSources":[
         "COSMIC",
         "TCGA oesophageal carcinomas"
      ]
   }
]

Field             Type           Notes
genes             genes object   5' gene & 3' gene
germlineSources   string array   matches in known germline data sources
somaticSources    string array   matches in known somatic data sources

genes

Field              Type          Notes
first              gene object   5' gene
second             gene object   3' gene
isParalogPair      bool          true when both genes are paralogs for each other
isPseudogenePair   bool          true when both genes are pseudogenes for each other
isReadthrough      bool          true when this fusion gene is a readthrough event (both are on the same strand and there are no genes between them)

gene

Field        Type     Notes
hgnc         string   gene symbol. e.g. MSH6
isOncogene   bool     true when this gene is an oncogene
+ + \ No newline at end of file diff --git a/data-sources/gerp-json/index.html b/data-sources/gerp-json/index.html index 16a7e51e..29cb5709 100644 --- a/data-sources/gerp-json/index.html +++ b/data-sources/gerp-json/index.html @@ -6,13 +6,13 @@ gerp-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gerp-json

"gerpScore": 1.27
Field       Type    Notes
gerpScore   float   Range: -∞ to +∞
- - +
Version: 3.25 (unreleased)

gerp-json

"gerpScore": 1.27
Field       Type    Notes
gerpScore   float   Range: -∞ to +∞
+ + \ No newline at end of file diff --git a/data-sources/gerp/index.html b/data-sources/gerp/index.html index b8fb9c7e..790e1a03 100644 --- a/data-sources/gerp/index.html +++ b/data-sources/gerp/index.html @@ -6,15 +6,15 @@ GERP | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. +

Version: 3.25 (unreleased)

GERP

Overview

GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint (Rejected Substitutions). Illumina Connected Annotations uses GERP++ which is based on a significantly faster and more statistically robust maximum likelihood estimation procedure to compute expected rates of evolution.

Publication

Davydov, Eugene V., et al. "Identifying a high fraction of the human genome to be under selective constraint using GERP++." PLoS computational biology 6.12 e1001025 (2010). https://doi.org/10.1371/journal.pcbi.1001025

Source Files

Example GRCh37

The GRCh37 file is in TSV format:

chr     position    GERP
1 12177 0.83
1 12178 -0.206
1 12179 -0.492
1 12180 -1.66
1 12181 0.83
1 12182 0.83
1 12183 -0.417
1 12184 0.83

Example GRCh38

The GRCh38 file is in BED format, produced by a liftover:

chr     pos_start   pos_end     GERP
1 12646 12647 0.298
1 12647 12648 2.63
1 12648 12649 1.87
1 12649 12650 0.252
1 12650 12651 -2.06
1 12651 12652 2.61
1 12652 12653 3.97

Parsing

From the TSV file, we are interested in the following columns:

  • chr
  • position
  • GERP
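
A minimal Python sketch of reading those three columns is shown below. It assumes a tab-separated file with the header row from the GRCh37 example; the function name `read_gerp_scores` is hypothetical and this is not the parser used by Illumina Connected Annotations.

```python
import csv
import io


def read_gerp_scores(handle):
    """Yield (chromosome, position, score) from a tab-separated GERP file."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row["chr"], int(row["position"]), float(row["GERP"])


# First rows of the GRCh37 example, written with explicit tab separators.
sample = io.StringIO(
    "chr\tposition\tGERP\n"
    "1\t12177\t0.83\n"
    "1\t12178\t-0.206\n"
    "1\t12179\t-0.492\n"
)

scores = list(read_gerp_scores(sample))
print(scores)
```

The GRCh38 BED file has pos_start and pos_end columns instead of position, so it would need its own column mapping.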

Known Issues

None

Download URL

GRCh37

http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

GRCh38

The data is not available for GRCh38 on the GERP++ website; it was obtained from https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/

JSON Output

"gerpScore": 1.27
Field       Type    Notes
gerpScore   float   Range: -∞ to +∞
- - + + \ No newline at end of file diff --git a/data-sources/gme-json/index.html b/data-sources/gme-json/index.html index 1fa5b637..ed5817e7 100644 --- a/data-sources/gme-json/index.html +++ b/data-sources/gme-json/index.html @@ -6,13 +6,13 @@ gme-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gme-json

"gmeVariome":{
   "allAc":10,
   "allAn":202,
   "allAf":0.049504,
   "failedFilter":true
}

Field          Type    Notes
allAc          int     GME allele count
allAn          int     GME allele number
allAf          float   GME allele frequency
failedFilter   bool    True if this variant failed any filters
- - +
Version: 3.25 (unreleased)

gme-json

"gmeVariome":{
   "allAc":10,
   "allAn":202,
   "allAf":0.049504,
   "failedFilter":true
}

Field          Type    Notes
allAc          int     GME allele count
allAn          int     GME allele number
allAf          float   GME allele frequency
failedFilter   bool    True if this variant failed any filters
+ + \ No newline at end of file diff --git a/data-sources/gme/index.html b/data-sources/gme/index.html index f3d2a479..ec0edd05 100644 --- a/data-sources/gme/index.html +++ b/data-sources/gme/index.html @@ -6,13 +6,13 @@ GME Variome | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME TSV file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF
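
The sketch below shows one way these columns could map onto the gmeVariome fields. Two points are assumptions inferred from the example rows rather than documented behavior: GME_AC is read as "alternate count,reference count" (so the allele number is their sum, which gives 10 and 202 for the first row), and failedFilter is taken to mean any filter value other than PASS. The function name is hypothetical.

```python
import csv
import io


def parse_gme_row(row):
    """Map one GME TSV row (a dict keyed by column name) to gmeVariome-style fields."""
    # Assumption: GME_AC holds "alt_count,ref_count".
    alt_count, ref_count = (int(x) for x in row["GME_AC"].split(","))
    return {
        "chrom": row["chrom"],
        "pos": int(row["pos"]),
        "ref": row["ref"],
        "alt": row["alt"],
        "allAc": alt_count,
        "allAn": alt_count + ref_count,  # assumption: AN = alt + ref counts
        "allAf": float(row["GME_AF"]),
        "failedFilter": row["filter"] != "PASS",  # assumption
    }


# Two example rows reduced to the extracted columns, with explicit tabs.
sample = io.StringIO(
    "chrom\tpos\tref\talt\tfilter\tGME_AC\tGME_AF\n"
    "1\t69134\tA\tG\tVQSRTrancheSNP99.90to100.00\t10,192\t0.04950495049504951\n"
    "1\t69270\tA\tG\tPASS\t518,224\t0.6981132075471698\n"
)

records = [parse_gme_row(r) for r in csv.DictReader(sample, delimiter="\t")]
for record in records:
    print(record)
```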

GRCh37 liftover

The data is not available for GRCh38 on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
   "allAc":10,
   "allAn":202,
   "allAf":0.049504,
   "failedFilter":true
}

Field          Type    Notes
allAc          int     GME allele count
allAn          int     GME allele number
allAf          float   GME allele frequency
failedFilter   bool    True if this variant failed any filters
- - +
Version: 3.25 (unreleased)

GME Variome

Overview

The Greater Middle East (GME) Variome Project is aimed at generating a coding base reference for the countries found in the Greater Middle East. Illumina Connected Annotations presents variant frequencies for the Greater Middle Eastern population.

Publication

Scott, E. M., Halees, A., Itan, Y., Spencer, E. G., He, Y., Azab, M. A., Gabriel, S. B., Belkadi, A., Boisson, B., Abel, L., Clark, A. G., Greater Middle East Variome Consortium, Alkuraya, F. S., Casanova, J. L., & Gleeson, J. G. (2016). Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nature genetics, 48(9), 1071–1076. https://doi.org/10.1038/ng.3592

TSV Extraction

chrom   pos     ref     alt     AA      filter  FunctionGVS     geneFunction    Gene    GeneID  SIFT_pred       GERP++  AF      GME_GC  GME_AC  GME_AF  NWA     NEA     AP      Israel  SD      TP      CA      FunctionGVS_new Priority        Polyphen2_HVAR_pred     LRT_pred        MutationTaster_pred     rsid    OMIM_MIM        OMIM_Disease    AA_AC   EA_AC   rsid_link       position_link
1 69134 A G A VQSRTrancheSNP99.90to100.00 nonsynonymous_SNV exonic OR4F5 79501 T 2.31 96:0:5 10,192 0.04950495049504951 4:0:0 59:0:2 12:0:0 0:0:0 6:0:0 9:0:2 13:0:2 nonsynonymous_SNV MODERATE B N N none - - none none - http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69134-69133
1 69270 A G A PASS synonymous_SNV exonic OR4F5 79501 . . 93:38:240 518,224 0.6981132075471698 5:5:11 63:30:86 12:5:28 1:0:2 2:2:18 7:3:46 7:2:52 synonymous_SNV LOW . . . rs201219564 - - none none http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs201219564 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69270-69269
1 69428 T G T PASS nonsynonymous_SNV exonic OR4F5 79501 D 0.891 676:44:15 74,1396 0.050340136054421766 43:0:2 313:16:10 88:7:3 6:0:0 44:8:0 102:9:0 102:4:2 nonsynonymous_SNV MODERATE D N N rs140739101 - - 14,3808 313,6535 http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs140739101 http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr1%3A69428-69427

Parsing

We parse the GME TSV file and extract the following columns:

  • chrom
  • pos
  • ref
  • alt
  • filter
  • GME_AC
  • GME_AF

GRCh37 liftover

The data is not available for GRCh38 on the GME website. We performed a liftover from GRCh37 to GRCh38 using CrossMap.

Download URL

http://igm.ucsd.edu/gme/download.shtml

JSON output

"gmeVariome":{
   "allAc":10,
   "allAn":202,
   "allAf":0.049504,
   "failedFilter":true
}

Field          Type    Notes
allAc          int     GME allele count
allAn          int     GME allele number
allAf          float   GME allele frequency
failedFilter   bool    True if this variant failed any filters
+ + \ No newline at end of file diff --git a/data-sources/gnomad-lof-json/index.html b/data-sources/gnomad-lof-json/index.html index 1ca90b6a..f0ff759b 100644 --- a/data-sources/gnomad-lof-json/index.html +++ b/data-sources/gnomad-lof-json/index.html @@ -6,13 +6,13 @@ gnomad-lof-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad-lof-json

"gnomAD":{
   "pLi":1.00e0,
   "pNull":8.94e-40,
   "pRec":1.84e-16,
   "synZ":-8.44e-2,
   "misZ":5.96e-1,
   "loeuf":1.13e0
}

Field   Type    Notes
pLi     float   probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull   float   probability of being completely tolerant of loss of function variation (observed = expected)
pRec    float   probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ    float   corrected synonymous Z score
misZ    float   corrected missense Z score
loeuf   float   loss of function observed/expected upper bound fraction (LOEUF)
- - +
Version: 3.25 (unreleased)

gnomad-lof-json

"gnomAD":{
   "pLi":1.00e0,
   "pNull":8.94e-40,
   "pRec":1.84e-16,
   "synZ":-8.44e-2,
   "misZ":5.96e-1,
   "loeuf":1.13e0
}

Field   Type    Notes
pLi     float   probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull   float   probability of being completely tolerant of loss of function variation (observed = expected)
pRec    float   probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ    float   corrected synonymous Z score
misZ    float   corrected missense Z score
loeuf   float   loss of function observed/expected upper bound fraction (LOEUF)
+ + \ No newline at end of file diff --git a/data-sources/gnomad-small-variants-json/index.html b/data-sources/gnomad-small-variants-json/index.html index 2df521fc..4622962d 100644 --- a/data-sources/gnomad-small-variants-json/index.html +++ b/data-sources/gnomad-small-variants-json/index.html @@ -6,13 +6,13 @@ gnomad-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad-small-variants-json

"gnomad":{
   "coverage":20,
   "allAf":0.190317,
   "maleAf":0.193,
   "femaleAf":0.1935,
   "afrAf":0.222876,
   "amrAf":0.121394,
   "easAf":0.239802,
   "finAf":0.136833,
   "nfeAf":0.181282,
   "asjAf":0.258278,
   "othAf":0.186094,
   "allAn":30796,
   "maleAn":15096,
   "femaleAn":15700,
   "afrAn":8664,
   "amrAn":832,
   "easAn":1618,
   "finAn":3486,
   "nfeAn":14916,
   "asjAn":302,
   "othAn":978,
   "allAc":5861,
   "maleAc":2930,
   "femaleAc":2931,
   "afrAc":1931,
   "amrAc":101,
   "easAc":388,
   "finAc":477,
   "nfeAc":2704,
   "asjAc":78,
   "othAc":182,
   "allHc":561,
   "afrHc":208,
   "amrHc":6,
   "easHc":42,
   "finHc":31,
   "nfeHc":242,
   "asjHc":13,
   "othHc":19,
   "maleHc":280,
   "femaleHc":281,
   "controlsAllAf":0.190317,
   "controlsAllAn":30796,
   "controlsAllAc":5861,
   "lowComplexityRegion":true,
   "failedFilter":true
}

Field                 Type    Notes
coverage              int     average coverage (non-negative integer values)
allAf                 float   allele frequency for all populations. Range: 0 - 1.0
maleAf                float   allele frequency for male population. Range: 0 - 1.0
femaleAf              float   allele frequency for female population. Range: 0 - 1.0
controlsAllAf         float   allele frequency for the controls subset. Range: 0 - 1.0
allAc                 int     allele count for all populations. Integer.
maleAc                int     allele count for male population. Integer.
femaleAc              int     allele count for female population. Integer.
controlsAllAc         int     allele count for the controls subset. Integer.
allAn                 int     allele number for all populations. Non-zero integer.
maleAn                int     allele number for male population. Non-zero integer.
femaleAn              int     allele number for female population. Non-zero integer.
controlsAllAn         int     allele number for the controls subset. Non-zero integer.
allHc                 int     count of homozygous individuals for all populations. Non-negative integer.
maleHc                int     count of homozygous individuals for male population. Non-negative integer.
femaleHc              int     count of homozygous individuals for female population. Non-negative integer.
afrAf                 float   allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                 int     allele count for the African / African American population. Integer.
afrAn                 int     allele number for the African / African American population. Non-zero integer.
afrHc                 int     count of homozygous individuals for African / African American population. Non-negative integer.
amrAf                 float   allele frequency for the Latino population. Range: 0 - 1.0
amrAc                 int     allele count for the Latino population. Integer.
amrAn                 int     allele number for the Latino population. Non-zero integer.
amrHc                 int     count of homozygous individuals for Latino population. Non-negative integer.
easAf                 float   allele frequency for the East Asian population. Range: 0 - 1.0
easAc                 int     allele count for the East Asian population. Integer.
easAn                 int     allele number for the East Asian population. Non-zero integer.
easHc                 int     count of homozygous individuals for East Asian population. Non-negative integer.
finAf                 float   allele frequency for the Finnish population. Range: 0 - 1.0
finAc                 int     allele count for the Finnish population. Integer.
finAn                 int     allele number for the Finnish population. Non-zero integer.
finHc                 int     count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf                 float   allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                 int     allele count for the Non-Finnish European population. Integer.
nfeAn                 int     allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                 int     count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf                 float   allele frequency for the Other population. Range: 0 - 1.0
othAc                 int     allele count for the Other population. Integer.
othAn                 int     allele number for the Other population. Non-zero integer.
othHc                 int     count of homozygous individuals for Other population. Non-negative integer.
asjAf                 float   allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                 int     allele count for the Ashkenazi Jewish population. Integer.
asjAn                 int     allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                 int     count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                 float   allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                 int     allele count for the South Asian population. Integer.
sasAn                 int     allele number for the South Asian population. Non-zero integer.
sasHc                 int     count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter          bool    True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion   bool    True if this variant is located in a low complexity region.
- - +
Version: 3.25 (unreleased)

gnomad-small-variants-json

"gnomad":{
   "coverage":20,
   "allAf":0.190317,
   "maleAf":0.193,
   "femaleAf":0.1935,
   "afrAf":0.222876,
   "amrAf":0.121394,
   "easAf":0.239802,
   "finAf":0.136833,
   "nfeAf":0.181282,
   "asjAf":0.258278,
   "othAf":0.186094,
   "allAn":30796,
   "maleAn":15096,
   "femaleAn":15700,
   "afrAn":8664,
   "amrAn":832,
   "easAn":1618,
   "finAn":3486,
   "nfeAn":14916,
   "asjAn":302,
   "othAn":978,
   "allAc":5861,
   "maleAc":2930,
   "femaleAc":2931,
   "afrAc":1931,
   "amrAc":101,
   "easAc":388,
   "finAc":477,
   "nfeAc":2704,
   "asjAc":78,
   "othAc":182,
   "allHc":561,
   "afrHc":208,
   "amrHc":6,
   "easHc":42,
   "finHc":31,
   "nfeHc":242,
   "asjHc":13,
   "othHc":19,
   "maleHc":280,
   "femaleHc":281,
   "controlsAllAf":0.190317,
   "controlsAllAn":30796,
   "controlsAllAc":5861,
   "lowComplexityRegion":true,
   "failedFilter":true
}

Field                 Type    Notes
coverage              int     average coverage (non-negative integer values)
allAf                 float   allele frequency for all populations. Range: 0 - 1.0
maleAf                float   allele frequency for male population. Range: 0 - 1.0
femaleAf              float   allele frequency for female population. Range: 0 - 1.0
controlsAllAf         float   allele frequency for the controls subset. Range: 0 - 1.0
allAc                 int     allele count for all populations. Integer.
maleAc                int     allele count for male population. Integer.
femaleAc              int     allele count for female population. Integer.
controlsAllAc         int     allele count for the controls subset. Integer.
allAn                 int     allele number for all populations. Non-zero integer.
maleAn                int     allele number for male population. Non-zero integer.
femaleAn              int     allele number for female population. Non-zero integer.
controlsAllAn         int     allele number for the controls subset. Non-zero integer.
allHc                 int     count of homozygous individuals for all populations. Non-negative integer.
maleHc                int     count of homozygous individuals for male population. Non-negative integer.
femaleHc              int     count of homozygous individuals for female population. Non-negative integer.
afrAf                 float   allele frequency for the African / African American population. Range: 0 - 1.0
afrAc                 int     allele count for the African / African American population. Integer.
afrAn                 int     allele number for the African / African American population. Non-zero integer.
afrHc                 int     count of homozygous individuals for African / African American population. Non-negative integer.
amrAf                 float   allele frequency for the Latino population. Range: 0 - 1.0
amrAc                 int     allele count for the Latino population. Integer.
amrAn                 int     allele number for the Latino population. Non-zero integer.
amrHc                 int     count of homozygous individuals for Latino population. Non-negative integer.
easAf                 float   allele frequency for the East Asian population. Range: 0 - 1.0
easAc                 int     allele count for the East Asian population. Integer.
easAn                 int     allele number for the East Asian population. Non-zero integer.
easHc                 int     count of homozygous individuals for East Asian population. Non-negative integer.
finAf                 float   allele frequency for the Finnish population. Range: 0 - 1.0
finAc                 int     allele count for the Finnish population. Integer.
finAn                 int     allele number for the Finnish population. Non-zero integer.
finHc                 int     count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf                 float   allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc                 int     allele count for the Non-Finnish European population. Integer.
nfeAn                 int     allele number for the Non-Finnish European population. Non-zero integer.
nfeHc                 int     count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf                 float   allele frequency for the Other population. Range: 0 - 1.0
othAc                 int     allele count for the Other population. Integer.
othAn                 int     allele number for the Other population. Non-zero integer.
othHc                 int     count of homozygous individuals for Other population. Non-negative integer.
asjAf                 float   allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc                 int     allele count for the Ashkenazi Jewish population. Integer.
asjAn                 int     allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc                 int     count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf                 float   allele frequency for the South Asian population. Range: 0 - 1.0
sasAc                 int     allele count for the South Asian population. Integer.
sasAn                 int     allele number for the South Asian population. Non-zero integer.
sasHc                 int     count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter          bool    True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion   bool    True if this variant is located in a low complexity region.
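
For readers consuming this object, the relationship between the Af, Ac and An fields can be checked with a small Python sketch. The helper name `af_matches_counts` and the tolerance are hypothetical choices, and the sketch assumes that an allele frequency is the allele count divided by the allele number, rounded to six decimals as in the example.

```python
import json
import math


def af_matches_counts(record, prefix, abs_tol=1e-5):
    """Check that <prefix>Af agrees with <prefix>Ac / <prefix>An within abs_tol."""
    count = record[f"{prefix}Ac"]
    number = record[f"{prefix}An"]  # documented as a non-zero integer
    frequency = record[f"{prefix}Af"]
    return math.isclose(frequency, count / number, abs_tol=abs_tol)


# A few fields copied from the example JSON object above.
gnomad = json.loads(
    '{"allAf": 0.190317, "allAn": 30796, "allAc": 5861,'
    ' "afrAf": 0.222876, "afrAn": 8664, "afrAc": 1931,'
    ' "failedFilter": true}'
)

checks = {prefix: af_matches_counts(gnomad, prefix) for prefix in ("all", "afr")}
print(checks, gnomad["failedFilter"])
```

Because failedFilter does not list which filters failed, a consumer can only use it as a yes/no flag when deciding whether to trust the frequencies.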
+ + \ No newline at end of file diff --git a/data-sources/gnomad-structural-variants-data_description/index.html b/data-sources/gnomad-structural-variants-data_description/index.html index 5ee55c50..7451c230 100644 --- a/data-sources/gnomad-structural-variants-data_description/index.html +++ b/data-sources/gnomad-structural-variants-data_description/index.html @@ -6,14 +6,14 @@ gnomad-structural-variants-data_description | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37:

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    ENDEVIDENCE HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HEAFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. +

Version: 3.25 (unreleased)

gnomad-structural-variants-data_description

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    END    EVIDENCE    HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HET    AFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. In the Illumina Connected Annotations JSON output, these keys will be mapped according to the following.

| Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key |
|:------------------------------------------------|:--------------------------|
| copy_number_variation | |
| deletion | DEL, CN=0 |
| duplication | DUP |
| insertion | INS |
| inversion | INV |
| mobile_element_insertion | INS:ME |
| mobile_element_insertion | INS:ME:ALU |
| mobile_element_insertion | INS:ME:LINE1 |
| mobile_element_insertion | INS:ME:SVA |
| structural alteration | |
| complex_structural_alteration | CPX |
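The mapping above can be expressed as a simple lookup table. The following is a minimal Python sketch and not the annotator's actual implementation: the names `SOURCE_TO_JSON_SV_TYPE` and `to_json_sv_type` are hypothetical, `DEL, CN=0` is assumed to list two separate source keys, and returning `None` for keys that are not in the table is an assumption.

```python
from typing import Optional

# Source SV type keys (GRCh37) mapped to the JSON SV type keys from the table.
# Rows without a source key (copy_number_variation, structural alteration)
# are left out because the table gives no source key for them.
SOURCE_TO_JSON_SV_TYPE = {
    "DEL": "deletion",
    "CN=0": "deletion",  # assumed to be a separate key from "DEL"
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}


def to_json_sv_type(source_key: str) -> Optional[str]:
    """Return the JSON SV type key, or None for keys not in the table (assumption)."""
    return SOURCE_TO_JSON_SV_TYPE.get(source_key)


print(to_json_sv_type("INS:ME:ALU"))
```

All mobile element subtypes collapse to the same JSON key, so the subtype cannot be recovered from the JSON value alone.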
- - + + \ No newline at end of file diff --git a/data-sources/gnomad-structural-variants-json/index.html b/data-sources/gnomad-structural-variants-json/index.html index b2d2bdb2..a4b22a75 100644 --- a/data-sources/gnomad-structural-variants-json/index.html +++ b/data-sources/gnomad-structural-variants-json/index.html @@ -6,13 +6,13 @@ gnomad-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

| Field | Type | Notes |
|:------|:-----|:------|
| chromosome | string | chromosome number |
| begin | integer | position interval start |
| end | integer | position interval end |
| variantType | string | structural variant type |
| variantId | string | gnomAD ID |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
| femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
| maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
| allAc | integer | allele count for all populations. |
| afrAc | integer | allele count for the African super population. |
| amrAc | integer | allele count for the Admixed American super population. |
| easAc | integer | allele count for the East Asian super population. |
| eurAc | integer | allele count for the European super population. |
| othAc | integer | allele count for all other populations. |
| maleAc | integer | allele count for male population. |
| femaleAc | integer | allele count for female population. |
| allAn | integer | allele number for all populations. |
| afrAn | integer | allele number for the African super population. |
| amrAn | integer | allele number for the Admixed American super population. |
| easAn | integer | allele number for the East Asian super population. |
| eurAn | integer | allele number for the European super population. |
| othAn | integer | allele number for all other populations. |
| femaleAn | integer | allele number for female population. |
| maleAn | integer | allele number for male population. |
| allHc | integer | count of homozygous individuals for all populations. |
| afrHc | integer | count of homozygous individuals for the African / African American population. |
| amrHc | integer | count of homozygous individuals for the Latino population. |
| easHc | integer | count of homozygous individuals for the East Asian population. |
| eurHc | integer | count of homozygous individuals for the European super population. |
| othHc | integer | count of homozygous individuals for all other populations. |
| maleHc | integer | count of homozygous individuals for male population. |
| femaleHc | integer | count of homozygous individuals for female population. |
| failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
| reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
| annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0 |

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
- - +
Version: 3.25 (unreleased)

gnomad-structural-variants-json

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

| Field | Type | Notes |
|:------|:-----|:------|
| chromosome | string | chromosome number |
| begin | integer | position interval start |
| end | integer | position interval end |
| variantType | string | structural variant type |
| variantId | string | gnomAD ID |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
| femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
| maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
| allAc | integer | allele count for all populations. |
| afrAc | integer | allele count for the African super population. |
| amrAc | integer | allele count for the Admixed American super population. |
| easAc | integer | allele count for the East Asian super population. |
| eurAc | integer | allele count for the European super population. |
| othAc | integer | allele count for all other populations. |
| maleAc | integer | allele count for male population. |
| femaleAc | integer | allele count for female population. |
| allAn | integer | allele number for all populations. |
| afrAn | integer | allele number for the African super population. |
| amrAn | integer | allele number for the Admixed American super population. |
| easAn | integer | allele number for the East Asian super population. |
| eurAn | integer | allele number for the European super population. |
| othAn | integer | allele number for all other populations. |
| femaleAn | integer | allele number for female population. |
| maleAn | integer | allele number for male population. |
| allHc | integer | count of homozygous individuals for all populations. |
| afrHc | integer | count of homozygous individuals for the African / African American population. |
| amrHc | integer | count of homozygous individuals for the Latino population. |
| easHc | integer | count of homozygous individuals for the East Asian population. |
| eurHc | integer | count of homozygous individuals for the European super population. |
| othHc | integer | count of homozygous individuals for all other populations. |
| maleHc | integer | count of homozygous individuals for male population. |
| femaleHc | integer | count of homozygous individuals for female population. |
| failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
| reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
| annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0 |

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
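As an illustration of how these fields might be consumed, the sketch below filters structural variant annotations by `failedFilter` and `reciprocalOverlap`. It is a hedged example, not part of Illumina Connected Annotations: the function name `select_overlapping`, the default threshold of 0.5, and the outer braces wrapped around the JSON fragment are all assumptions made so that the snippet runs on its own.

```python
import json

# Record trimmed from the JSON example above; the outer braces are added
# here only so that the fragment parses as a standalone document.
annotation_json = """
{
  "gnomAD-preview": [
    {
      "chromosome": "1",
      "begin": 40001,
      "end": 47200,
      "variantId": "gnomAD-SV_v2.1_DUP_1_1",
      "variantType": "duplication",
      "failedFilter": true,
      "allAf": 0.068963,
      "reciprocalOverlap": 0.01839,
      "annotationOverlap": 0.16667
    }
  ]
}
"""


def select_overlapping(entries, min_reciprocal_overlap=0.5, drop_failed=True):
    """Keep entries that meet the (hypothetical) overlap and filter criteria."""
    kept = []
    for entry in entries:
        # failedFilter is not available for GRCh38, so treat a missing key as False.
        if drop_failed and entry.get("failedFilter", False):
            continue
        if entry.get("reciprocalOverlap", 0.0) < min_reciprocal_overlap:
            continue
        kept.append(entry)
    return kept


entries = json.loads(annotation_json)["gnomAD-preview"]
strict = select_overlapping(entries)
lenient = select_overlapping(entries, min_reciprocal_overlap=0.01, drop_failed=False)
print(len(strict), len(lenient))
```

Reading optional fields with `dict.get` keeps the same code usable for GRCh38 output, where the fields listed in the note above are absent.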
+ + \ No newline at end of file diff --git a/data-sources/gnomad/index.html b/data-sources/gnomad/index.html index 65069751..4da8cc3a 100644 --- a/data-sources/gnomad/index.html +++ b/data-sources/gnomad/index.html @@ -6,12 +6,12 @@ gnomAD | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Illumina Connected Annotations will support gnomAD v4.0 for the GRCh38 assembly and gnomAD v2.1 for GRCh37.

gnomAD v4.0 (GRCh38)

Small Variants

In gnomAD v4.0, as in gnomAD v2.1, there are both genome and exome data. Unlike gnomAD v2.1, where the genome and exome data are merged, for gnomAD v4.0 Illumina Connected Annotations reports them in separate JSON output fields. +

Version: 3.25 (unreleased)

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Illumina Connected Annotations will support gnomAD v4.0 for the GRCh38 assembly and gnomAD v2.1 for GRCh37.

gnomAD v4.0 (GRCh38)

Small Variants

In gnomAD v4.0, as in gnomAD v2.1, there are both genome and exome data. Unlike gnomAD v2.1, where the genome and exome data are merged, for gnomAD v4.0 Illumina Connected Annotations reports them in separate JSON output fields. For gnomAD genome, the field name is gnomad. For gnomAD exome, the field name is gnomad-exome. Despite the different field names, the JSON data format is identical for genome and exome.

VCF extraction

We currently extract the following info fields from both gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">
##INFO=<ID=AC_XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples">
##INFO=<ID=AN_XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples">
##INFO=<ID=nhomalt_XX,Number=A,Type=Integer,Description="Count of homozygous individuals in XX samples">
##INFO=<ID=AC_XY,Number=A,Type=Integer,Description="Alternate allele count for XY samples">
##INFO=<ID=AN_XY,Number=1,Type=Integer,Description="Total number of alleles in XY samples">
##INFO=<ID=nhomalt_XY,Number=A,Type=Integer,Description="Count of homozygous individuals in XY samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African/African-American ancestry">
##INFO=<ID=AN_afr,Number=1,Type=Integer,Description="Total number of alleles in samples of African/African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African/African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=1,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=1,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=1,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=1,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_mid,Number=A,Type=Integer,Description="Alternate allele count for samples of Middle Eastern ancestry">
##INFO=<ID=AN_mid,Number=1,Type=Integer,Description="Total number of alleles in samples of Middle Eastern ancestry">
##INFO=<ID=nhomalt_mid,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Middle Eastern ancestry">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of Non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=1,Type=Integer,Description="Total number of alleles in samples of Non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Non-Finnish European ancestry">
##INFO=<ID=AC_remaining,Number=A,Type=Integer,Description="Alternate allele count for samples of Remaining individuals ancestry">
##INFO=<ID=AN_remaining,Number=1,Type=Integer,Description="Total number of alleles in samples of Remaining individuals ancestry">
##INFO=<ID=nhomalt_remaining,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Remaining individuals ancestry">
##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=1,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
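To make the extraction step concrete, here is a minimal Python sketch that pulls the `AC`, `AN` and `nhomalt` values for a few groups out of a VCF INFO column. It is only an illustration under stated assumptions: `parse_info` and `extract_counts` are hypothetical helpers, the INFO string is made up, and taking the first value of a `Number=A` field assumes one ALT allele per record.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> raw string value mapping."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flag-type entries have no "=", so store True for them.
        fields[key] = value if sep else True
    return fields


def extract_counts(info: str, groups=("", "_XX", "_XY", "_afr", "_nfe")) -> dict:
    """Collect AC, AN and nhomalt for each requested group suffix."""
    parsed = parse_info(info)
    counts = {}
    for suffix in groups:
        for prefix in ("AC", "AN", "nhomalt"):
            key = prefix + suffix
            value = parsed.get(key)
            if isinstance(value, str):
                # Number=A fields hold one value per ALT allele; take the first.
                counts[key] = int(value.split(",")[0])
    return counts


# Made-up INFO string, for illustration only.
example_info = "AC=2;AN=100;nhomalt=0;AC_XX=1;AN_XX=48;nhomalt_XX=0;AC_afr=2;AN_afr=20"
print(extract_counts(example_info))
```

Keys that are missing from the INFO column are skipped rather than defaulted to zero, so a missing count stays distinguishable from a true zero.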

JSON output

"gnomad": {
"coverage": 154,
"failedFilter": true,
"allAf": 0.5,
"allAn": 152428,
"allAc": 76214,
"allHc": 0,
"afrAf": 0.5,
"afrAn": 41608,
"afrAc": 20804,
"afrHc": 0,
"amiAf": 0.5,
"amiAn": 912,
"amiAc": 456,
"amiHc": 0,
"amrAf": 0.5,
"amrAn": 15314,
"amrAc": 7657,
"amrHc": 0,
"easAf": 0.5,
"easAn": 5196,
"easAc": 2598,
"easHc": 0,
"finAf": 0.5,
"finAn": 10632,
"finAc": 5316,
"finHc": 0,
"nfeAf": 0.5,
"nfeAn": 68050,
"nfeAc": 34025,
"nfeHc": 0,
"asjAf": 0.5,
"asjAn": 3472,
"asjAc": 1736,
"asjHc": 0,
"sasAf": 0.5,
"sasAn": 4834,
"sasAc": 2417,
"sasHc": 0,
"midAf": 0.5,
"midAn": 294,
"midAc": 147,
"midHc": 0,
"remainingAf": 0.5,
"remainingAn": 2116,
"remainingAc": 1058,
"remainingHc": 0,
"maleAf": 0.5,
"maleAn": 74544,
"maleAc": 37272,
"maleHc": 0,
"femaleAf": 0.5,
"femaleAn": 77884,
"femaleAc": 38942,
"femaleHc": 0
}
"gnomad-exome": {
"coverage": 53,
"allAf": 0.495074,
"allAn": 4060,
"allAc": 2010,
"allHc": 11,
"afrAf": 0.5,
"afrAn": 86,
"afrAc": 43,
"afrHc": 0,
"amrAf": 0.5,
"amrAn": 46,
"amrAc": 23,
"amrHc": 0,
"easAf": 0.491071,
"easAn": 112,
"easAc": 55,
"easHc": 0,
"finAf": 0.5,
"finAn": 306,
"finAc": 153,
"finHc": 0,
"nfeAf": 0.49503,
"nfeAn": 3018,
"nfeAc": 1494,
"nfeHc": 11,
"asjAf": 0.461538,
"asjAn": 26,
"asjAc": 12,
"asjHc": 0,
"sasAf": 0.486111,
"sasAn": 72,
"sasAc": 35,
"sasHc": 0,
"midAf": 0.5,
"midAn": 68,
"midAc": 34,
"midHc": 0,
"remainingAf": 0.493865,
"remainingAn": 326,
"remainingAc": 161,
"remainingHc": 0,
"maleAf": 0.495212,
"maleAn": 2924,
"maleAc": 1448,
"maleHc": 9,
"femaleAf": 0.494718,
"femaleAn": 1136,
"femaleAc": 562,
"femaleHc": 2
}
| Field | Type | Notes |
|:------|:-----|:------|
| coverage | int | average coverage (non-negative integer values) |
| maleAf | float | allele frequency for male population. Range: 0 - 1.0 |
| maleAn | int | allele number for male population. Non-zero integer. |
| maleAc | int | allele count for male population. Integer. |
| maleHc | int | count of homozygous individuals for male population. Non-negative integer. |
| femaleAf | float | allele frequency for female population. Range: 0 - 1.0 |
| femaleAn | int | allele number for female population. Non-zero integer. |
| femaleAc | int | allele count for female population. Integer. |
| femaleHc | int | count of homozygous individuals for female population. Non-negative integer. |
| remainingAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
| remainingAc | int | allele count for the Other population. Integer. |
| remainingAn | int | allele number for the Other population. Non-zero integer. |
| remainingHc | int | count of homozygous individuals for the Other population. Non-negative integer. |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| allAn | int | allele number for all populations. Non-zero integer. |
| allAc | int | allele count for all populations. Integer. |
| allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
| afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African / African American population. Integer. |
| afrAn | int | allele number for the African / African American population. Non-zero integer. |
| afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer. |
| amiAf | float | allele frequency for Amish populations. Range: 0 - 1.0 |
| amiAn | int | allele number for Amish populations. Non-zero integer. |
| amiAc | int | allele count for Amish populations. Integer. |
| amiHc | int | count of homozygous individuals for Amish populations. Non-negative integer. |
| amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Latino population. Integer. |
| amrAn | int | allele number for the Latino population. Non-zero integer. |
| amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer. |
| easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian population. Integer. |
| easAn | int | allele number for the East Asian population. Non-zero integer. |
| easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer. |
| finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
| finAc | int | allele count for the Finnish population. Integer. |
| finAn | int | allele number for the Finnish population. Non-zero integer. |
| finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer. |
| nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
| nfeAc | int | allele count for the Non-Finnish European population. Integer. |
| nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
| nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer. |
| asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
| asjAc | int | allele count for the Ashkenazi Jewish population. Integer. |
| asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
| asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
| sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian population. Integer. |
| sasAn | int | allele number for the South Asian population. Non-zero integer. |
| sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
| midAf | float | allele frequency for the Middle Eastern population. Range: 0 - 1.0 |
| midAc | int | allele count for the Middle Eastern population. Integer. |
| midAn | int | allele number for the Middle Eastern population. Non-zero integer. |
| midHc | int | count of homozygous individuals for the Middle Eastern population. Non-negative integer. |
| failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters) |
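Because the gnomad and gnomad-exome objects share one format, the same code can read both. The sketch below is a hedged Python example that finds the population with the highest allele frequency in each object. The `variant` dictionary is a hypothetical container (where these objects sit in the full output is not shown here), its values are trimmed from the examples above, and `max_population_af` is an illustrative helper name.

```python
# Population prefixes used by the per-population fields in the table above.
POPULATIONS = ("afr", "ami", "amr", "eas", "fin", "nfe", "asj", "sas", "mid", "remaining")


def max_population_af(block: dict):
    """Return (population, AF) for the highest population AF, or None if none is present."""
    present = [(pop, block[pop + "Af"]) for pop in POPULATIONS if pop + "Af" in block]
    if not present:
        return None
    # On ties, max() returns the first population in POPULATIONS order.
    return max(present, key=lambda item: item[1])


# Hypothetical container holding both objects for one variant.
variant = {
    "gnomad": {"allAf": 0.5, "afrAf": 0.5, "nfeAf": 0.5},
    "gnomad-exome": {"allAf": 0.495074, "easAf": 0.491071, "nfeAf": 0.49503, "asjAf": 0.461538},
}

for key in ("gnomad", "gnomad-exome"):
    block = variant.get(key)
    if block is not None:
        print(key, max_population_af(block))
```

Checking for each key before reading it matters because a population such as `ami` appears in the genome example but not in the exome example.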

Calculation

To calculate the allele frequency for each group, we divide the allele count by the allele number for that group.
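The calculation can be checked against the gnomad-exome example above. This is a minimal Python sketch: the function name `allele_frequency` is hypothetical, returning `None` when the allele number is zero is an assumption, and rounding to six decimal places is inferred from the example values rather than stated in the documentation.

```python
from typing import Optional


def allele_frequency(allele_count: int, allele_number: int) -> Optional[float]:
    """AF = AC / AN; returns None when AN is zero (an assumption, not documented)."""
    if allele_number == 0:
        return None
    return allele_count / allele_number


# allAc / allAn from the gnomad-exome example; the example reports allAf 0.495074.
print(round(allele_frequency(2010, 4060), 6))
```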

LoF Gene Metrics

In gnomAD 4.0, the gene score data for LoF is given per transcript. Since this is gene-level data, one of the transcripts needs to be chosen and its value reported. @@ -27,7 +27,7 @@ Currently, the annotations do not include translocation breakends. Future updates will include a better way of annotating the structural variants.

Source Files

Bed Example

The BED file was obtained from the original source for GRCh37.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    END    EVIDENCE    HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HET    AFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

Structural Variant Type Mapping

The source files represent structural variant types with keys that follow various naming conventions. In the Illumina Connected Annotations JSON output, these keys are mapped as follows.

Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key
copy_number_variation |
deletion | DEL, CN=0
duplication | DUP
insertion | INS
inversion | INV
mobile_element_insertion | INS:ME
mobile_element_insertion | INS:ME:ALU
mobile_element_insertion | INS:ME:LINE1
mobile_element_insertion | INS:ME:SVA
structural alteration |
complex_structural_alteration | CPX
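
The mapping above can be sketched as a simple lookup table. This is only an illustration: the names `SOURCE_TO_JSON_SV_TYPE` and `map_sv_type` are hypothetical, and returning `None` for source keys that are not listed (for example BND) is an assumption rather than documented behavior.

```python
# Hypothetical lookup table built from the mapping listed above.
# Rows without a source key (copy_number_variation, structural alteration)
# have nothing to map from, so they are omitted here.
SOURCE_TO_JSON_SV_TYPE = {
    "DEL": "deletion",
    "CN=0": "deletion",
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}


def map_sv_type(source_key):
    """Return the JSON SV type key for a source key, or None if unlisted (assumption)."""
    return SOURCE_TO_JSON_SV_TYPE.get(source_key)


print(map_sv_type("INS:ME:ALU"))
print(map_sv_type("BND"))
```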

Download URLs

GRCh37

The GRCh37 file was downloaded from the original source. Following table gives some essential data metrics:

https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz

JSON output

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Ad Mixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Ad Mixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0
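
The source does not define how reciprocalOverlap and annotationOverlap are computed. The sketch below uses common definitions as an assumption: reciprocal overlap is the overlap length divided by the longer of the two intervals, and annotation overlap is the overlap length divided by the annotation length. The function name `overlap_metrics` and the query variant coordinates are hypothetical; they were chosen so that the result matches the values in the JSON example for the record at 1:40001-47200.

```python
def overlap_metrics(var_begin, var_end, ann_begin, ann_end):
    """Return (reciprocal_overlap, annotation_overlap) for inclusive intervals.

    Assumed definitions (not confirmed by the source):
      reciprocal overlap = overlap / max(variant length, annotation length)
      annotation overlap = overlap / annotation length
    """
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return 0.0, 0.0
    var_len = var_end - var_begin + 1
    ann_len = ann_end - ann_begin + 1
    reciprocal = overlap / max(var_len, ann_len)
    annotation = overlap / ann_len
    # Rounded to 5 decimal places, like the values in the JSON example.
    return round(reciprocal, 5), round(annotation, 5)


# Hypothetical query variant 1:46001-111253 against the gnomAD record 1:40001-47200.
print(overlap_metrics(46001, 111253, 40001, 47200))
```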

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
- - + + \ No newline at end of file diff --git a/data-sources/gnomad4.0-lof-json/index.html b/data-sources/gnomad4.0-lof-json/index.html index 03db7270..aa13e0aa 100644 --- a/data-sources/gnomad4.0-lof-json/index.html +++ b/data-sources/gnomad4.0-lof-json/index.html @@ -6,13 +6,13 @@ gnomad4.0-lof-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad4.0-lof-json

"gnomAD": {
"pLi": 0.00000122,
"pRec": 0.32,
"pNull": 0.68,
"synZ": 0.0117,
"misZ": 0.162,
"loeuf": 1.94,
"transcriptId": "ENST00000360525"
}
- - +
Version: 3.25 (unreleased)

gnomad4.0-lof-json

"gnomAD": {
"pLi": 0.00000122,
"pRec": 0.32,
"pNull": 0.68,
"synZ": 0.0117,
"misZ": 0.162,
"loeuf": 1.94,
"transcriptId": "ENST00000360525"
}
+ + \ No newline at end of file diff --git a/data-sources/gnomad4.0-small-variants-json/index.html b/data-sources/gnomad4.0-small-variants-json/index.html index 3650eb67..0fafd9f2 100644 --- a/data-sources/gnomad4.0-small-variants-json/index.html +++ b/data-sources/gnomad4.0-small-variants-json/index.html @@ -6,13 +6,13 @@ gnomad4.0-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad4.0-small-variants-json

"gnomad": {
"coverage": 154,
"failedFilter": true,
"allAf": 0.5,
"allAn": 152428,
"allAc": 76214,
"allHc": 0,
"afrAf": 0.5,
"afrAn": 41608,
"afrAc": 20804,
"afrHc": 0,
"amiAf": 0.5,
"amiAn": 912,
"amiAc": 456,
"amiHc": 0,
"amrAf": 0.5,
"amrAn": 15314,
"amrAc": 7657,
"amrHc": 0,
"easAf": 0.5,
"easAn": 5196,
"easAc": 2598,
"easHc": 0,
"finAf": 0.5,
"finAn": 10632,
"finAc": 5316,
"finHc": 0,
"nfeAf": 0.5,
"nfeAn": 68050,
"nfeAc": 34025,
"nfeHc": 0,
"asjAf": 0.5,
"asjAn": 3472,
"asjAc": 1736,
"asjHc": 0,
"sasAf": 0.5,
"sasAn": 4834,
"sasAc": 2417,
"sasHc": 0,
"midAf": 0.5,
"midAn": 294,
"midAc": 147,
"midHc": 0,
"remainingAf": 0.5,
"remainingAn": 2116,
"remainingAc": 1058,
"remainingHc": 0,
"maleAf": 0.5,
"maleAn": 74544,
"maleAc": 37272,
"maleHc": 0,
"femaleAf": 0.5,
"femaleAn": 77884,
"femaleAc": 38942,
"femaleHc": 0
}
"gnomad-exome": {
"coverage": 53,
"allAf": 0.495074,
"allAn": 4060,
"allAc": 2010,
"allHc": 11,
"afrAf": 0.5,
"afrAn": 86,
"afrAc": 43,
"afrHc": 0,
"amrAf": 0.5,
"amrAn": 46,
"amrAc": 23,
"amrHc": 0,
"easAf": 0.491071,
"easAn": 112,
"easAc": 55,
"easHc": 0,
"finAf": 0.5,
"finAn": 306,
"finAc": 153,
"finHc": 0,
"nfeAf": 0.49503,
"nfeAn": 3018,
"nfeAc": 1494,
"nfeHc": 11,
"asjAf": 0.461538,
"asjAn": 26,
"asjAc": 12,
"asjHc": 0,
"sasAf": 0.486111,
"sasAn": 72,
"sasAc": 35,
"sasHc": 0,
"midAf": 0.5,
"midAn": 68,
"midAc": 34,
"midHc": 0,
"remainingAf": 0.493865,
"remainingAn": 326,
"remainingAc": 161,
"remainingHc": 0,
"maleAf": 0.495212,
"maleAn": 2924,
"maleAc": 1448,
"maleHc": 9,
"femaleAf": 0.494718,
"femaleAn": 1136,
"femaleAc": 562,
"femaleHc": 2
}

Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
maleAf | float | allele frequency for male population. Range: 0 - 1.0
maleAn | int | allele number for male population. Non-zero integer.
maleAc | int | allele count for male population. Integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
femaleAn | int | allele number for female population. Non-zero integer.
femaleAc | int | allele count for female population. Integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
remainingAf | float | allele frequency for the Other population. Range: 0 - 1.0
remainingAc | int | allele count for the Other population. Integer.
remainingAn | int | allele number for the Other population. Non-zero integer.
remainingHc | int | count of homozygous individuals for the Other population. Non-negative integer.
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAn | int | allele number for all populations. Non-zero integer.
allAc | int | allele count for all populations. Integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amiAf | float | allele frequency for the Amish population. Range: 0 - 1.0
amiAn | int | allele number for the Amish population. Non-zero integer.
amiAc | int | allele count for the Amish population. Integer.
amiHc | int | count of homozygous individuals for the Amish population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
midAf | float | allele frequency for the Middle Eastern population. Range: 0 - 1.0
midAc | int | allele count for the Middle Eastern population. Integer.
midAn | int | allele number for the Middle Eastern population. Non-zero integer.
midHc | int | count of homozygous individuals for the Middle Eastern population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
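
In the "gnomad-exome" example above, each Af value equals the matching Ac divided by An, rounded to six decimal places. The sketch below checks that relationship on a few of those values. The helper `allele_frequency` is hypothetical, and the six-decimal rounding is inferred from the example rather than stated in the source.

```python
def allele_frequency(ac, an):
    """Allele frequency as allele count / allele number, rounded to 6 decimals."""
    if an == 0:
        # The table says An is non-zero, but guard anyway.
        return None
    return round(ac / an, 6)


# A subset of the "gnomad-exome" example values shown above.
exome = {
    "allAf": 0.495074, "allAn": 4060, "allAc": 2010,
    "easAf": 0.491071, "easAn": 112, "easAc": 55,
    "asjAf": 0.461538, "asjAn": 26, "asjAc": 12,
}

for prefix in ("all", "eas", "asj"):
    computed = allele_frequency(exome[prefix + "Ac"], exome[prefix + "An"])
    print(prefix, computed, computed == exome[prefix + "Af"])
```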
- - +
Version: 3.25 (unreleased)

gnomad4.0-small-variants-json

"gnomad": {
"coverage": 154,
"failedFilter": true,
"allAf": 0.5,
"allAn": 152428,
"allAc": 76214,
"allHc": 0,
"afrAf": 0.5,
"afrAn": 41608,
"afrAc": 20804,
"afrHc": 0,
"amiAf": 0.5,
"amiAn": 912,
"amiAc": 456,
"amiHc": 0,
"amrAf": 0.5,
"amrAn": 15314,
"amrAc": 7657,
"amrHc": 0,
"easAf": 0.5,
"easAn": 5196,
"easAc": 2598,
"easHc": 0,
"finAf": 0.5,
"finAn": 10632,
"finAc": 5316,
"finHc": 0,
"nfeAf": 0.5,
"nfeAn": 68050,
"nfeAc": 34025,
"nfeHc": 0,
"asjAf": 0.5,
"asjAn": 3472,
"asjAc": 1736,
"asjHc": 0,
"sasAf": 0.5,
"sasAn": 4834,
"sasAc": 2417,
"sasHc": 0,
"midAf": 0.5,
"midAn": 294,
"midAc": 147,
"midHc": 0,
"remainingAf": 0.5,
"remainingAn": 2116,
"remainingAc": 1058,
"remainingHc": 0,
"maleAf": 0.5,
"maleAn": 74544,
"maleAc": 37272,
"maleHc": 0,
"femaleAf": 0.5,
"femaleAn": 77884,
"femaleAc": 38942,
"femaleHc": 0
}
"gnomad-exome": {
"coverage": 53,
"allAf": 0.495074,
"allAn": 4060,
"allAc": 2010,
"allHc": 11,
"afrAf": 0.5,
"afrAn": 86,
"afrAc": 43,
"afrHc": 0,
"amrAf": 0.5,
"amrAn": 46,
"amrAc": 23,
"amrHc": 0,
"easAf": 0.491071,
"easAn": 112,
"easAc": 55,
"easHc": 0,
"finAf": 0.5,
"finAn": 306,
"finAc": 153,
"finHc": 0,
"nfeAf": 0.49503,
"nfeAn": 3018,
"nfeAc": 1494,
"nfeHc": 11,
"asjAf": 0.461538,
"asjAn": 26,
"asjAc": 12,
"asjHc": 0,
"sasAf": 0.486111,
"sasAn": 72,
"sasAc": 35,
"sasHc": 0,
"midAf": 0.5,
"midAn": 68,
"midAc": 34,
"midHc": 0,
"remainingAf": 0.493865,
"remainingAn": 326,
"remainingAc": 161,
"remainingHc": 0,
"maleAf": 0.495212,
"maleAn": 2924,
"maleAc": 1448,
"maleHc": 9,
"femaleAf": 0.494718,
"femaleAn": 1136,
"femaleAc": 562,
"femaleHc": 2
}

Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
maleAf | float | allele frequency for male population. Range: 0 - 1.0
maleAn | int | allele number for male population. Non-zero integer.
maleAc | int | allele count for male population. Integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
femaleAn | int | allele number for female population. Non-zero integer.
femaleAc | int | allele count for female population. Integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
remainingAf | float | allele frequency for the Other population. Range: 0 - 1.0
remainingAc | int | allele count for the Other population. Integer.
remainingAn | int | allele number for the Other population. Non-zero integer.
remainingHc | int | count of homozygous individuals for the Other population. Non-negative integer.
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAn | int | allele number for all populations. Non-zero integer.
allAc | int | allele count for all populations. Integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amiAf | float | allele frequency for the Amish population. Range: 0 - 1.0
amiAn | int | allele number for the Amish population. Non-zero integer.
amiAc | int | allele count for the Amish population. Integer.
amiHc | int | count of homozygous individuals for the Amish population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
midAf | float | allele frequency for the Middle Eastern population. Range: 0 - 1.0
midAc | int | allele count for the Middle Eastern population. Integer.
midAn | int | allele number for the Middle Eastern population. Non-zero integer.
midHc | int | count of homozygous individuals for the Middle Eastern population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
+ + \ No newline at end of file diff --git a/data-sources/gnomad40-structural-variants-json/index.html b/data-sources/gnomad40-structural-variants-json/index.html index adc7af3c..eac4b315 100644 --- a/data-sources/gnomad40-structural-variants-json/index.html +++ b/data-sources/gnomad40-structural-variants-json/index.html @@ -6,13 +6,13 @@ gnomad40-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

gnomad40-structural-variants-json

"gnomad": [
{
"chromosome": "1",
"begin": 1769047,
"end": 78686496,
"variantId": "gnomAD-SV_v3_CPX_chr1_4787cfba",
"variantType": "complex_structural_alteration",
"failedFilter": true,
"allAf": 0.51192,
"afrAf": 0.491986,
"amiAf": 0.559382,
"amrAf": 0.499444,
"asjAf": 0.505975,
"easAf": 0.51924,
"midAf": 0.53125,
"finAf": 0.542619,
"nfeAf": 0.521916,
"othAf": 0.492366,
"sasAf": 0.516568,
"femaleAf": 0.509225,
"maleAf": 0.514861,
"allAc": 64549,
"afrAc": 16637,
"amiAc": 471,
"amrAc": 6290,
"asjAc": 1609,
"easAc": 2105,
"midAc": 34,
"finAc": 3514,
"nfeAc": 30839,
"othAc": 774,
"sasAc": 2276,
"femaleAc": 33507,
"maleAc": 31042,
"allAn": 126092,
"afrAn": 33816,
"amiAn": 842,
"amrAn": 12594,
"asjAn": 3180,
"easAn": 4054,
"midAn": 64,
"finAn": 6476,
"nfeAn": 59088,
"othAn": 1572,
"sasAn": 4406,
"femaleAn": 65800,
"maleAn": 60292,
"allHc": 3167,
"afrHc": 413,
"amiHc": 54,
"amrHc": 238,
"asjHc": 49,
"easHc": 97,
"midHc": 2,
"finHc": 368,
"nfeHc": 1807,
"othHc": 23,
"sasHc": 116,
"femaleHc": 1407,
"maleHc": 1760
}
]
- - +
Version: 3.25 (unreleased)

gnomad40-structural-variants-json

"gnomad": [
{
"chromosome": "1",
"begin": 1769047,
"end": 78686496,
"variantId": "gnomAD-SV_v3_CPX_chr1_4787cfba",
"variantType": "complex_structural_alteration",
"failedFilter": true,
"allAf": 0.51192,
"afrAf": 0.491986,
"amiAf": 0.559382,
"amrAf": 0.499444,
"asjAf": 0.505975,
"easAf": 0.51924,
"midAf": 0.53125,
"finAf": 0.542619,
"nfeAf": 0.521916,
"othAf": 0.492366,
"sasAf": 0.516568,
"femaleAf": 0.509225,
"maleAf": 0.514861,
"allAc": 64549,
"afrAc": 16637,
"amiAc": 471,
"amrAc": 6290,
"asjAc": 1609,
"easAc": 2105,
"midAc": 34,
"finAc": 3514,
"nfeAc": 30839,
"othAc": 774,
"sasAc": 2276,
"femaleAc": 33507,
"maleAc": 31042,
"allAn": 126092,
"afrAn": 33816,
"amiAn": 842,
"amrAn": 12594,
"asjAn": 3180,
"easAn": 4054,
"midAn": 64,
"finAn": 6476,
"nfeAn": 59088,
"othAn": 1572,
"sasAn": 4406,
"femaleAn": 65800,
"maleAn": 60292,
"allHc": 3167,
"afrHc": 413,
"amiHc": 54,
"amrHc": 238,
"asjHc": 49,
"easHc": 97,
"midHc": 2,
"finHc": 368,
"nfeHc": 1807,
"othHc": 23,
"sasHc": 116,
"femaleHc": 1407,
"maleHc": 1760
}
]
+ + \ No newline at end of file diff --git a/data-sources/mito-heteroplasmy/index.html b/data-sources/mito-heteroplasmy/index.html index d1d90f43..dcb68c64 100644 --- a/data-sources/mito-heteroplasmy/index.html +++ b/data-sources/mito-heteroplasmy/index.html @@ -6,13 +6,13 @@ Mitochondrial Heteroplasmy | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (e.g. T:C)
  • ad
  • vrf
  • nobs (number of observations)

Adjusting for null observations

The nobs value indicates how many observations were made. The ad and vrf arrays do not include every observation (the example above lists 6 values for a nobs of 246), so the null observations have to be added back when reconstructing the distribution.
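
One way to account for the missing observations is to pad the vrf array with zeros up to nobs. This is an assumption rather than something the source spells out, but it is consistent with the example: the reported minimum is 0.0, and the mean of the padded array matches the reported mean. The variable names below are illustrative.

```python
import math

# Values copied from the "T:C" example above.
vrf = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
nobs = 246
mean_reported = 5.4052190471990743e-05

# Assumption: every observation not listed in the array has a VRF of 0.0.
padded = vrf + [0.0] * (nobs - len(vrf))
mean = sum(padded) / len(padded)

print(len(padded))
print(math.isclose(mean, mean_reported, rel_tol=1e-9))
```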

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object carry an excessive number of significant digits, so two samples rarely share exactly the same VRF. As the number of samples increases, the array therefore keeps growing. To make this more future-proof, Illumina Connected Annotations bins everything in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.
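
The binning step can be sketched as follows. Two details are assumptions, since the source only states the 0.1% bin width: values are rounded to the nearest bin (rather than truncated), and the paired ad value is used as the count contributed to each bin. The function name `bin_vrfs` is hypothetical.

```python
from collections import Counter


def bin_vrfs(vrfs, depths, bin_width=0.001):
    """Group VRF values into bins of `bin_width` and sum the paired counts.

    Assumptions: round-to-nearest binning, and each `depths` entry is the
    count that its VRF contributes to the bin.
    """
    counts = Counter()
    for vrf, depth in zip(vrfs, depths):
        # Use an integer bin index so that float noise cannot split a bin.
        bin_index = round(vrf / bin_width)
        counts[bin_index] += depth
    return {round(index * bin_width, 3): count for index, count in sorted(counts.items())}


# Values copied from the "T:C" example above.
vrf = [
    0.002369668246445498,
    0.0024937655860349127,
    0.0016129032258064516,
    0.0025188916876574307,
    0.0022935779816513763,
    0.002008032128514056,
]
ad = [1, 1, 1, 1, 1, 1]

print(bin_vrfs(vrf, ad))
```

With these assumptions, the six distinct values in the example collapse into two bins, which is the kind of reduction described above.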

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]

Field | Type | Notes
heteroplasmyPercentile | float array | one percentile for each variant frequency (each alternate allele)
- - +
Version: 3.25 (unreleased)

Mitochondrial Heteroplasmy

Overview

Mitochondrial Heteroplasmy is an aggregate population data set that characterizes the amount of heteroplasmy observed for each variant. The latest version of this data set is based on re-processed 1000 Genomes Project data using the Illumina DRAGEN pipeline.

JSON File

Example

{
"T:C":{
"ad":[
1,
1,
1,
1,
1,
1
],
"allele_type":"alt",
"vrf":[
0.002369668246445498,
0.0024937655860349127,
0.0016129032258064516,
0.0025188916876574307,
0.0022935779816513763,
0.002008032128514056
],
"vrf_stats":{
"kurtosis":38.889891511122556,
"max":0.0025188916876574307,
"mean":5.4052190471990743e-05,
"min":0.0,
"nobs":246,
"skewness":6.346664692283075,
"stdev":0.0003461416264750575,
"variance":1.1981402557879823e-07
}
}
}

Parsing

From the JSON file, we're mainly interested in the following keys:

  • variant (e.g. T:C)
  • ad
  • vrf
  • nobs (number of observations)

Adjusting for null observations

The nobs value indicates how many observations were made. The ad and vrf arrays do not include every observation (the example above lists 6 values for a nobs of 246), so the null observations have to be added back when reconstructing the distribution.

Binning VRF Data

The vrf (variant read frequency) array in the JSON object above is paired with the ad (allele depths) array.

The values in the JSON object carry an excessive number of significant digits, so two samples rarely share exactly the same VRF. As the number of samples increases, the array therefore keeps growing. To make this more future-proof, Illumina Connected Annotations bins everything in 0.1% increments.

With the binned data, we end up having 775 distinct vrf values in the entire JSON file. This also means that the variant with the largest number of VRFs would originally have 246 entries, but due to binning this will decrease to 143.

Pre-processing the Data

The JSON file is converted into a small TSV file that is embedded in Illumina Connected Annotations. Here is an example of the TSV file:

#CHROM  POS REF ALT VRF_BINS    VRF_COUNTS
chrM 1 G . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736
chrM 2 A . 0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 1,2,2,4,7,8,11,19,43,60,48,64,499,1736

Algorithm

Illumina Connected Annotations will calculate mitochondrial heteroplasmy data for every sample in the VCF. Using the computed VRF for each sample, we compute where in the empirical mitochondrial heteroplasmy distribution that VRF occurs and express that as a percentile.

Percentiles

Illumina Connected Annotations uses the statistical definition of percentile (indicating the value below which a given percentage of observations in a group of observations falls). Unless the sample's VRF is higher than all the VRFs represented in the distribution, the range will be [0, 1).
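
A minimal sketch of this percentile lookup, using the first TSV row shown earlier as the empirical distribution. It follows the definition given here (the fraction of observations strictly below the sample's VRF), which yields 1.0 only when the VRF is higher than every bin. The function names are hypothetical, and whether the reported value is further scaled (for example to 0 - 100) is not covered.

```python
from bisect import bisect_left
from itertools import accumulate


def parse_tsv_row(line):
    """Split one row of the embedded TSV into (bins, counts)."""
    _chrom, _pos, _ref, _alt, bins, counts = line.split()
    return [float(b) for b in bins.split(",")], [int(c) for c in counts.split(",")]


def heteroplasmy_percentile(vrf, bins, counts):
    """Fraction of observations whose binned VRF is strictly below `vrf`."""
    cumulative = list(accumulate(counts))
    # bisect_left returns how many bins are strictly below vrf (bins are sorted).
    below_bins = bisect_left(bins, vrf)
    below = cumulative[below_bins - 1] if below_bins > 0 else 0
    return below / cumulative[-1]


row = (
    "chrM 1 G . "
    "0.981,0.987,0.988,0.989,0.99,0.991,0.992,0.993,0.994,0.995,0.996,0.997,0.998,0.999 "
    "1,2,2,4,7,8,11,19,43,60,48,64,499,1736"
)
bins, counts = parse_tsv_row(row)

print(sum(counts))
print(heteroplasmy_percentile(0.995, bins, counts))
print(heteroplasmy_percentile(1.0, bins, counts))
```

The cumulative sum keeps the lookup cheap when many samples are scored against the same position.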

Download URL

Unavailable

The original data set is only available internally at Illumina at the moment.

JSON Output

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"alleleDepths":[
10,
20,
30
],
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]

Field | Type | Notes
heteroplasmyPercentile | float array | one percentile for each variant frequency (each alternate allele)
+ + \ No newline at end of file diff --git a/data-sources/mitomap-small-variants-json/index.html b/data-sources/mitomap-small-variants-json/index.html index a3fcb661..9ed62ec2 100644 --- a/data-sources/mitomap-small-variants-json/index.html +++ b/data-sources/mitomap-small-variants-json/index.html @@ -6,13 +6,13 @@ mitomap-small-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]

Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | number of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele
- - +
Version: 3.25 (unreleased)

mitomap-small-variants-json

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]

Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | number of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele
+ + \ No newline at end of file diff --git a/data-sources/mitomap-structural-variants-json/index.html b/data-sources/mitomap-structural-variants-json/index.html index a6640218..d7021189 100644 --- a/data-sources/mitomap-structural-variants-json/index.html +++ b/data-sources/mitomap-structural-variants-json/index.html @@ -6,13 +6,13 @@ mitomap-structural-variants-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
| annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
- - +
Version: 3.25 (unreleased)

mitomap-structural-variants-json

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
| annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
+ + \ No newline at end of file diff --git a/data-sources/mitomap/index.html b/data-sources/mitomap/index.html index 04b05461..b54c42e5 100644 --- a/data-sources/mitomap/index.html +++ b/data-sources/mitomap/index.html @@ -6,13 +6,13 @@ MITOMAP | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (the numbers refer to the HTML pages listed above):

  • Position (1, 2, 3, 4)
  • Disease (3, 4)
  • Nucleotide Change (1, 2)
  • Allele (3, 4)
  • Homoplasmy (3, 4)
  • Heteroplasmy (3, 4)
  • Status (3, 4)
  • MitoTIP (3, 4)
  • GB Seqs FL(CR) (1, 2, 3, 4)
  • Deletion Junction (5)
  • Insert (nt) (6)
  • Insert Point (nt) (6)
  • References/Curated References (1, 2, 3, 4)
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.
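As a rough illustration, the score percentile can be pulled out of a scraped MitoTIP cell like the ones in the HTML example above. This is a minimal sketch rather than the importer's actual code: the helper name `parse_score_percentile` and the regular expression are assumptions based only on the two sample rows.

```python
import re

# Assumption: the percentile always appears as "<u>NN.NN%</u>" inside the cell,
# as it does in the two sample rows shown earlier.
_PERCENTILE = re.compile(r"<u>\s*([0-9]+(?:\.[0-9]+)?)%\s*</u>")


def parse_score_percentile(cell):
    """Return the MitoTIP percentile as a float, or None if the cell has none."""
    match = _PERCENTILE.search(cell)
    return float(match.group(1)) if match else None


sample_cell = (
    "<span style='display:inline-block;white-space:nowrap;'>"
    "<a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> "
    "<i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>"
)
print(parse_score_percentile(sample_cell))
```

Cells without a percentile yield `None`, so a caller can leave scorePercentile out of the record.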

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.
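The text does not say which left-alignment algorithm is used, so the sketch below assumes the common VCF-style normalization: trim shared trailing bases and, whenever an allele becomes empty, extend both alleles one base to the left. The function name `left_align` and the toy reference sequence are hypothetical.

```python
def left_align(reference, pos, ref, alt):
    """Left-align an insertion or deletion.

    reference: reference sequence as a string
    pos:       1-based position of the first base of `ref`
    ref, alt:  alleles; either one may be empty for a pure indel
    Returns (pos, ref, alt) with one padding base on the left.
    """
    while True:
        if ref and alt and ref[-1] == alt[-1]:
            # Shared trailing base: drop it from both alleles.
            ref, alt = ref[:-1], alt[:-1]
        elif (not ref or not alt) and pos > 1:
            # An allele is empty: pull in the preceding reference base.
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
        else:
            break
    # Drop redundant leading bases while keeping one base of padding.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Deleting "CA" at positions 6-7 of a CA repeat shifts to the start of the repeat.
print(left_align("GCACACAT", 6, "CA", ""))
```

The circular mitochondrial genome is not handled here; a variant that reaches position 1 simply stops shifting.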

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.
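A minimal sketch of how such notations might be expanded follows. It assumes that C-C(2-8) means reference C with alternate alleles of two to eight C bases, and that A-AC or ACC lists two alternate alleles for reference A; both readings, and the helper names, are assumptions rather than the documented behavior.

```python
import itertools
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

_REPEAT_RANGE = re.compile(r"^([ACGT]+)\((\d+)-(\d+)\)$")


def expand_iupac(allele):
    """List every concrete allele encoded by an allele with ambiguity codes."""
    choices = (IUPAC[base] for base in allele)
    return ["".join(bases) for bases in itertools.product(*choices)]


def enumerate_alt_alleles(notation):
    """Split 'REF-ALTS' and return (ref, list of alternate alleles)."""
    ref, alt_part = notation.split("-", 1)
    match = _REPEAT_RANGE.match(alt_part)
    if match:
        unit, low, high = match.group(1), int(match.group(2)), int(match.group(3))
        return ref, [unit * count for count in range(low, high + 1)]
    alts = []
    for part in alt_part.split(" or "):
        alts.extend(expand_iupac(part.strip()))
    return ref, alts


print(enumerate_alt_alleles("C-C(2-8)"))
print(enumerate_alt_alleles("A-AC or ACC"))
```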

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT
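The conventions listed above can be told apart with a handful of regular expressions. The sketch below is illustrative only; the category names and the `classify_allele` helper are hypothetical and are not taken from the importer.

```python
import re

# One pattern per supported convention; order matters because
# "8042del2" and "8042delAT" differ only in what follows "del".
PATTERNS = [
    ("substitution", re.compile(r"^([ACGT])(\d+)([ACGT])$")),      # C123T
    ("range_deletion", re.compile(r"^(\d+)_(\d+)del$")),           # 16021_16022del
    ("counted_deletion", re.compile(r"^(\d+)del(\d+)$")),          # 8042del2
    ("sequence_deletion", re.compile(r"^(\d+)del([ACGT]+)$")),     # 8042delAT
    ("insertion", re.compile(r"^([ACGT])(\d+)ins([ACGT]+)$")),     # C9537insC
    ("inversion", re.compile(r"^(\d+)_(\d+)inv([ACGT]+)$")),       # 3902_3908invACCTTGC
    ("enumeration", re.compile(r"^([ACGT]+)-(.+)$")),              # A-AC or ACC, C-C(2-8)
]


def classify_allele(text):
    """Return (kind, captured groups) or None for an unsupported notation."""
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.groups()
    return None


for example in ["C123T", "16021_16022del", "8042del2", "C9537insC",
                "3902_3908invACCTTGC", "A-AC or ACC", "C-C(2-8)", "8042delAT"]:
    print(example, classify_allele(example))
```

A notation that matches none of the patterns returns `None`, which gives the caller a natural place to skip the record.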

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
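The id-to-nlmid mapping could be read from the COPY block roughly as follows. This sketch assumes the standard pg_dump text format (tab-delimited fields, `\N` for NULL, `\.` as the end-of-data marker); the function name `parse_reference_ids` is hypothetical.

```python
def parse_reference_ids(lines):
    """Map reference id -> nlmid (PubMed ID) from a pg_dump COPY block."""
    mapping = {}
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if columns is None:
            # Skip everything until the COPY statement for the reference table.
            if line.startswith("COPY mitomap.reference "):
                header = line[line.index("(") + 1:line.index(")")]
                columns = [name.strip() for name in header.split(",")]
            continue
        if line == "\\.":
            break  # end-of-data marker
        record = dict(zip(columns, line.split("\t")))
        nlmid = record.get("nlmid")
        if nlmid and nlmid != "\\N":
            mapping[record["id"]] = nlmid
    return mapping


# Shortened, made-up column list for illustration; the real table has 14 columns.
dump = [
    "COPY mitomap.reference (id, authors, nlmid) FROM stdin;",
    "1\tAlbring, M., Griffith, J. and Attardi, G.\t266177",
    "2\tAnderson, S., Bankier, A.T.\t7219534",
    "\\.",
]
print(parse_reference_ids(dump))
```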
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't yet invested the time to work out how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full-length GenBank sequences, we take the largest number among the duplicated records, since it provides the strongest evidence for this variant.
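The merge rules above can be sketched as follows. The field names come from the JSON output, `ValueError` stands in for whatever exception the importer throws, and treating a missing value as non-conflicting is an assumption.

```python
# Fields that must agree between duplicated records.
CONFLICT_FIELDS = (
    "hasHomoplasmy", "hasHeteroplasmy", "status",
    "clinicalSignificance", "scorePercentile", "end", "variantType",
)


def merge_records(first, second):
    """Merge two records that describe the same nucleotide change."""
    merged = dict(first)
    for field in CONFLICT_FIELDS:
        a, b = first.get(field), second.get(field)
        if a is not None and b is not None and a != b:
            raise ValueError(f"conflicting values for {field}: {a!r} vs {b!r}")
        if a is None and b is not None:
            merged[field] = b
    # Diseases and PubMed IDs: union of both records.
    for field in ("diseases", "pubMedIds"):
        merged[field] = sorted(set(first.get(field, [])) | set(second.get(field, [])))
    # Full-length GenBank sequences: keep the largest count.
    merged["numGenBankFullLengthSeqs"] = max(
        first.get("numGenBankFullLengthSeqs", 0),
        second.get("numGenBankFullLengthSeqs", 0),
    )
    return merged


a = {"status": "Reported", "diseases": ["Melanoma"],
     "pubMedIds": ["2316527"], "numGenBankFullLengthSeqs": 2}
b = {"status": "Reported", "diseases": ["Bipolar disorder", "Melanoma"],
     "pubMedIds": ["6299878"], "numGenBankFullLengthSeqs": 5}
print(merge_records(a, b))
```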
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (for example, T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
| Field | Type | Notes |
| --- | --- | --- |
| refAllele | string | |
| altAllele | string | |
| diseases | string array | associated diseases |
| hasHomoplasmy | boolean | |
| hasHeteroplasmy | boolean | |
| status | string | record status |
| clinicalSignificance | string | predicted pathogenicity |
| scorePercentile | float | MitoTIP score |
| numGenBankFullLengthSeqs | integer | number of GenBank full-length sequences |
| pubMedIds | string array | |
| isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele |

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
| annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
- - +
Version: 3.25 (unreleased)

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Illumina Connected Annotations is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (the numbers refer to the HTML pages listed above):

  • Position (1, 2, 3, 4)
  • Disease (3, 4)
  • Nucleotide Change (1, 2)
  • Allele (3, 4)
  • Homoplasmy (3, 4)
  • Heteroplasmy (3, 4)
  • Status (3, 4)
  • MitoTIP (3, 4)
  • GB Seqs FL(CR) (1, 2, 3, 4)
  • Deletion Junction (5)
  • Insert (nt) (6)
  • Insert Point (nt) (6)
  • References/Curated References (1, 2, 3, 4)
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since they are not directly relevant to clinical significance.

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't yet invested the time to work out how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full-length GenBank sequences, we take the largest number among the duplicated records, since it provides the strongest evidence for this variant.
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly, some variants with confusing alleles (for example, T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
| Field | Type | Notes |
| --- | --- | --- |
| refAllele | string | |
| altAllele | string | |
| diseases | string array | associated diseases |
| hasHomoplasmy | boolean | |
| hasHeteroplasmy | boolean | |
| status | string | record status |
| clinicalSignificance | string | predicted pathogenicity |
| scorePercentile | float | MitoTIP score |
| numGenBankFullLengthSeqs | integer | number of GenBank full-length sequences |
| pubMedIds | string array | |
| isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele |

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | string | |
| begin | integer | |
| end | integer | |
| variantType | string | |
| reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
| annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
+ + \ No newline at end of file diff --git a/data-sources/omim-json/index.html b/data-sources/omim-json/index.html index 78b675c9..d8aa331d 100644 --- a/data-sources/omim-json/index.html +++ b/data-sources/omim-json/index.html @@ -6,13 +6,13 @@ omim-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

omim-json

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
| Field | Type | Notes |
| --- | --- | --- |
| mimNumber | int | OMIM ID for gene |
| geneName | string | gene name |
| description | string | |
| phenotypes | object array | see Phenotype entry below |

Phenotype

| Field | Type | Notes |
| --- | --- | --- |
| mimNumber | int | |
| phenotype | string | |
| description | string | |
| mapping | string | see possible values below |
| inheritances | string array | see possible values below |
| comments | string array | see possible values below |

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
- - +
Version: 3.25 (unreleased)

omim-json

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
| Field | Type | Notes |
| --- | --- | --- |
| mimNumber | int | OMIM ID for gene |
| geneName | string | gene name |
| description | string | |
| phenotypes | object array | see Phenotype entry below |

Phenotype

| Field | Type | Notes |
| --- | --- | --- |
| mimNumber | int | |
| phenotype | string | |
| description | string | |
| mapping | string | see possible values below |
| inheritances | string array | see possible values below |
| comments | string array | see possible values below |

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
+ + \ No newline at end of file diff --git a/data-sources/omim/index.html b/data-sources/omim/index.html index 06bc9b86..81ead9da 100644 --- a/data-sources/omim/index.html +++ b/data-sources/omim/index.html @@ -6,18 +6,18 @@ OMIM | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

OMIM

Overview

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.

Publications

Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.

Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.

Professional data source

This is a Professional data source and is not freely available. Please contact annotation_support@illumina.com if you would like to obtain it.

Parse OMIM data

Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.

mim2gene.txt

The mim2gene.txt file (http://omim.org/static/omim/data/mim2gene.txt) provides the mapping between MIM numbers and gene symbols. An example of this file is given below:

# MIM Number    MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq)   Entrez Gene ID (NCBI)   Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437

The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.
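Reading the file might look like the sketch below. It assumes tab-delimited columns in the order given by the header line and keeps only rows that carry at least one of the three identifiers; that filter and the `parse_mim2gene` helper are illustrative assumptions, not the actual implementation.

```python
def parse_mim2gene(lines):
    """Return one dict per MIM number that has at least one gene identifier."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        fields = line.split("\t")
        fields += [""] * (5 - len(fields))  # pad rows with missing trailing columns
        mim_number, entry_type, entrez_id, symbol, ensembl_id = fields[:5]
        if not (entrez_id or symbol or ensembl_id):
            continue  # nothing to map to a gene symbol
        records.append({
            "mimNumber": int(mim_number),
            "entryType": entry_type,
            "entrezGeneId": entrez_id,
            "geneSymbol": symbol,
            "ensemblGeneId": ensembl_id,
        })
    return records


sample = [
    "# MIM Number\tMIM Entry Type\tEntrez Gene ID (NCBI)\tApproved Gene Symbol (HGNC)\tEnsembl Gene ID (Ensembl)",
    "100050\tpredominantly phenotypes",
    "100640\tgene\t216\tALDH1A1\tENSG00000165092",
]
print(parse_mim2gene(sample))
```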

OMIM API

Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below.

{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false

}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}

Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON output.

Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:

| Illumina Connected Annotations JSON key chain | OMIM API JSON key chain |
| --- | --- |
| omim:mimNumber | omim:entryList:entry:mimNumber |
| omim:geneName | omim:entryList:entry:geneMap:geneName |
| omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent |
| omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber |
| omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype |
| omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent |
| omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below) |
| omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance |
| omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below) |

Mapping key to content

1 to disorder was positioned by mapping of the wild type gene
+

Version: 3.25 (unreleased)

OMIM

Overview

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.

Publications

Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.

Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

Parse OMIM data

Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.

mim2gene.txt

The mim2gene.txt file (http://omim.org/static/omim/data/mim2gene.txt) provides the mapping between MIM numbers and gene symbols. An example of this file is given below:

# MIM Number    MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq)   Entrez Gene ID (NCBI)   Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437

The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the proper gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.
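As a rough illustration of how the five columns above can be read, the following sketch parses mim2gene.txt lines into records. It assumes the file is tab-delimited and that missing identifiers are empty columns; the names `MimRecord`, `parse_mim2gene`, and `with_gene_identifier` are hypothetical and are not part of Illumina Connected Annotations.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class MimRecord(NamedTuple):
    """One data row of mim2gene.txt (hypothetical helper type)."""
    mim_number: int
    entry_type: str
    entrez_gene_id: Optional[str]
    hgnc_symbol: Optional[str]
    ensembl_gene_id: Optional[str]


def parse_mim2gene(lines: Iterable[str]) -> Iterator[MimRecord]:
    """Yield a record per data line; assumes tab-delimited columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # pad rows that omit trailing columns
        yield MimRecord(
            mim_number=int(cols[0]),
            entry_type=cols[1],
            entrez_gene_id=cols[2] or None,
            hgnc_symbol=cols[3] or None,
            ensembl_gene_id=cols[4] or None,
        )


def with_gene_identifier(lines: Iterable[str]) -> List[MimRecord]:
    """Keep rows that carry at least one of the three gene identifiers."""
    return [
        rec
        for rec in parse_mim2gene(lines)
        if rec.entrez_gene_id or rec.hgnc_symbol or rec.ensembl_gene_id
    ]
```

Rows without any identifier (for example, 100050 above) cannot be mapped to a gene symbol, so the sketch filters them out before any further lookup.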

OMIM API

Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below.

{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false,

}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}

Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON Output

Mappings between the Illumina Connected Annotations JSON output and OMIM JSON API are listed in the table below:

| Illumina Connected Annotations JSON key chain | OMIM API JSON key chain |
|---|---|
| omim:mimNumber | omim:entryList:entry:mimNumber |
| omim:geneName | omim:entryList:entry:geneMap:geneName |
| omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent |
| omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:mimNumber |
| omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype |
| omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent |
| omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below) |
| omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance |
| omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below) |

Mapping key to content

1 to disorder was positioned by mapping of the wild type gene
2 to disease phenotype itself was mapped
3 to molecular basis of the disorder is known
4 to disorder is a chromosome deletion or duplication syndrome

Phenotype character to comment

? to unconfirmed or possibly spurious mapping
[/] to nondiseases
{/} to contribute to susceptibility to multifactorial disorders or to susceptibility to infection
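The two lookups above can be sketched as follows. This is an illustration only: the function name `decode_phenotype_map` is hypothetical, and the rule that a comment applies whenever its character appears anywhere in the phenotype string is an assumption, since the exact matching logic is not stated here.

```python
# Mapping key -> text, copied from the list above.
MAPPING_KEY_TO_TEXT = {
    1: "disorder was positioned by mapping of the wild type gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "disorder is a chromosome deletion or duplication syndrome",
}

# Phenotype characters -> comment, copied from the list above.
COMMENT_RULES = (
    ("?", "unconfirmed or possibly spurious mapping"),
    ("[]", "nondiseases"),
    ("{}", "contribute to susceptibility to multifactorial disorders "
           "or to susceptibility to infection"),
)


def decode_phenotype_map(phenotype_map):
    """Return the mapping text and comments for one phenotypeMap object.

    Assumption: a comment applies if any of its characters occurs in the
    phenotype string.
    """
    name = phenotype_map["phenotype"]
    comments = [
        text
        for chars, text in COMMENT_RULES
        if any(ch in name for ch in chars)
    ]
    return {
        "mapping": MAPPING_KEY_TO_TEXT[phenotype_map["phenotypeMappingKey"]],
        "comments": comments,
    }
```

For the "Baraitser-Winter syndrome 2" entry in the sample response, this yields the mapping text for key 3 and an empty comment list.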

There are different types of links in the OMIM description section. For example, in the JSON response above, we have the description of MIM entry 100640:

The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985}).

As the descriptions will be shown as plain text, we remove the curly brackets surrounding links and try to keep the text readable with minimal modifications. Briefly:

  • Links referring to another MIM entry (e.g. {100650}) are removed. Any word(s) specifically associated with the removed link are also removed. For example, "(ADH, see {103700})" becomes "(ADH)".
  • Links referring to a literature reference are processed to remove the internal index and curly brackets. For example, "{4:Hsu et al., 1985}" becomes "Hsu et al., 1985".
  • All other links simply have their curly brackets removed. For example, "{EC 1.2.1.3}" becomes "EC 1.2.1.3".
  • If the content within a pair of parentheses becomes empty after processing, the parentheses are removed as well, together with any surrounding whitespace that is no longer needed. For example, "ALDH2 ({100650})," becomes "ALDH2,".

The following examples show how the description section is processed:

| Original text | Processed text |
|---|---|
| ({516030}, {516040}, and {516050}) | (empty string) |
| (e.g., D1, {168461}; D2, {123833}; D3, {123834}) | (e.g., D1; D2; D3) |
| (desmocollins; see DSC2, {125645}) | (desmocollins; see DSC2) |
| (e.g., see {102700}, {300755}) | (empty string) |
| (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}) | (ADH). See also liver mitochondrial ALDH2 |
| (see, e.g., CACNA1A; {601011}) | (see, e.g., CACNA1A) |
| (e.g., GSTA1; {138359}), mu (e.g., {138350}) | (e.g., GSTA1), mu |
| (NFKB; see {164011}) | (NFKB) |
| (see ISGF3G, {147574}) | (see ISGF3G) |
| (DCK; {EC 2.7.1.74}; {125450}) | (DCK; EC 2.7.1.74) |
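The rules and examples above can be approximated with a few regular expressions. The sketch below is not the Illumina Connected Annotations implementation; the function name `strip_omim_links` and the set of "associated words" it drops (a leading comma or semicolon plus "e.g.,", "see", and "and") are assumptions chosen to reproduce the listed examples.

```python
import re

# {4:Hsu et al., 1985} -> Hsu et al., 1985
REFERENCE_LINK = re.compile(r"\{\d+:([^{}]+)\}")

# A MIM link such as {100650}, plus an optional leading separator and
# connecting words (assumed set: "e.g.,", "see", "and").
MIM_LINK = re.compile(r"[,;]?\s*(?:\b(?:e\.g\.,|see|and)\s+)*\{\d+\}")

# Any remaining link, e.g. {EC 1.2.1.3}
OTHER_LINK = re.compile(r"\{([^{}]*)\}")

# Parentheses left empty by the steps above, with leading whitespace.
EMPTY_PARENS = re.compile(r"\s*\(\s*\)")


def strip_omim_links(text: str) -> str:
    """Convert an OMIM description with {..} links to plain text."""
    text = REFERENCE_LINK.sub(r"\1", text)  # keep the citation text
    text = MIM_LINK.sub("", text)           # drop links to MIM entries
    text = OTHER_LINK.sub(r"\1", text)      # unwrap all other links
    return EMPTY_PARENS.sub("", text)       # remove emptied parentheses


print(strip_omim_links("(NFKB; see {164011})"))
```

The order matters: literature references are handled first so that their numeric index is not mistaken for a MIM number, and empty parentheses are removed last because they only appear once the links inside them are gone.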

JSON output

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]

| Field | Type | Notes |
|---|---|---|
| mimNumber | int | OMIM ID for gene |
| geneName | string | gene name |
| description | string | |
| phenotypes | object array | see Phenotype entry below |

Phenotype

| Field | Type | Notes |
|---|---|---|
| mimNumber | int | |
| phenotype | string | |
| description | string | |
| mapping | string | see possible values below |
| inheritances | string array | see possible values below |
| comments | string array | see possible values below |

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

Building the supplementary files

There are two ways to build your own OMIM supplementary files using SAUtils.

The first is to use the SAUtils subcommands downloadOMIM and omim.

The second is to use the SAUtils subcommand AutoDownloadGenerate. For details on AutoDownloadGenerate, see the SAUtils section.

Using subcommands downloadOMIM and omim

The first step in building the OMIM .nga files is to use the SAUtils subcommand downloadOMIM to download the necessary data. To download the data, the user must have an API key obtained from OMIM. This key has to be set as the environment variable OmimApiKey.

export OmimApiKey=<users-omim-api-key>
dotnet SAUtils.dll downloadOMIM
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll downloadomim [options]
Download the OMIM gene annotation data

OPTIONS:
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--in, -i <path> input configuration JSON path (optional)
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

dotnet SAUtils.dll downloadOMIM --ref References/7/Homo_sapiens.GRCh38.Nirvana.dat --uga Cache/ --out ExternalDataSources/OMIM/2021-06-14

---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

Gene Symbol Update Statistics
============================================
{
"NumGeneSymbolsUpToDate": 16978,
"NumGeneSymbolsUpdated": 60,
"NumGenesWhereBothIdsAreNull": 0,
"NumGeneSymbolsNotInCache": 105,
"NumUnresolvedGeneSymbolConflicts": 0
}

Once the download has succeeded, the .nga files can be produced using the SAUtils subcommand omim.

dotnet SAUtils.dll omim
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll omim [options]
Creates a gene annotation database from OMIM data

OPTIONS:
--m2g, -m <VALUE> MimToGeneSymbol tsv file
--json, -j <VALUE> OMIM entry json file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version


dotnet SAUtils.dll omim --m2g ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv --json ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz --out SupplementaryDatabase/63/
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------


Time: 00:00:04.5
- - + + \ No newline at end of file diff --git a/data-sources/phylop-json/index.html b/data-sources/phylop-json/index.html index 1b33362b..5349d0bc 100644 --- a/data-sources/phylop-json/index.html +++ b/data-sources/phylop-json/index.html @@ -6,13 +6,13 @@ phylop-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]

| Field | Type | Notes |
|---|---|---|
| phylopScore | float | range: -14.08 to 6.424 |
- - +
Version: 3.25 (unreleased)

phylop-json

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]

| Field | Type | Notes |
|---|---|---|
| phylopScore | float | range: -14.08 to 6.424 |
+ + \ No newline at end of file diff --git a/data-sources/phylop/index.html b/data-sources/phylop/index.html index 181f000a..b8ab60c1 100644 --- a/data-sources/phylop/index.html +++ b/data-sources/phylop/index.html @@ -6,16 +6,16 @@ PhyloP | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

PhyloP

Overview

Publication

Kuderna, L.F.K., Ulirsch, J.C., Rashid, S. et al. Identification of constrained sequence elements across 239 primate genomes. Nature 2023. (https://doi.org/10.1038/s41586-023-06798-8)

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

PhyloP Primate

PhyloP primate analyzes 239 primate species and identifies 111,318 hypersensitivity sites and 267,410 binding sites constrained specifically in primates. +

Version: 3.25 (unreleased)

PhyloP

Overview

Publication

Kuderna, L.F.K., Ulirsch, J.C., Rashid, S. et al. Identification of constrained sequence elements across 239 primate genomes. Nature 2023. (https://doi.org/10.1038/s41586-023-06798-8)

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. (http://www.genome.org/cgi/doi/10.1101/gr.3715005)

PhyloP Primate

PhyloP Primate analyzes 239 primate species and identifies 111,318 hypersensitivity sites and 267,410 binding sites constrained specifically in primates. These elements are enriched for human genetic variants that influence gene expression and affect complex traits and diseases.

PhyloP Primate is only available for the GRCh38 assembly.

BigWig File

The original file, primates_msa.phylop.conacc.lrt.bw, is a bigWig file. It was converted to a wig file using the tools described at https://genome.ucsc.edu/goldenPath/help/bigWig.html. After conversion, the wig file provides the scores in the following format:

0.14
0.074
-2.487
0.073
0.052
0.073
fixedStep chrom=chr1 start=10558 step=1 span=1
-1.991
0.052
-2.047
0.052
0.052
0.074
-1.992
0.074
0.052
0.073
0.074
0.052
0.074
-2.05
-2.059
0.074
0.074
0.074

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]

| Field | Type | Notes |
|---|---|---|
| phyloPPrimateScore | float | range: -20 to 1.951 |

PhyloP

PhyloP (phylogenetic p-values) conservation scores are obtained from the PHAST package (http://compgen.bscb.cornell.edu/phast/) for multiple alignments of vertebrate genomes to the human genome. For GRCh38, the multiple alignment is against 19 mammals; for GRCh37, it is against 45 vertebrate genomes.

WigFix File

The data is provided in WigFix files, which are text files that provide conservation scores for contiguous intervals in the following format:

fixedStep chrom=chr1 start=10918 step=1
0.064
0.058
0.064
0.058
0.064
0.064
fixedStep chrom=chr1 start=34045 step=1
0.111
0.100
0.111
0.111
0.100
0.111
0.111
0.111
0.100
0.111
-1.636

We convert them to binary files with indexes for fast query. Note that these are scores for genomic positions and are reported only for SNVs.
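To make the layout of the format concrete, the sketch below expands fixedStep blocks into per-position scores. It is a minimal reader and not the binary conversion used by Illumina Connected Annotations; the function name `parse_fixed_step` is hypothetical, and the sketch assumes the input contains only fixedStep headers and numeric lines, with values before the first header being skipped.

```python
from typing import Iterable, Iterator, Tuple


def parse_fixed_step(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Yield (chromosome, 1-based position, score) for each score line."""
    chrom = None
    position = 0
    step = 1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # Header fields look like key=value, e.g. chrom=chr1 start=10918
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            position = int(fields["start"])
            step = int(fields.get("step", "1"))
            continue
        if chrom is None:
            continue  # a value with no preceding header cannot be placed
        yield chrom, position, float(line)
        position += step
```

Each header resets the chromosome and start position, so gaps between blocks (positions without a score) are simply never emitted.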

Download URL

GRCh37: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phyloP46way/vertebrate/

GRCh38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/

JSON Output

Unlike other supplementary data sources, phyloP scores are reported in the variants section.

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"phylopScore":0.459
}
]

| Field | Type | Notes |
|---|---|---|
| phylopScore | float | range: -14.08 to 6.424 |
- - + + \ No newline at end of file diff --git a/data-sources/phylopprimate-json/index.html b/data-sources/phylopprimate-json/index.html index f92cd9c2..0485ac12 100644 --- a/data-sources/phylopprimate-json/index.html +++ b/data-sources/phylopprimate-json/index.html @@ -6,13 +6,13 @@ phylopprimate-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

phylopprimate-json

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]

| Field | Type | Notes |
|---|---|---|
| phyloPPrimateScore | float | range: -20 to 1.951 |
- - +
Version: 3.25 (unreleased)

phylopprimate-json

 "variants": [
{
"vid": "1-64927-G-T",
"chromosome": "chr1",
"begin": 64927,
"end": 64927,
"refAllele": "G",
"altAllele": "T",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.64927G>T",
"phyloPPrimateScore": 0.151
}
]

| Field | Type | Notes |
|---|---|---|
| phyloPPrimateScore | float | range: -20 to 1.951 |
+ + \ No newline at end of file diff --git a/data-sources/primate-ai-json/index.html b/data-sources/primate-ai-json/index.html index 90aad4c1..3ba540a1 100644 --- a/data-sources/primate-ai-json/index.html +++ b/data-sources/primate-ai-json/index.html @@ -6,13 +6,13 @@ primate-ai-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

primate-ai-json

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]

| Field | Type | Notes |
|---|---|---|
| aminoAcidPosition | int | Amino Acid Position (1-based) |
| refAminoAcid | string | Reference Amino Acid |
| altAminoAcid | string | Alternate Amino Acid |
| ensemblTranscriptId | string | Transcript ID (Ensembl) |
| refSeqTranscriptId | string | Transcript ID (RefSeq) |
| scorePercentile | float | range: 0 - 1.0 |
| score | float | range: 0 - 1.0 |
| classification | string | pathogenic or benign classification |
- - +
Version: 3.25 (unreleased)

primate-ai-json

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]

| Field | Type | Notes |
|---|---|---|
| aminoAcidPosition | int | Amino Acid Position (1-based) |
| refAminoAcid | string | Reference Amino Acid |
| altAminoAcid | string | Alternate Amino Acid |
| ensemblTranscriptId | string | Transcript ID (Ensembl) |
| refSeqTranscriptId | string | Transcript ID (RefSeq) |
| scorePercentile | float | range: 0 - 1.0 |
| score | float | range: 0 - 1.0 |
| classification | string | pathogenic or benign classification |
+ + \ No newline at end of file diff --git a/data-sources/primate-ai/index.html b/data-sources/primate-ai/index.html index f9de51a6..e8bfa264 100644 --- a/data-sources/primate-ai/index.html +++ b/data-sources/primate-ai/index.html @@ -6,19 +6,19 @@ Primate AI-3D | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Primate AI-3D

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network, to predict protein variant pathogenicity using structural information. +

Version: 3.25 (unreleased)

Primate AI-3D

Overview

Primate AI is a deep residual neural network for classifying the pathogenicity of missense mutations.

The newer version, PrimateAI-3D, uses a 3D convolutional neural network to predict protein variant pathogenicity using structural information. The model's use of primate sequencing and structural data offers insights into variant interpretation and disease gene identification. The predictive score ranges between 0 and 1, with 0 being benign and 1 being most pathogenic.

For more details, refer to these publications:

Publication
  1. Hong Gao et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8197 (2023). https://doi.org/10.1126/science.abn8197
  2. Sundaram, L., Gao, H., Padigepati, S.R. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50, 1161–1170 (2018). https://doi.org/10.1038/s41588-018-0167-z
Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

Parsing

TSV File

chr pos non_flipped_ref non_flipped_alt gene_name   change_position_1based  ref_aa  alt_aa  score_PAI3D percentile_PAI3D    refseq  prediction
chr1 69094 G A ENST00000335137.4 2 V M 0.6169436463713646 0.5200308441794135 NM_001005484.1 pathogenic
chr1 69094 G C ENST00000335137.4 2 V L 0.5557043975591658 0.4271457250214688 NM_001005484.1 benign
chr1 69094 G T ENST00000335137.4 2 V L 0.5557043975591658 0.4271457391722522 NM_001005484.1 benign
chr1 69095 T A ENST00000335137.4 2 V E 0.8063537482917307 0.8032228720356267 NM_001005484.1 pathogenic
chr1 69095 T C ENST00000335137.4 2 V A 0.5795628190040587 0.4631329075815453 NM_001005484.1 benign
chr1 69095 T G ENST00000335137.4 2 V G 0.7922330142557621 0.7834049546930125 NM_001005484.1 pathogenic

From the TSV file, all columns are parsed:

  • chr
  • pos
  • non_flipped_ref
  • non_flipped_alt
  • gene_name
  • change_position_1based
  • ref_aa
  • alt_aa
  • score_PAI3D
  • percentile_PAI3D
  • refseq
  • prediction

The fields gene_name and refseq define the Ensembl and RefSeq transcript IDs respectively. These transcripts are passed as-is and some of them might be unrecognized/deprecated by RefSeq/Ensembl.
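The relationship between the TSV columns and the JSON output fields can be sketched as below. This is an illustration only: the helper names are hypothetical, and the rounding (6 decimal places for the score, 2 for the percentile) is an assumption inferred from the JSON example on this page rather than a documented rule.

```python
import csv
import io


def row_to_annotation(row):
    """Map one TSV row (as a dict) to the JSON field names used in the output."""
    return {
        "aminoAcidPosition": int(row["change_position_1based"]),
        "refAminoAcid": row["ref_aa"],
        "altAminoAcid": row["alt_aa"],
        # Rounding precision is assumed from the JSON example.
        "score": round(float(row["score_PAI3D"]), 6),
        "scorePercentile": round(float(row["percentile_PAI3D"]), 2),
        "classification": row["prediction"],
        # Transcript IDs are passed through unchanged.
        "ensemblTranscriptId": row["gene_name"],
        "refSeqTranscriptId": row["refseq"],
    }


def read_annotations(handle):
    """Yield ((chr, pos, ref, alt), annotation) pairs from a TSV handle."""
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (
            row["chr"],
            int(row["pos"]),
            row["non_flipped_ref"],
            row["non_flipped_alt"],
        )
        yield key, row_to_annotation(row)


header = ("chr\tpos\tnon_flipped_ref\tnon_flipped_alt\tgene_name\t"
          "change_position_1based\tref_aa\talt_aa\tscore_PAI3D\t"
          "percentile_PAI3D\trefseq\tprediction\n")
first_row = ("chr1\t69094\tG\tA\tENST00000335137.4\t2\tV\tM\t"
             "0.6169436463713646\t0.5200308441794135\tNM_001005484.1\tpathogenic\n")
for key, annotation in read_annotations(io.StringIO(header + first_row)):
    print(key, annotation)
```

Note that gene_name holds the Ensembl transcript ID, not a gene symbol, which is why it maps to ensemblTranscriptId.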

GRCh37

Note that for GRCh37, a lifted over file is provided. The file is not sorted, therefore it must first be sorted. Also note that certain RefSeq transcripts appear not to have been mapped during the lift-over process.

Pre-processing

Sorting

gzcat PrimateAI-3D.hg19.txt.gz | sort -t $'\t'  -k1,1 -k2,2n | gzip > PrimateAI-3D.hg19_sorted.tsv.gz

SA Generation

dotnet SAUtils.dll \
PrimateAi \
--r "${References}/Homo_sapiens.GRCh38.Nirvana.dat" \
--i "${ExternalDataSources}/PrimateAI/3D/PrimateAI-3D.hg38.txt.gz" \
--o "${SaUtilsOutput}"

Known Issues

Some transcript IDs defined in the data file are obsolete, retired, or updated. They are not removed or modified by Illumina Connected Annotations, and are passed as-is from the PrimateAI-3D data source.

Example:

ENST00000643905.1 transcript is retired according to Ensembl

NM_182838.2 transcript is removed because it is a pseudo-gene according to RefSeq

Download URL

https://primad.basespace.illumina.com/

JSON Output

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]

| Field | Type | Notes |
|---|---|---|
| aminoAcidPosition | int | Amino Acid Position (1-based) |
| refAminoAcid | string | Reference Amino Acid |
| altAminoAcid | string | Alternate Amino Acid |
| ensemblTranscriptId | string | Transcript ID (Ensembl) |
| refSeqTranscriptId | string | Transcript ID (RefSeq) |
| scorePercentile | float | range: 0 - 1.0 |
| score | float | range: 0 - 1.0 |
| classification | string | pathogenic or benign classification |
- - + + \ No newline at end of file diff --git a/data-sources/revel-json/index.html b/data-sources/revel-json/index.html index f457ba5a..18a7eec9 100644 --- a/data-sources/revel-json/index.html +++ b/data-sources/revel-json/index.html @@ -6,13 +6,13 @@ revel-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

revel-json

"revel":{ 
"score":0.027
}

| Field | Type | Notes |
|---|---|---|
| score | float | Range: 0 - 1.0 |
- - +
Version: 3.25 (unreleased)

revel-json

"revel":{ 
"score":0.027
}

| Field | Type | Notes |
|---|---|---|
| score | float | Range: 0 - 1.0 |
+ + \ No newline at end of file diff --git a/data-sources/revel/index.html b/data-sources/revel/index.html index 8228ca04..3158f658 100644 --- a/data-sources/revel/index.html +++ b/data-sources/revel/index.html @@ -6,13 +6,13 @@ REVEL | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files (for the sake of better readability) with identical format. The positions for GRCh37 were sorted but not for GRCh38. So we re-sort the variants by position in the GRCh38 file.

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{ 
"score":0.027
}

| Field | Type | Notes |
|---|---|---|
| score | float | Range: 0 - 1.0 |
- - +
Version: 3.25 (unreleased)

REVEL

Overview

REVEL is an ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons.

Publication

Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. The American Journal of Human Genetics 99, 877-885 (2016). https://doi.org/10.1016/j.ajhg.2016.08.016

CSV File

Example

chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,C,T,R,0.035
1,35142,35142,G,T,T,K,0.043
1,35143,35143,T,A,T,S,0.018
1,35143,35143,T,C,T,A,0.034

Parsing

From the CSV file, we're mainly interested in the following columns:

  • chr
  • hg19_pos
  • grch38_pos
  • ref
  • alt
  • REVEL

Known Issues

Sorting

Since the input file contains positions for both GRCh37 and GRCh38, we split it into two TSV files (for the sake of better readability) with identical format. The positions for GRCh37 were sorted but not for GRCh38. So we re-sort the variants by position in the GRCh38 file.
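A rough sketch of the split-and-sort step is shown below. The function name `split_by_assembly` is hypothetical, and two details are assumptions: rows whose grch38_pos is not a number are skipped for GRCh38, and chromosomes keep their order of first appearance while positions are sorted within each chromosome.

```python
import csv
import io


def split_by_assembly(handle):
    """Return (grch37_rows, grch38_rows) as (chr, pos, ref, alt, REVEL) tuples."""
    grch37, grch38 = [], []
    for row in csv.DictReader(handle):
        payload = (row["ref"], row["alt"], float(row["REVEL"]))
        grch37.append((row["chr"], int(row["hg19_pos"])) + payload)
        # Assumption: a non-numeric grch38_pos means no GRCh38 position.
        if row["grch38_pos"].isdigit():
            grch38.append((row["chr"], int(row["grch38_pos"])) + payload)

    # Keep chromosomes in order of first appearance, sort positions within.
    chrom_rank = {}
    for record in grch38:
        chrom_rank.setdefault(record[0], len(chrom_rank))
    grch38.sort(key=lambda record: (chrom_rank[record[0]], record[1]))
    return grch37, grch38
```

Python's sort is stable, so records sharing a chromosome and position stay in their original relative order.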

Conflicting Scores

When there are multiple scores available for the same variant (i.e. the same position with the same alternative allele), we pick the highest score.
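The two known issues above (re-sorting and conflicting scores) can be sketched in a few lines of Python. This is only an illustration, not the code used by Illumina Connected Annotations: the function names are hypothetical, the third sample row is a made-up duplicate, and skipping rows without a numeric position is an assumption.

```python
import csv
import io

def highest_revel_scores(csv_text, pos_column="grch38_pos"):
    """Map (chr, position, ref, alt) to the highest REVEL score seen for it."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: rows without a numeric position for this assembly are skipped.
        if not row[pos_column].isdigit():
            continue
        key = (row["chr"], int(row[pos_column]), row["ref"], row["alt"])
        score = float(row["REVEL"])
        if key not in best or score > best[key]:
            best[key] = score
    return best

def sort_by_position(best):
    """Return (key, score) pairs ordered by chromosome name, then position."""
    return sorted(best.items(), key=lambda item: (item[0][0], item[0][1]))

# The last row is a hypothetical duplicate used to show the conflict rule.
SAMPLE = """chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL
1,35143,35143,T,A,T,S,0.018
1,35142,35142,G,A,T,M,0.027
1,35142,35142,G,A,T,K,0.043
"""

for key, score in sort_by_position(highest_revel_scores(SAMPLE)):
    print(key, score)
```

Passing `pos_column="hg19_pos"` would produce the GRCh37 view of the same file.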

Download URL

https://sites.google.com/site/revelgenomics/downloads

JSON Output

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0
+ + \ No newline at end of file diff --git a/data-sources/splice-ai-json/index.html b/data-sources/splice-ai-json/index.html index 4d612562..ba9f0581 100644 --- a/data-sources/splice-ai-json/index.html +++ b/data-sources/splice-ai-json/index.html @@ -6,13 +6,13 @@ splice-ai-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

splice-ai-json

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
- - +
Version: 3.25 (unreleased)

splice-ai-json

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
+ + \ No newline at end of file diff --git a/data-sources/splice-ai/index.html b/data-sources/splice-ai/index.html index f104202d..b63a7117 100644 --- a/data-sources/splice-ai/index.html +++ b/data-sources/splice-ai/index.html @@ -6,13 +6,13 @@ Splice AI | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

Range | Confidence | Pathogenicity
0 ≤ x < 0.1 | low | likely benign
0.1 ≤ x ≤ 0.5 | medium | likely pathogenic
x > 0.5 | high | pathogenic

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of the entries have little value, e.g. low splice scores observed in intergenic regions. Not only do these extra entries require more storage, but the unused content has a negative impact on annotation speed.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
- - +
Version: 3.25 (unreleased)

Splice AI

Overview

SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence.

Publication

K. Jaganathan, et al. Predicting splicing from primary sequence with deep learning. Cell, 176 (3) (2019), pp. 535-548 e24

Professional data source

This is a Professional data source and is not available freely. Please contact annotation_support@illumina.com if you would like to obtain it.

VCF File

Example

##fileformat=VCFv4.0
##assembly=GRCh37/hg19
##INFO=<ID=SYMBOL,Number=1,Type=String,Description="HGNC gene symbol">
##INFO=<ID=STRAND,Number=1,Type=String,Description="+ or - depending on whether the gene lies in the positive or negative strand">
##INFO=<ID=TYPE,Number=1,Type=String,Description="E or I depending on whether the variant position is exonic or intronic (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DIST,Number=1,Type=Integer,Description="Distance between the variant position and the closest splice site (GENCODE V24lift37 canonical annotation)">
##INFO=<ID=DS_AG,Number=1,Type=Float,Description="Delta score (acceptor gain)">
##INFO=<ID=DS_AL,Number=1,Type=Float,Description="Delta score (acceptor loss)">
##INFO=<ID=DS_DG,Number=1,Type=Float,Description="Delta score (donor gain)">
##INFO=<ID=DS_DL,Number=1,Type=Float,Description="Delta score (donor loss)">
##INFO=<ID=DP_AG,Number=1,Type=Integer,Description="Delta position (acceptor gain) relative to the variant position">
##INFO=<ID=DP_AL,Number=1,Type=Integer,Description="Delta position (acceptor loss) relative to the variant position">
##INFO=<ID=DP_DG,Number=1,Type=Integer,Description="Delta position (donor gain) relative to the variant position">
##INFO=<ID=DP_DL,Number=1,Type=Integer,Description="Delta position (donor loss) relative to the variant position">
#CHROM POS ID REF ALT QUAL FILTER INFO
10 92946 . C T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0000;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-26;DP_AL=-10;DP_DG=3;DP_DL=35
10 92946 . C G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1
10 92946 . C A . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0004;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=-10;DP_AL=-48;DP_DG=35;DP_DL=-21
10 92947 . A C . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-49;DP_AL=-11;DP_DG=0;DP_DL=34
10 92947 . A T . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0002;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=-22;DP_DL=34
10 92947 . A G . . SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-54;DS_AG=0.0006;DS_AL=0.0000;DS_DG=0.0001;DS_DL=0.0000;DP_AG=33;DP_AL=-11;DP_DG=34;DP_DL=32

Parsing

From the VCF file, we're mainly interested in the following INFO fields:

  • DS_AG - Δ score (acceptor gain)
  • DS_AL - Δ score (acceptor loss)
  • DS_DG - Δ score (donor gain)
  • DS_DL - Δ score (donor loss)
  • DP_AG - Δ position (acceptor gain) relative to the variant position
  • DP_AL - Δ position (acceptor loss) relative to the variant position
  • DP_DG - Δ position (donor gain) relative to the variant position
  • DP_DL - Δ position (donor loss) relative to the variant position

The Splice AI team suggests the following interpretation for the scores:

Range | Confidence | Pathogenicity
0 ≤ x < 0.1 | low | likely benign
0.1 ≤ x ≤ 0.5 | medium | likely pathogenic
x > 0.5 | high | pathogenic
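
The interpretation table above translates directly into a small lookup. The following Python sketch only restates the three tiers; the function name `splice_ai_tier` is hypothetical and is not part of Illumina Connected Annotations or Splice AI.

```python
def splice_ai_tier(score):
    """Return (confidence, pathogenicity) for a Splice AI delta score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"delta score out of range: {score}")
    if score < 0.1:
        return ("low", "likely benign")
    if score <= 0.5:
        return ("medium", "likely pathogenic")
    return ("high", "pathogenic")

print(splice_ai_tier(0.3))
```

Note that both boundaries of the medium tier (0.1 and 0.5) are inclusive, as in the table.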

Pre-processing

Filtering

Splice AI provides a comprehensive list of entries throughout the genome. However, many of the entries have little value, e.g. low splice scores observed in intergenic regions. Not only do these extra entries require more storage, but the unused content has a negative impact on annotation speed.

As a result, Illumina Connected Annotations filters out all the values in the low confidence tier except for regions within 15 bp of nascent splice sites. For those regions, we found it useful to see if Splice AI predicted an interruption of the splicing mechanism.
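
A rough sketch of such a filter is shown below. It rests on several assumptions that the text above does not confirm: the decision is made per VCF entry using the largest of the four delta scores, the low confidence tier is "score < 0.1", and the `DIST` INFO field is used as a stand-in for "within 15 bp of a splice site". The function names are hypothetical.

```python
SCORE_KEYS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")

def parse_info(info):
    """Split a VCF INFO string such as 'A=1;B=2' into a dict of strings."""
    return dict(field.split("=", 1) for field in info.split(";") if "=" in field)

def keep_entry(info, low_threshold=0.1, window=15):
    """Keep an entry unless it is low confidence and far from a splice site."""
    fields = parse_info(info)
    max_score = max(float(fields[key]) for key in SCORE_KEYS)
    near_splice_site = abs(int(fields["DIST"])) <= window  # assumption: DIST as proxy
    return max_score >= low_threshold or near_splice_site

example = ("SYMBOL=TUBB8;STRAND=-;TYPE=E;DIST=-53;DS_AG=0.0008;DS_AL=0.0000;"
           "DS_DG=0.0003;DS_DL=0.0000;DP_AG=34;DP_AL=-27;DP_DG=35;DP_DL=1")
print(keep_entry(example))
```

With the example entry from the VCF above, all delta scores are below 0.1 and the variant is 53 bp from the closest splice site, so this sketch would drop it.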

Download URL

https://basespace.illumina.com/s/5u6ThOblecrh

JSON Output

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
+ + \ No newline at end of file diff --git a/data-sources/topmed-json/index.html b/data-sources/topmed-json/index.html index 97b9179b..8f8fb94e 100644 --- a/data-sources/topmed-json/index.html +++ b/data-sources/topmed-json/index.html @@ -6,13 +6,13 @@ topmed-json | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

topmed-json

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.25 (unreleased)

topmed-json

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/data-sources/topmed/index.html b/data-sources/topmed/index.html index f2ac7264..b10151f2 100644 --- a/data-sources/topmed/index.html +++ b/data-sources/topmed/index.html @@ -6,13 +6,13 @@ TOPMed | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
- - +
Version: 3.25 (unreleased)

TOPMed

Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative through the integration of whole-genome sequencing (WGS) and other omics (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) data with molecular, behavioral, imaging, environmental, and clinical data.

Publication

Kowalski, M.H., Qian, H., Hou, Z., Rosen, J.D., Tapia, A.L., Shan, Y., Jain, D., Argos, M., Arnett, D.K., Avery, C. and Barnes, K.C., 2019. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genetics, 15(12), p.e1008500.

VCF extraction

We currently extract the following fields from the TOPMed VCF file:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Samples with Coverage">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts in Samples with Coverage">
##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequencies">
##INFO=<ID=Het,Number=A,Type=Integer,Description="Number of samples with heterozygous genotype calls">
##INFO=<ID=Hom,Number=A,Type=Integer,Description="Number of samples with homozygous alternate genotype calls">

Example:

chr1    10132   TOPMed_freeze_5?chr1:10,132     T       C       255     SVM     VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0      NA:FRQ  125568:0.000254842
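
As an illustration of how the extracted fields relate to the JSON keys, here is a minimal Python sketch. Several points are assumptions rather than documented behavior: that `Hom` feeds `allHc`, that `allAf` is recomputed as AC/AN and rounded to six decimal places, that any FILTER value other than `PASS` or `.` counts as a failed filter, and that the record has a single alternate allele. The function name is hypothetical.

```python
def topmed_annotation(vcf_line):
    """Build a topmed-style dict from one single-allele TOPMed VCF record."""
    columns = vcf_line.split()
    info = dict(f.split("=", 1) for f in columns[7].split(";") if "=" in f)
    allele_count = int(info["AC"])
    allele_number = int(info["AN"])
    result = {
        "allAc": allele_count,
        "allAn": allele_number,
        "allAf": round(allele_count / allele_number, 6),  # assumption: 6 decimals
        "allHc": int(info["Hom"]),                        # assumption: Hom -> allHc
    }
    # Boolean keys are only emitted when true.
    if columns[6] not in ("PASS", "."):
        result["failedFilter"] = True
    return result

line = ("chr1 10132 TOPMed_freeze_5?chr1:10,132 T C 255 SVM "
        "VRT=1;NS=62784;AN=125568;AC=32;AF=0.000254842;Het=32;Hom=0 "
        "NA:FRQ 125568:0.000254842")
print(topmed_annotation(line))
```

For the example record, 32 / 125568 rounds to 0.000255, and the `SVM` filter value leads to `failedFilter` being set.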

GRCh37 liftover

The data is not available for GRCh37 on the TOPMed website. We performed a liftover from GRCh38 to GRCh37 using dbSNP IDs.

Download URL

https://bravo.sph.umich.edu/freeze5/hg38/download

JSON output

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters
+ + \ No newline at end of file diff --git a/file-formats/custom-annotations/index.html b/file-formats/custom-annotations/index.html index 3ffae41c..0f339a28 100644 --- a/file-formats/custom-annotations/index.html +++ b/file-formats/custom-annotations/index.html @@ -6,12 +6,12 @@ Custom Annotations | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporating new annotations ahead of our update cycle. Another +

Version: 3.25 (unreleased)

Custom Annotations

Overview

While the team tries to keep data sources up-to-date, you might want to start incorporating new annotations ahead of our update cycle. Another common use case involves protected health information (PHI). Custom annotations are a mechanism that enables both use cases.

Here are some examples of how our collaborators use custom annotations:

  • associating context from both a sample-level and a sample cohort level with the variant annotations
  • adding content that is licensed (e.g. HGMD) to the variant annotations

At the moment, we have two different custom annotation file formats. One provides additional annotations to variants (both small variants and SVs) while the other caters to gene annotations.

In both cases, the custom annotation file format is a tab-delimited file that is separated into two parts: the header & the data.

The header is where you can customize how you want the data to appear in the JSON file and provide context about the genome assembly and how Illumina Connected Annotations should match the variants.

At Illumina, there are usually many components downstream of Illumina Connected Annotations that have to parse our annotations. If a customer provides a custom @@ -34,7 +34,7 @@ chromosome, svLength, cytogeneticBand, etc. The title should also not conflict with other data source keys like clingen or dgv.

caution

Care should be taken not to annotate using multiple custom annotations that all use the same title.

Genome Assemblies

The following genome assemblies can be specified:

  • GRCh37
  • GRCh38

Matching Criteria

The matching criteria instruct Illumina Connected Annotations how to match a VCF variant to the custom annotation.

The following matching criteria can be specified:

  • allele - use this when you only want allele-specific matches. This is commonly the case when using allele frequency data sources like gnomAD
  • position - use this when you want positional matches. This is commonly used with disease phenotype data sources like ClinVar
  • sv - use this when you want to match to all other overlapping SVs. This use case arose when we were adding custom annotations for baseline copy number intervals along the genome.
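
To make the three criteria concrete, here is one way they could be expressed in Python. This is a simplified sketch, not the matching logic used by Illumina Connected Annotations: the `Variant` class and `matches` function are hypothetical, `position` is interpreted as "same chromosome and start position", and `sv` as "any interval overlap".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chromosome: str
    begin: int
    end: int
    ref: str = ""
    alt: str = ""

def matches(criteria, vcf_variant, annotation):
    """Return True if the custom annotation applies to the VCF variant."""
    if vcf_variant.chromosome != annotation.chromosome:
        return False
    if criteria == "allele":
        # Allele-specific: position and both alleles must agree.
        return ((vcf_variant.begin, vcf_variant.ref, vcf_variant.alt)
                == (annotation.begin, annotation.ref, annotation.alt))
    if criteria == "position":
        # Positional: alleles are ignored.
        return vcf_variant.begin == annotation.begin
    if criteria == "sv":
        # Overlap between two closed intervals.
        return vcf_variant.begin <= annotation.end and annotation.begin <= vcf_variant.end
    raise ValueError(f"unknown matching criteria: {criteria}")

snv = Variant("1", 35142, 35142, "G", "A")
other_allele = Variant("1", 35142, 35142, "G", "C")
print(matches("allele", snv, other_allele), matches("position", snv, other_allele))
```

The same pair of variants matches under `position` but not under `allele`, which is the practical difference between the first two criteria.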

Categories

Categories are not used by Illumina Connected Annotations, but are often used by downstream tools. Categories provide hints for how those tools should filter or display the annotation data.

When a category is specified, Illumina Connected Annotations will provide additional validation for those fields. The following table describes each category:

Category | Description | Validation
AlleleCount | allele counts for a specific population | See the supported populations below
AlleleNumber | allele numbers for a specific population | See the supported populations below
AlleleFrequency | allele frequencies for a specific population | See the supported populations below
Prediction | ACMG-style pathogenicity classifications | benign (B), likely benign (LB), VUS, likely pathogenic (LP), pathogenic (P)
Filter | free text that signals downstream tools to add the column to the filter | Max 20 characters
Description | free-text description | Max 100 characters
Identifier | any ID | Max 50 characters
HomozygousCount | count of homozygous individuals for a specific population | See the supported populations below
Score | any score value | Any double-precision floating point number
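
The validation rules in the table can be approximated with a short checker. The sketch below is illustrative only: `is_valid` is a hypothetical helper, the case-insensitive comparison for `Prediction` is an assumption, and the population-based categories are not checked here.

```python
PREDICTIONS = {
    "benign", "b", "likely benign", "lb", "vus",
    "likely pathogenic", "lp", "pathogenic", "p",
}
MAX_LENGTH = {"Filter": 20, "Description": 100, "Identifier": 50}

def is_valid(category, value):
    """Check a raw string value against the rule for its category."""
    if category == "Prediction":
        return value.strip().lower() in PREDICTIONS  # assumption: case-insensitive
    if category in MAX_LENGTH:
        return len(value) <= MAX_LENGTH[category]
    if category == "Score":
        try:
            float(value)
        except ValueError:
            return False
        return True
    # Population-based categories are not validated in this sketch.
    return True

print(is_valid("Prediction", "LP"), is_valid("Identifier", "x" * 51))
```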

Descriptions

Descriptions are used to add more context to the categories. For now, descriptions are mainly used to associate allele counts, numbers, and frequencies with their respective populations.

Populations

The following populations were specified in the HapMap project, 1000 Genomes Project, ExAC, and gnomAD.

Population Code | Super-population Code | Description
ACB | AFR | African Caribbeans in Barbados
AFR | AFR | African
ALL | ALL | All populations
AMR | AMR | Ad Mixed American
ASJ |  | Ashkenazi Jewish
ASW | AFR | Americans of African Ancestry in SW USA
BEB | SAS | Bengali from Bangladesh
CDX | EAS | Chinese Dai in Xishuangbanna, China
CEU | EUR | Utah Residents (CEPH) with Northern and Western European Ancestry
CHB | EAS | Han Chinese in Beijing, China
CHS | EAS | Southern Han Chinese
CLM | AMR | Colombians from Medellin, Colombia
EAS | EAS | East Asian
ESN | AFR | Esan in Nigeria
EUR | EUR | European
FIN | EUR | Finnish in Finland
GBR | EUR | British in England and Scotland
GIH | SAS | Gujarati Indian from Houston, Texas
GWD | AFR | Gambian in Western Divisions in the Gambia
IBS | EUR | Iberian population in Spain
ITU | SAS | Indian Telugu from the UK
JPT | EAS | Japanese in Tokyo, Japan
KHV | EAS | Kinh in Ho Chi Minh City, Vietnam
LWK | AFR | Luhya in Webuye, Kenya
MAG | AFR | Mandinka in the Gambia
MKK | AFR | Maasai in Kinyawa, Kenya
MSL | AFR | Mende in Sierra Leone
MXL | AMR | Mexican Ancestry from Los Angeles, USA
NFE | EUR | European (Non-Finnish)
OTH | OTH | Other
PEL | AMR | Peruvians from Lima, Peru
PJL | SAS | Punjabi from Lahore, Pakistan
PUR | AMR | Puerto Ricans from Puerto Rico
SAS | SAS | South Asian
STU | SAS | Sri Lankan Tamil from the UK
TSI | EUR | Toscani in Italia
YRI | AFR | Yoruba in Ibadan, Nigeria

Data Types

Each custom annotation can be one of the following data types:

  • bool - true or false
  • number - any integer or floating-point number
  • string - text
tip

For boolean variables, only keys with a true value will be output to the JSON object.

Using SAUtils

Illumina Connected Annotations includes a tool called SAUtils that converts various data sources into the native binary format used by Illumina Connected Annotations. The sub-commands customvar and customgene are used to specify a variant file or a gene file respectively.

Convert Variant File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customvar \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory

Convert Gene File

dotnet bin/Release/netcoreapp2.1/SAUtils.dll customgene \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-c Data/Cache \
-i MyDataSource.tsv \
-o SupplementaryAnnotation
  • the -c argument specifies the Illumina Connected Annotations cache path
  • the -i argument specifies the input TSV path
  • the -o argument specifies the output directory
- - + + \ No newline at end of file diff --git a/file-formats/illumina-annotator-json-file-format/index.html b/file-formats/illumina-annotator-json-file-format/index.html index 758d3ffc..a0ef3219 100644 --- a/file-formats/illumina-annotator-json-file-format/index.html +++ b/file-formats/illumina-annotator-json-file-format/index.html @@ -6,13 +6,13 @@ Illumina Connected Annotations JSON File Format | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. For example, there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
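
The two conventions above can be mimicked with a small filter over a dictionary. This is a sketch of the idea only, not the serializer used by Illumina Connected Annotations; the `compact` helper and the sample record are hypothetical.

```python
import json

def compact(record):
    """Drop false booleans and empty or '.' placeholders from a record."""
    return {
        key: value
        for key, value in record.items()
        # 'is not False' keeps legitimate zeros such as a homozygous count of 0.
        if value is not False and value not in (None, "", ".")
    }

record = {"isStructuralVariant": False, "id": ".", "allHc": 0, "validated": True}
print(json.dumps(compact(record)))
```

When consuming the JSON, the flip side is that a missing boolean key should be read as false, for example with `entry.get("failedFilter", False)`.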

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
Field | Type | Notes
annotator | string | the name of the annotator and the current version
creationTime | string | yyyy-MM-dd hh:mm:ss
genomeAssembly | string | see possible values below
schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes
dataVersion | string |
dataSources | object array | see Data Source entry below
samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

Field | Type | Notes
name | string |
version | string |
description | string | optional description of the data source
releaseDate | string | yyyy-MM-dd

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"id": "4",
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes
chromosome | string | all | exactly as displayed in the VCF
position | integer | all | exactly as displayed in the VCF (1-based notation). Range: 1 - 250 million
id | string | all | taken from the ID column in the VCF file; this field is omitted if it is empty or has a "." value
repeatUnit | string | STR | provided by ExpansionHunter
refRepeatCount | integer | STR | provided by ExpansionHunter
svEnd | integer | SV |
refAllele | string | all | exactly as displayed in the VCF
altAlleles | string array | all | exactly as displayed in the VCF
quality | float | all | exactly as displayed in the VCF (normally an integer, but some variant callers use floating point; has been observed as high as 500k)
filters | string array | all | exactly as displayed in the VCF
ciPos | integer array | SV |
ciEnd | integer array | SV |
svLength | integer | SV |
strandBias | float | small variant | provided by GATK (from SB)
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand | string | all | e.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
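
For readers unfamiliar with the term, reciprocal overlap is commonly defined as the length of the shared region divided by the length of the longer of the two intervals. The Python sketch below uses that common definition with 1-based, inclusive coordinates; the exact formula used by Illumina Connected Annotations is not stated here, so treat it as an assumption.

```python
def reciprocal_overlap(begin1, end1, begin2, end2):
    """Shared length divided by the longer interval (1-based, inclusive)."""
    shared = min(end1, end2) - max(begin1, begin2) + 1
    if shared <= 0:
        return 0.0
    longest = max(end1 - begin1 + 1, end2 - begin2 + 1)
    return round(shared / longest, 5)  # the table specifies up to 5 decimal places

print(reciprocal_overlap(1, 100, 51, 150))
```

Under this definition a small variant fully contained in a very large annotation still gets a low value, which is consistent with the small reciprocalOverlap values in the example above.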

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).
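
To illustrate how the two overlap fields can differ for the same pair of intervals, here is a minimal Python sketch. It assumes that reciprocal overlap is the shared length divided by the longer of the two intervals, and that annotation overlap is the shared length divided by the annotation's length. Those definitions, the function names, and the sample coordinates are assumptions made for illustration; this page does not spell them out.

```python
def interval_length(begin: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - begin + 1


def overlap_fractions(var_begin, var_end, ann_begin, ann_end):
    """Return (reciprocal_overlap, annotation_overlap), rounded to 5 places.

    Assumed definitions (not confirmed by the documentation):
      reciprocal overlap = shared bases / longer of the two intervals
      annotation overlap = shared bases / annotation length
    """
    shared = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if shared <= 0:
        return 0.0, 0.0
    var_len = interval_length(var_begin, var_end)
    ann_len = interval_length(ann_begin, ann_end)
    return round(shared / max(var_len, ann_len), 5), round(shared / ann_len, 5)


# Hypothetical variant 101-300 against a hypothetical annotation 251-350.
print(overlap_fractions(101, 300, 251, 350))
```

Under these assumptions, a large variant that fully contains a small annotation gets an annotation overlap of 1 but a small reciprocal overlap, which is the pattern seen in the second entry of the example above.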

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.
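
The allele fields are related by frequency = count / number. The short Python sketch below recomputes allAf from allAc and allAn for the example entry. The helper name and the choice of six decimal places are assumptions inferred from the example values, not something this page specifies.

```python
import math


def allele_frequency(allele_count: int, allele_number: int, digits: int = 6):
    """Return AC / AN rounded to `digits` places, or None when AN is zero."""
    if allele_number == 0:
        return None
    return round(allele_count / allele_number, digits)


# Values copied from the oneKg example above.
entry = {"allAn": 5008, "allAc": 2702, "allAf": 0.539537}
recomputed = allele_frequency(entry["allAc"], entry["allAn"])
print(math.isclose(recomputed, entry["allAf"], abs_tol=1e-9))
```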

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Ad Mixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Ad Mixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
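
Because these fields can be absent depending on the genome assembly, code that reads gnomAD (SV) entries may want to check for them rather than index them directly. The following Python sketch separates the fields that are present from those that are missing. The helper name and the subset of field names chosen are illustrative assumptions.

```python
# A subset of the fields listed above as unavailable in GRCh38.
ASSEMBLY_DEPENDENT_FIELDS = ("femaleAf", "maleAf", "allHc", "failedFilter")


def split_available(entry: dict, fields=ASSEMBLY_DEPENDENT_FIELDS):
    """Return (present, missing) for the assembly-dependent fields."""
    present = {name: entry[name] for name in fields if name in entry}
    missing = [name for name in fields if name not in entry]
    return present, missing


# Hypothetical GRCh38-style entry that lacks the assembly-dependent fields.
grch38_entry = {"variantType": "duplication", "allAf": 0.068963}
present, missing = split_available(grch38_entry)
print(present, missing)
```

Keeping absent values distinct from false or zero avoids reading a missing failedFilter as "passed all filters".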

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | VCF | Notes
genotype | string | GT |
variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele
totalDepth | integer | DP | non-negative integer values
genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99
copyNumber | integer | CN | non-negative integer values
minorHaplotypeCopyNumber | integer | MCN | non-negative integer values
repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific
alleleDepths | integer array | AD | non-negative integer values
failedFilter | bool | FT |
splitReadCounts | integer array | SR | Manta-specific
pairedEndReadCounts | integer array | PR | Manta-specific
isDeNovo | bool | DN |
deNovoQuality | float | DQ |
diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific
artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0
lossOfHeterozygosity | bool | CN, MCN |
somaticQuality | float | SQ |
heteroplasmyPercentile | float array | VF | range: 0 - 100. 2 decimal places. One value per alternate allele
binCount | integer | BC | non-negative integer values
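
The table lists both VF and AD as VCF sources for variantFrequencies. As a rough sketch of the AD route, the Python function below divides each alternate allele depth by the total depth across all alleles. This derivation and the rounding to three decimal places are assumptions; they do reproduce the values in the example above (alleleDepths 10, 20, 30 giving 0.333 and 0.5).

```python
def variant_frequencies(allele_depths, digits=3):
    """One frequency per alternate allele: AD[alt] / sum(AD).

    allele_depths[0] is taken to be the reference allele depth.
    """
    total = sum(allele_depths)
    if total == 0:
        return []
    return [round(depth / total, digits) for depth in allele_depths[1:]]


# alleleDepths from the samples example above.
print(variant_frequencies([10, 20, 30]))
```
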
Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
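
Since sample ordering is preserved, each entry in samples can be paired with the sample names listed in the header. The Python sketch below does that pairing and maps empty samples to None. The function name is hypothetical, and the sample data is made up for illustration.

```python
def samples_by_name(sample_names, samples):
    """Map each header sample name to its sample object, or None if empty."""
    if len(sample_names) != len(samples):
        raise ValueError("sample count does not match the header")
    return {
        name: (None if sample.get("isEmpty") else sample)
        for name, sample in zip(sample_names, samples)
    }


names = ["NA12878", "NA12891"]
samples = [{"genotype": "0/1", "totalDepth": 57}, {"isEmpty": True}]
print(samples_by_name(names, samples))
```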

Variants

"variants":[
{
"vid":"2-48010488-G-A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
Field | Type | Notes
vid | string | see Variant Identifiers
chromosome | string |
begin | int | 1-based non-negative integer values. Range: 1 - 250 million
end | int | 1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllele | bool | true when this is a reference minor allele
isStructuralVariant | bool | true when the variant is a structural variant
inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllele | string | parsimonious representation of the reference allele
altAllele | string | parsimonious representation of the alternate allele
variantType | string | uses Sequence Ontology sequence alterations
hgvsg | string | HGVS g. notation
phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424
Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"NMD_transcript_variant",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268/4158",
"cdsPos":"116/483",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39/160",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"impact": "moderate",
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"proteinId":"ENSP00000405294.1",
"completeOverlap":true
}
]
Field | Type | Notes
transcript | string | transcript ID. e.g. ENST00000445503.1
source | string | RefSeq / Ensembl
bioType | string | descriptions of the biotypes from Ensembl
codons | string |
aminoAcids | string |
cdnaPos | string | Format: start-end/Length
cdsPos | string | Format: start-end/Length
exons | string | exons affected by the variant
introns | string | introns affected by the variant
proteinPos | string | Format: start-end/Length
geneId | string | gene ID. e.g. ENSG00000116062
hgnc | string | gene symbol. e.g. MSH6
consequence | string array | Sequence Ontology Consequences
impact | string | See Consequence Impact for details
hgvsc | string | HGVS coding nomenclature
hgvsp | string | HGVS protein nomenclature
geneFusion | object | see Gene Fusions entry below
isCanonical | bool | true when this is a canonical transcript
isManeSelect | bool | true when this is a MANE select transcript
proteinId | string | protein ID. E.g. ENSP00000405294.1
completeOverlap | bool | true when this transcript is completely overlapped by the variant
cancerHotspots | string array | see Cancer Hotspots entry below
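
The position fields (cdnaPos, cdsPos, proteinPos) are strings in the start-end/Length format, with a single position written as start/Length in the example above. A minimal Python sketch for splitting them into integers follows. The function name is hypothetical, and the sketch assumes every component is a plain integer, so any other notation would need extra handling.

```python
def parse_position(value: str):
    """Parse "start-end/length" or "start/length" into (start, end, length)."""
    span, _, length = value.partition("/")
    start, sep, end = span.partition("-")
    if not sep:
        # A single position: treat it as a range of one.
        end = start
    return int(start), int(end), int(length)


print(parse_position("268/4158"))
print(parse_position("116-118/483"))
```
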
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

Field | Type | Notes
exon | int | actual exon where the breakpoint was located
intron | int | actual intron where the breakpoint was located
fusions | object array | see Fusion entry below

Fusion

Field | Type | Notes
exon | int | actual exon where the other breakpoint was located
intron | int | actual intron where the other breakpoint was located
hgvsc | string | HGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

Field | Type | Notes
residue | string |
numSamples | int | how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples | int | how many samples are associated with a variant at the same position and with the same alternate amino acid
qValue | double |

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
Field | Type | Notes
id | string |
type | string | see possible values below
consequence | string array | see possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes
id | string | ClinVar ID
variationId | string | ClinVar VCV ID
variantType | string | variant type
reviewStatus | string | see possible values below
alleleOrigins | string array | see possible values below
refAllele | string |
altAllele | string |
phenotypes | string array |
medGenIds | string array | MedGen IDs
omimIds | string array | OMIM IDs
orphanetIds | string array | Orphanet IDs
significance | string array | see possible values below
lastUpdatedDate | string | yyyy-MM-dd
pubMedIds | string array | PubMed IDs
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele
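
A common use of these fields is to keep only ClinVar entries that match the annotated alternate allele and carry a significance of interest. The Python sketch below shows one way to do that. The function name, the default list of wanted values, and the sample entries are illustrative assumptions. Because boolean keys are only written when true, a missing isAlleleSpecific is read as false.

```python
def allele_specific_hits(clinvar_entries, wanted=("pathogenic", "likely pathogenic")):
    """Return IDs of allele-specific entries with any wanted significance."""
    hits = []
    for entry in clinvar_entries:
        if not entry.get("isAlleleSpecific", False):
            continue
        if any(value in wanted for value in entry.get("significance", [])):
            hits.append(entry["id"])
    return hits


# Hypothetical entries for illustration.
entries = [
    {"id": "RCV000000001.1", "significance": ["pathogenic"], "isAlleleSpecific": True},
    {"id": "RCV000000002.1", "significance": ["pathogenic"]},
    {"id": "RCV000000003.1", "significance": ["benign"], "isAlleleSpecific": True},
]
print(allele_specific_hits(entries))
```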

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.

DANN

"dannScore": 0.27
Field | Type | Notes
dannScore | float | Range: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
Field | Type | Notes
dbsnp | string array | dbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters

gnomAD

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification

REVEL

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
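
In the example above, each spliceAI entry carries only some of the four score/distance pairs. A small Python sketch for picking the highest score that is present in an entry follows. The function name is hypothetical, and treating absent keys as "not reported" is an assumption based on that example.

```python
SCORE_KEYS = (
    "acceptorGainScore",
    "acceptorLossScore",
    "donorGainScore",
    "donorLossScore",
)


def max_splice_score(entry: dict):
    """Return (key, score) for the highest score present, or None if none are."""
    present = [(key, entry[key]) for key in SCORE_KEYS if key in entry]
    if not present:
        return None
    return max(present, key=lambda pair: pair[1])


# First entry from the spliceAI example above.
blcap = {
    "hgnc": "BLCAP",
    "acceptorGainDistance": -3,
    "acceptorGainScore": 0.3,
    "donorLossDistance": 7,
    "donorLossScore": 0.9,
}
print(max_splice_score(blcap))
```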

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes
name | string | HGNC gene symbol
hgncId | int | HGNC ID
summary | string | short description of the gene from OMIM
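
Both the hgnc field of a transcript and the name field of a gene hold the HGNC gene symbol, so the two sections can be joined on that value. The Python sketch below looks up gene-level annotations for the symbols used by a list of transcripts. The function name and the sample data are illustrative assumptions.

```python
def genes_for_transcripts(transcripts, genes):
    """Return gene-level annotations keyed by the symbols the transcripts use."""
    by_name = {gene["name"]: gene for gene in genes}
    symbols = {t["hgnc"] for t in transcripts if "hgnc" in t}
    return {s: by_name[s] for s in sorted(symbols) if s in by_name}


# Hypothetical, trimmed-down transcript and gene entries.
transcripts = [
    {"transcript": "ENST00000445503.1", "hgnc": "MSH6"},
    {"transcript": "ENST00000000000.1"},
]
genes = [{"name": "MSH6", "hgncId": 7329}]
print(genes_for_transcripts(transcripts, genes))
```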

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes
mimNumber | int | OMIM ID for gene
geneName | string | gene name
description | string |
phenotypes | object array | see Phenotype entry below

Phenotype

Field | Type | Notes
mimNumber | int |
phenotype | string |
description | string |
mapping | string | see possible values below
inheritances | string array | see possible values below
comments | string array | see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull | float | probability of being completely tolerant of loss of function variation (observed = expected)
pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ | float | corrected synonymous Z score
misZ | float | corrected missense Z score
loeuf | float | loss of function observed/expected upper bound fraction (LOEUF)

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes
clingenGeneValidity | object array |
diseaseId | string | Monarch Disease Ontology ID (MONDO)
disease | string | disease label
classification | string | see below for possible values
classificationDate | string | yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field | Type | Notes
roleInCancer | string array | Possible roles in cancer
tier | number | COSMIC tiers [1, 2]
Version: 3.25 (unreleased)

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.
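
One practical consequence of the first convention is that a missing boolean key should be read as false. A minimal Python sketch of that follows. The helper name is hypothetical.

```python
def flag(obj: dict, key: str) -> bool:
    """Boolean keys are only written when true, so a missing key means False."""
    return bool(obj.get(key, False))


# A small variant entry typically has no "isStructuralVariant" key at all.
small_variant = {"vid": "2-48010488-G-A", "variantType": "SNV"}
print(flag(small_variant, "isStructuralVariant"))
```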

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily, with examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
Field | Type | Notes
annotator | string | the name of the annotator and the current version
creationTime | string | yyyy-MM-dd hh:mm:ss
genomeAssembly | string | see possible values below
schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes
dataVersion | string |
dataSources | object array | see Data Source entry below
samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

Field | Type | Notes
name | string |
version | string |
description | string | optional description of the data source
releaseDate | string | yyyy-MM-dd
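
The date fields are plain strings, so they need to be parsed before they can be compared. A small Python sketch follows; it assumes that the hh in the creationTime pattern is a 24-hour value, which is what the example value 15:53:13 suggests but the table does not state.

```python
from datetime import date, datetime

# Trimmed-down header using values from the example above.
header = {
    "creationTime": "2017-06-14 15:53:13",
    "dataSources": [
        {"name": "ClinVar", "version": "20170503", "releaseDate": "2017-05-03"},
        {"name": "phyloP", "version": "hg19", "releaseDate": "2009-11-10"},
    ],
}

# Assumption: 24-hour clock, hence %H rather than %I.
created = datetime.strptime(header["creationTime"], "%Y-%m-%d %H:%M:%S")

# releaseDate is yyyy-MM-dd, which date.fromisoformat accepts directly.
release_dates = {
    source["name"]: date.fromisoformat(source["releaseDate"])
    for source in header["dataSources"]
    if "releaseDate" in source
}
oldest_source = min(release_dates, key=release_dates.get)
```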

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"id":"4",
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes
chromosome | string | all | exactly as displayed in the VCF
position | integer | all | exactly as displayed in the VCF (1-based notation). Range: 1 - 250 million
id | string | all | provided from the ID column in the VCF file; this field is omitted if it is empty or has a "." value
repeatUnit | string | STR | provided by ExpansionHunter
refRepeatCount | integer | STR | provided by ExpansionHunter
svEnd | integer | SV |
refAllele | string | all | exactly as displayed in the VCF
altAlleles | string array | all | exactly as displayed in the VCF
quality | float | all | exactly as displayed in the VCF (normally an integer, but some variant callers use floating point; values as high as 500k have been observed)
filters | string array | all | exactly as displayed in the VCF
ciPos | integer array | SV |
ciEnd | integer array | SV |
svLength | integer | SV |
strandBias | float | small variant | provided by GATK (from SB)
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand | string | all | e.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes
clingen | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
variantType | string | Any of the sequence alterations defined here.
id | string | Identifier from the data source. Alternatively a VID
clinicalInterpretation | string | see possible values below
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated | boolean |
phenotypes | string array | Description of the phenotype.
phenotypeIds | string array | Description of the phenotype IDs.
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes
clingenDosageSensitivityMap | object array |
chromosome | string | Ensembl-style chromosome names
begin | integer | 1-based position
end | integer | 1-based position
haploinsufficiency | string | see possible values below
triplosensitivity | string | (same as haploinsufficiency)
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (not reported for insertions).
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (not reported for insertions).
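
The tables give the range of reciprocalOverlap and annotationOverlap but not their formulas. The Python sketch below shows one common way to define them, and it is an assumption rather than the tool's documented method: the shared length divided by the longer of the two intervals (reciprocal overlap), and the shared length divided by the annotation's length (annotation overlap). Coordinates are treated as 1-based and inclusive, matching the begin and end fields.

```python
def overlap_fractions(variant, annotation):
    """Return (reciprocal_overlap, annotation_overlap) for two 1-based
    inclusive (begin, end) intervals, rounded to 5 decimal places.

    Assumed definitions (not taken from the source):
      reciprocal overlap = shared length / length of the longer interval
      annotation overlap = shared length / length of the annotation
    """
    v_begin, v_end = variant
    a_begin, a_end = annotation
    shared = min(v_end, a_end) - max(v_begin, a_begin) + 1
    if shared <= 0:
        return 0.0, 0.0
    v_len = v_end - v_begin + 1
    a_len = a_end - a_begin + 1
    return round(shared / max(v_len, a_len), 5), round(shared / a_len, 5)


# A 200 bp variant that fully contains a 50 bp annotation.
contained = overlap_fractions((100, 299), (200, 249))
# Two 100 bp intervals that share 50 bp.
shifted = overlap_fractions((1, 100), (51, 150))
# Intervals that do not touch.
disjoint = overlap_fractions((1, 10), (20, 30))
```

Under these definitions a small annotation inside a large variant gets a low reciprocal overlap but an annotation overlap of 1, which is the pattern seen in the second example entry above.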

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string |
id | string |
allAn | integer | allele number for all populations. Non-zero integer.
allAc | integer | allele count for all populations. Integer.
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap | floating point | Range: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

Field | Type | Notes
chromosome | string | chromosome number
begin | integer | position interval start
end | integer | position interval end
variantType | string | structural variant type
variantId | string | gnomAD ID
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0
allAc | integer | allele count for all populations.
afrAc | integer | allele count for the African super population.
amrAc | integer | allele count for the Ad Mixed American super population.
easAc | integer | allele count for the East Asian super population.
eurAc | integer | allele count for the European super population.
othAc | integer | allele count for all other populations.
maleAc | integer | allele count for male population.
femaleAc | integer | allele count for female population.
allAn | integer | allele number for all populations.
afrAn | integer | allele number for the African super population.
amrAn | integer | allele number for the Ad Mixed American super population.
easAn | integer | allele number for the East Asian super population.
eurAn | integer | allele number for the European super population.
othAn | integer | allele number for all other populations.
femaleAn | integer | allele number for female population.
maleAn | integer | allele number for male population.
allHc | integer | count of homozygous individuals for all populations.
afrHc | integer | count of homozygous individuals for the African / African American population.
amrHc | integer | count of homozygous individuals for the Latino population.
easHc | integer | count of homozygous individuals for the East Asian population.
eurHc | integer | count of homozygous individuals for the European super population.
othHc | integer | count of homozygous individuals for all other populations.
maleHc | integer | count of homozygous individuals for male population.
femaleHc | integer | count of homozygous individuals for female population.
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available for GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes
chromosome | string |
begin | integer |
end | integer |
variantType | string array |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | VCF | Notes
genotype | string | GT |
variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele
totalDepth | integer | DP | non-negative integer values
genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99
copyNumber | integer | CN | non-negative integer values
minorHaplotypeCopyNumber | integer | MCN | non-negative integer values
repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific
alleleDepths | integer array | AD | non-negative integer values
failedFilter | bool | FT |
splitReadCounts | integer array | SR | Manta-specific
pairedEndReadCounts | integer array | PR | Manta-specific
isDeNovo | bool | DN |
deNovoQuality | float | DQ |
diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific
artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0
lossOfHeterozygosity | bool | CN, MCN |
somaticQuality | float | SQ |
heteroplasmyPercentile | float array | VF | range: 0 - 100. 2 decimal places. One value per alternate allele
binCount | integer | BC | non-negative integer values

Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],
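
Because empty samples keep their slot, the samples array of a position can be paired by index with the sample names in the header. A minimal Python sketch of that pairing, using hand-made data and a hypothetical helper name:

```python
def samples_by_name(header, position):
    """Map sample name -> sample object, skipping intentionally empty samples.

    Relies on the convention that the header's sample order is used
    throughout the file, so the two lists can be zipped by index.
    """
    names = header.get("samples", [])
    entries = position.get("samples", [])
    return {
        name: entry
        for name, entry in zip(names, entries)
        if not entry.get("isEmpty", False)
    }


header = {"samples": ["NA12878", "NA12891", "NA12892"]}
position = {
    "samples": [
        {"genotype": "0/1", "totalDepth": 57},
        {"isEmpty": True},
        {"genotype": "0/0", "totalDepth": 41},
    ]
}
called = samples_by_name(header, position)
```

Here called contains entries for NA12878 and NA12892 only; NA12891 is skipped but still occupies its position, so the third entry is matched to the correct name.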

Variants

"variants":[
{
"vid":"2-48010488-G-A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
Field | Type | Notes
vid | string | see Variant Identifiers
chromosome | string |
begin | int | 1-based non-negative integer values. Range: 1 - 250 million
end | int | 1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllele | bool | true when this is a reference minor allele
isStructuralVariant | bool | true when the variant is a structural variant
inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllele | string | parsimonious representation of the reference allele
altAllele | string | parsimonious representation of the alternate allele
variantType | string | uses Sequence Ontology sequence alterations
hgvsg | string | HGVS g. notation
phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424

Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"NMD_transcript_variant",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268/4158",
"cdsPos":"116/483",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39/160",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"impact": "moderate",
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"proteinId":"ENSP00000405294.1",
"completeOverlap":true
}
]
Field | Type | Notes
transcript | string | transcript ID. e.g. ENST00000445503.1
source | string | RefSeq / Ensembl
bioType | string | descriptions of the biotypes from Ensembl
codons | string |
aminoAcids | string |
cdnaPos | string | Format: start-end/Length
cdsPos | string | Format: start-end/Length
exons | string | exons affected by the variant
introns | string | introns affected by the variant
proteinPos | string | Format: start-end/Length
geneId | string | gene ID. e.g. ENSG00000116062
hgnc | string | gene symbol. e.g. MSH6
consequence | string array | Sequence Ontology Consequences
impact | string | See Consequence Impact for details
hgvsc | string | HGVS coding nomenclature
hgvsp | string | HGVS protein nomenclature
geneFusion | object | see Gene Fusions entry below
isCanonical | bool | true when this is a canonical transcript
isManeSelect | bool | true when this is a MANE select transcript
proteinId | string | protein ID. E.g. ENSP00000405294.1
completeOverlap | bool | true when this transcript is completely overlapped by the variant
cancerHotspots | string array | see Cancer Hotspots entry below
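
The cdnaPos, cdsPos, and proteinPos strings follow the start-end/Length format, with the end omitted for single positions (as in "268/4158" above). The Python sketch below splits such a string into integers. The helper name is hypothetical, and returning None for any non-numeric part is an assumption about how unknown coordinates might appear, not documented behaviour.

```python
def parse_position(value):
    """Split 'start-end/Length' (or 'start/Length') into (start, end, length).

    Any part that is not a plain number is returned as None.
    """
    span, _, length = value.partition("/")
    start, sep, end = span.partition("-")
    if not sep:
        end = start  # single position: start and end coincide

    def to_int(text):
        return int(text) if text.isdigit() else None

    return to_int(start), to_int(end), to_int(length)


cdna = parse_position("268/4158")    # single position
cds_range = parse_position("10-12/483")  # range
```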
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes
aminoAcidConservation | object |
scores | object array of doubles | percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

Field | Type | Notes
exon | int | actual exon where the breakpoint was located
intron | int | actual intron where the breakpoint was located
fusions | object array | see Fusion entry below

Fusion

Field | Type | Notes
exon | int | actual exon where the other breakpoint was located
intron | int | actual intron where the other breakpoint was located
hgvsc | string | HGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

Field | Type | Notes
residue | string |
numSamples | int | how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples | int | how many samples are associated with a variant with the same position and alternate amino acid position
qValue | double |

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
Field | Type | Notes
id | string |
type | string | see possible values below
consequence | string array | see possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes
id | string | ClinVar ID
variationId | string | ClinVar VCV ID
variantType | string | variant type
reviewStatus | string | see possible values below
alleleOrigins | string array | see possible values below
refAllele | string |
altAllele | string |
phenotypes | string array |
medGenIds | string array | MedGen IDs
omimIds | string array | OMIM IDs
orphanetIds | string array | Orphanet IDs
significance | string array | see possible values below
lastUpdatedDate | string | yyyy-MM-dd
pubMedIds | string array | PubMed IDs
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity
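
A typical use of these fields is to keep only the ClinVar entries that match the annotated alternate allele and carry a significance of interest. The Python sketch below does that on hand-made entries; the function name and the default set of wanted values are illustrative choices, not a recommendation from the source.

```python
def allele_specific_hits(entries, wanted=("pathogenic", "likely pathogenic")):
    """Return the IDs of ClinVar entries that match the alternate allele
    and have at least one significance value in `wanted`."""
    hits = []
    for entry in entries:
        # isAlleleSpecific is only present when true, so default to False.
        if not entry.get("isAlleleSpecific", False):
            continue
        if any(value in wanted for value in entry.get("significance", [])):
            hits.append(entry["id"])
    return hits


# Hand-made entries shaped like the "clinvar" array; the IDs are placeholders.
clinvar = [
    {"id": "RCV-A", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV-B", "significance": ["pathogenic"]},
    {"id": "RCV-C", "significance": ["likely pathogenic"], "isAlleleSpecific": True},
]
hits = allele_specific_hits(clinvar)
```

Only RCV-C is returned: RCV-A is benign and RCV-B has no isAlleleSpecific key, which by convention means the flag is false.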

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes
allAf | float | allele frequency for all populations. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
allAn | int | allele number for all populations. Non-zero integer.
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0
afrAc | int | allele count for the African super population. Integer.
afrAn | int | allele number for the African super population. Non-zero integer.
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc | int | allele count for the Ad Mixed American super population. Integer.
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer.
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0
easAc | int | allele count for the East Asian super population. Integer.
easAn | int | allele number for the East Asian super population. Non-zero integer.
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0
eurAc | int | allele count for the European super population. Integer.
eurAn | int | allele number for the European super population. Non-zero integer.
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian super population. Integer.
sasAn | int | allele number for the South Asian super population. Non-zero integer.
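
The example values above are consistent with each allele frequency being the allele count divided by the allele number (for instance 1006 / 5008 ≈ 0.200879). The Python sketch below checks that relationship for every population prefix in a block. The relationship and the six-decimal tolerance are inferred from the example, not stated by the source, and the helper name is hypothetical.

```python
def frequency_mismatches(block, tolerance=1e-6):
    """Return population prefixes whose xxxAf differs from xxxAc / xxxAn."""
    mismatches = []
    for key, af in block.items():
        if not key.endswith("Af"):
            continue
        prefix = key[:-2]
        ac = block.get(prefix + "Ac")
        an = block.get(prefix + "An")
        if ac is None or not an:
            continue  # count or number missing (or zero): nothing to check
        if abs(ac / an - af) > tolerance:
            mismatches.append(prefix)
    return mismatches


# Subset of the "oneKg" example above.
one_kg = {
    "allAf": 0.200879, "allAn": 5008, "allAc": 1006,
    "afrAf": 0.210287, "afrAn": 1322, "afrAc": 278,
    "eurAf": 0.181909, "eurAn": 1006, "eurAc": 183,
}
consistent = frequency_mismatches(one_kg)

# Deliberately corrupt one frequency to show a detected mismatch.
broken = dict(one_kg, allAf=0.5)
inconsistent = frequency_mismatches(broken)
```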

DANN

"dannScore": 0.27
Field | Type | Notes
dannScore | float | Range: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
Field | Type | Notes
dbsnp | string array | dbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes
chromosome | string | Ensembl-style chromosome names
begin | int | 1-based position
end | int | 1-based position
numDeletions | int | # of observed deletions
deletionFrequency | float | deletion frequency
numDuplications | int | # of observed duplications
duplicationFrequency | float | duplication frequency
sampleSize | int | total # of samples
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27
Field | Type | Notes
gerpScore | float | Range: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes
allAc | int | GME allele count
allAn | int | GME allele number
allAf | float | GME allele frequency
failedFilter | bool | True if this variant failed any filters

gnomAD

"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf":0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes
coverage | int | average coverage (non-negative integer values)
allAf | float | allele frequency for all populations. Range: 0 - 1.0
maleAf | float | allele frequency for male population. Range: 0 - 1.0
femaleAf | float | allele frequency for female population. Range: 0 - 1.0
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0
allAc | int | allele count for all populations. Integer.
maleAc | int | allele count for male population. Integer.
femaleAc | int | allele count for female population. Integer.
controlsAllAc | int | allele count for the controls subset. Integer.
allAn | int | allele number for all populations. Non-zero integer.
maleAn | int | allele number for male population. Non-zero integer.
femaleAn | int | allele number for female population. Non-zero integer.
controlsAllAn | int | allele number for the controls subset. Non-zero integer.
allHc | int | count of homozygous individuals for all populations. Non-negative integer.
maleHc | int | count of homozygous individuals for male population. Non-negative integer.
femaleHc | int | count of homozygous individuals for female population. Non-negative integer.
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0
afrAc | int | allele count for the African / African American population. Integer.
afrAn | int | allele number for the African / African American population. Non-zero integer.
afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer.
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0
amrAc | int | allele count for the Latino population. Integer.
amrAn | int | allele number for the Latino population. Non-zero integer.
amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer.
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0
easAc | int | allele count for the East Asian population. Integer.
easAn | int | allele number for the East Asian population. Non-zero integer.
easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer.
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0
finAc | int | allele count for the Finnish population. Integer.
finAn | int | allele number for the Finnish population. Non-zero integer.
finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer.
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc | int | allele count for the Non-Finnish European population. Integer.
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer.
nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer.
othAf | float | allele frequency for the Other population. Range: 0 - 1.0
othAc | int | allele count for the Other population. Integer.
othAn | int | allele number for the Other population. Non-zero integer.
othHc | int | count of homozygous individuals for the Other population. Non-negative integer.
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc | int | allele count for the Ashkenazi Jewish population. Integer.
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0
sasAc | int | allele count for the South Asian population. Integer.
sasAn | int | allele number for the South Asian population. Non-zero integer.
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion | bool | True if this variant is located in a low complexity region.

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes
refAllele | string |
altAllele | string |
diseases | string array | associated diseases
hasHomoplasmy | boolean |
hasHeteroplasmy | boolean |
status | string | record status
clinicalSignificance | string | predicted pathogenicity
scorePercentile | float | MitoTIP score
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences
pubMedIds | string array |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"classification": "pathogenic",
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
Field | Type | Notes
aminoAcidPosition | int | Amino Acid Position (1-based)
refAminoAcid | string | Reference Amino Acid
altAminoAcid | string | Alternate Amino Acid
ensemblTranscriptId | string | Transcript ID (Ensembl)
refSeqTranscriptId | string | Transcript ID (RefSeq)
scorePercentile | float | range: 0 - 1.0
score | float | range: 0 - 1.0
classification | string | pathogenic or benign classification

REVEL

"revel":{ 
"score":0.027
}
Field | Type | Notes
score | float | Range: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes
hgnc | string | HGNC gene symbol
acceptorGainDistance | int | ± bp from current position
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place
acceptorLossDistance | int | ± bp from current position
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place
donorGainDistance | int | ± bp from current position
donorGainScore | float | range: 0 - 1.0. 1 decimal place
donorLossDistance | int | ± bp from current position
donorLossScore | float | range: 0 - 1.0. 1 decimal place
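
As the example shows, each Splice AI entry only carries the score/distance pairs that were reported, so a consumer cannot assume all four scores are present. The Python sketch below takes the highest reported score per gene; summarising by the maximum is an illustrative choice, not something the source prescribes, and the helper name is hypothetical.

```python
SCORE_KEYS = (
    "acceptorGainScore",
    "acceptorLossScore",
    "donorGainScore",
    "donorLossScore",
)


def max_splice_score(entry):
    """Return the highest score present in a spliceAI entry, or None."""
    return max((entry[key] for key in SCORE_KEYS if key in entry), default=None)


# The "spliceAI" example above.
splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]
max_by_gene = {entry["hgnc"]: max_splice_score(entry) for entry in splice_ai}
```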

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes
allAc | int | TOPMed allele count
allAn | int | TOPMed allele number. Non-zero integer.
allAf | float | TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc | int | TOPMed homozygous count
failedFilter | bool | True if this variant failed any filters

Genes

Illumina Connected Annotations reports gene annotations for all genes that have an overlapping variant, with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes
name | string | HGNC gene symbol
hgncId | int | HGNC ID
summary | string | short description of the gene from OMIM
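
Since gene-level annotation lives in the genes section rather than on each transcript, a consumer usually needs to join the two. The Python sketch below does so by matching a transcript's hgnc symbol to a gene's name; that join key is an assumption based on both fields being described as the gene symbol, and the helper name and data are hypothetical.

```python
def genes_for_transcripts(transcripts, genes):
    """Map transcript ID -> gene object, joining on the gene symbol.

    Transcripts whose gene is not in the genes section (for example,
    genes hit only by flanking variants) are left out of the result.
    """
    by_name = {gene["name"]: gene for gene in genes}
    return {
        transcript["transcript"]: by_name[transcript["hgnc"]]
        for transcript in transcripts
        if transcript.get("hgnc") in by_name
    }


genes = [{"name": "MSH6", "hgncId": 7329}]
transcripts = [
    {"transcript": "ENST00000445503.1", "hgnc": "MSH6"},
    {"transcript": "ENST-PLACEHOLDER", "hgnc": "OTHER"},  # made-up entry
]
joined = genes_for_transcripts(transcripts, genes)
```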

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field        Type          Notes
mimNumber    int           OMIM ID for gene
geneName     string        gene name
description  string
phenotypes   object array  see Phenotype entry below

Phenotype

Field         Type          Notes
mimNumber     int
phenotype     string
description   string
mapping       string        see possible values below
inheritances  string array  see possible values below
comments      string array  see possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping
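
The OMIM structure above can be walked with a few lines of Python. This is a minimal sketch, assuming the gene object has already been parsed into a dict: the `gene` literal is an abbreviated copy of the example above, and `summarize_phenotypes` is a hypothetical helper name, not part of the annotator. Fields that the example omits for some phenotypes (`description`, `inheritances`, `comments`) are treated as optional.

```python
def summarize_phenotypes(gene):
    """Return one (mimNumber, phenotype, mapping, inheritances) tuple per OMIM phenotype."""
    rows = []
    for entry in gene.get("omim", []):
        for phenotype in entry.get("phenotypes", []):
            rows.append((
                phenotype["mimNumber"],
                phenotype["phenotype"],
                phenotype.get("mapping"),                  # may be absent
                tuple(phenotype.get("inheritances", [])),  # may be absent
            ))
    return rows


# Abbreviated copy of the example gene entry shown above (illustrative only).
gene = {
    "omim": [
        {
            "mimNumber": 600678,
            "geneName": "MutS, E. coli, homolog of, 6",
            "phenotypes": [
                {
                    "mimNumber": 614350,
                    "phenotype": "Colorectal cancer, hereditary nonpolyposis, type 5",
                    "mapping": "molecular basis of the disorder is known",
                    "inheritances": ["Autosomal dominant"],
                },
                {
                    "mimNumber": 608089,
                    "phenotype": "Endometrial cancer, familial",
                    "mapping": "molecular basis of the disorder is known",
                },
            ],
        }
    ]
}

for mim, name, mapping, inheritances in summarize_phenotypes(gene):
    print(mim, name, mapping, ", ".join(inheritances) or "n/a")
```

Using `.get()` with defaults keeps the loop from failing on phenotypes that carry no inheritance or comment information.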

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field  Type   Notes
pLi    float  probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull  float  probability of being completely tolerant of loss of function variation (observed = expected)
pRec   float  probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ   float  corrected synonymous Z score
misZ   float  corrected missense Z score
loeuf  float  loss of function observed/expected upper bound fraction (LOEUF)
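
Since `pLi`, `pRec`, and `pNull` are each the probability of one loss-of-function tolerance class, a consumer may simply want the class with the largest probability. The sketch below does only that. It assumes the `gnomAD` object has been parsed into a dict, and `most_likely_lof_class` is a hypothetical helper name; no significance threshold is implied.

```python
# Human-readable labels for the three probability fields described above.
LOF_CLASSES = {
    "pLi": "intolerant of a single loss-of-function variant",
    "pRec": "intolerant of two loss-of-function variants",
    "pNull": "completely tolerant of loss-of-function variation",
}


def most_likely_lof_class(gnomad):
    """Return (field, probability) for the largest of pLi, pRec and pNull."""
    best = max(LOF_CLASSES, key=lambda field: gnomad.get(field, 0.0))
    return best, gnomad.get(best, 0.0)


# Values copied from the example above.
gnomad = {
    "pLi": 1.00e0,
    "pNull": 8.94e-40,
    "pRec": 1.84e-16,
    "synZ": -8.44e-2,
    "misZ": 5.96e-1,
    "loeuf": 1.13e0,
}

field, probability = most_likely_lof_class(gnomad)
print(f"{field} = {probability:g}: {LOF_CLASSES[field]}")
```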

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field                Type    Notes
clingenGeneValidity  object
diseaseId            string  Monarch Disease Ontology ID (MONDO)
disease              string  disease label
classification       string  see below for possible values
classificationDate   string  yyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship
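
As an illustration of how the `clingenGeneValidity` entries might be consumed, the sketch below groups disease labels by their classification. It assumes the gene object is already a parsed dict; `diseases_by_classification` is a hypothetical helper name, and the `gene` literal repeats the example above.

```python
from collections import defaultdict


def diseases_by_classification(gene):
    """Map each classification value to the list of disease labels that carry it."""
    grouped = defaultdict(list)
    for entry in gene.get("clingenGeneValidity", []):
        grouped[entry["classification"]].append(entry["disease"])
    return dict(grouped)


# Entries copied from the example above.
gene = {
    "clingenGeneValidity": [
        {
            "diseaseId": "MONDO_0007893",
            "disease": "Noonan syndrome with multiple lentigines",
            "classification": "no reported evidence",
            "classificationDate": "2018-06-07",
        },
        {
            "diseaseId": "MONDO_0015280",
            "disease": "cardiofaciocutaneous syndrome",
            "classification": "no reported evidence",
            "classificationDate": "2018-06-07",
        },
    ]
}

for classification, diseases in diseases_by_classification(gene).items():
    print(f"{classification}: {'; '.join(diseases)}")
```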

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"ensemblGeneId": "ENSG00000142611",
"ncbiGeneId": "63976",
"hgncId": 14000,
"cosmic": {
"tier": 1,
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field         Type          Notes
roleInCancer  string array  Possible roles in cancer
tier          number        COSMIC tiers [1, 2]
+ + \ No newline at end of file diff --git a/file-formats/illumina-annotator-vcf-file-format/index.html b/file-formats/illumina-annotator-vcf-file-format/index.html index 3fb38b3a..d3f9cdf0 100644 --- a/file-formats/illumina-annotator-vcf-file-format/index.html +++ b/file-formats/illumina-annotator-vcf-file-format/index.html @@ -6,13 +6,13 @@ Illumina Connected Annotations VCF File Format | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Illumina Connected Annotations VCF File Format

Overview

JSON is the default output format, but VCF output is also supported. VCF output mode can be enabled with --output-format vcf, as shown below:

dotnet Annotator.dll \
-c Data/Cache \
--output-format vcf \
-r Data/References/Homo_sapiens.GRCh38.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000.out
# HiSeq.10000.out.vcf.gz file should be produced after processing.

VCF Output Format

The output VCF file should have headers similar to those below, which indicate the Illumina Connected Annotations version, file creation time, assembly, and data sources used to produce the output:

##fileformat=VCFv4.2
##IlluminaConnectedAnnotations="3.24.0" time="2024-03-22 07:02:13" assembly="GRCh38" Ensembl="110" RefSeq="GCF_000001405.40-RS_2023_03"
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230110
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
...
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Illumina Connected Annotations. Format: Allele|Consequence|SYMBOL|Feature_type|Feature|CANONICAL|HGVSg|HGVSc|HGVSp|vid">
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Novaseq_TSPF450-NA12878-1-HFHWJDMXX_S1_L001 Novaseq_TSPF450-NA12891-1-HFHWJDMXX_S3_L001

VCF Lines

In VCF mode, core annotation for overlapping transcripts is enabled and no supplementary annotation is added. A CSQ field is added to the INFO column with the following format:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Illumina Connected Annotations. Format: Allele|Consequence|SYMBOL|Feature_type|Feature|CANONICAL|HGVSg|HGVSc|HGVSp|vid">

Multiple transcripts are separated by commas (,). Example VCF lines are shown below:

chr21 5316038  MantaDEL:1:11095:74644:0:4:0  G  <DEL> 999   MaxDepth END=7246574;SVTYPE=DEL;SVLEN=-1930536;SVINSLEN=4;SVINSSEQ=TTCT;CSQ=<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624261.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624859.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000623227.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000619252.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623449.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623436.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624627.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624368.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623914.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624516.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624412.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000622939.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623050.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624444.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623887.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|
transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000611026.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000610788.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279784|Transcript|ENST00000623587.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279064|Transcript|ENST00000623723.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000288187|Transcript|ENST00000671789.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000616522.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000621924.4|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000619488.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000617746.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000624446.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623405.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623575.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623506.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280019|Transcript|ENST00000624484.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279709|Transcript|ENST00000623377.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688828.2|
False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688458.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692898.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689306.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692318.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624576.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623738.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701070.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623989.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701260.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692046.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692237.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689354.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624165.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624847.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000615262.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablat
ion&transcript_variant|ENSG00000280145|Transcript|ENST00000623047.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623950.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000623225.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623324.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000624181.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279788|Transcript|ENST00000624266.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279728|Transcript|ENST00000623809.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280164|Transcript|ENST00000623892.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279998|Transcript|ENST00000623678.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|hsa-mir-8069-1|Transcript|ENST00000616627.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279751|Transcript|ENST0000
0623720.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623165.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624519.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623347.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624728.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279477|Transcript|ENST00000623518.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278884|Transcript|ENST00000625184.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623095.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000622911.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000621909.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623394.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000624310.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000615804.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000617336.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|CTBP2P10|Transcript|ENST00000624153.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|NR_170984.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_
variant|LOC102724354|Transcript|NR_136540.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|CH507-42P11.6|Transcript|NR_171776.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724428|Transcript|NM_001320643.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354012.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354009.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NR_148682.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354010.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354015.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354014.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001321073.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354008.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354007.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354006.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320646.2|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320650.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320648.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcri
pt|NM_001320651.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001314050.5|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC106780825|Transcript|NR_133678.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001320719.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146656.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146655.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146657.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|MIR8069-1|Transcript|NR_107036.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724843|Transcript|NR_170986.1|True||||21-5316038-7246574-G-<DEL>-DEL   GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:999:999,0,999:58,5:69,63:.:.  0/1:PASS:999:999,0,999:59,7:67,71:.:.  0/1:PASS:999:999,0,999:118,4:140,79:.:.
chr21 6639699 MantaDEL:514264:0:0:0:7:0 AGAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG AAA 537 MaxMQ0Frac END=6639804;SVTYPE=DEL;SVLEN=-105;CIGAR=1M2I105D;CSQ=AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623047.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623106.3:n.223-5036_223-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True|NC_000021.9:g.6639700_6639804delinsAA|ENST00000625185.3:n.232-5036_232-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624846.3:n.130-5036_130-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623313.1:n.312-7367_312-7263delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623950.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|F
alse|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624965.1:n.151-5036_151-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:8:205,0,4:1,0:11,5:.:. 0/1:PASS:86:431,0,83:0,0:16,13:.:. 0/0:HomRef:61:0,11,66:2,0:7,0:.:.
chr21 8811598 MantaBND:514412:0:1:0:0:0:0 G G[chr21:8854301[ 999 NoPairSupport SVTYPE=BND;MATEID=MantaBND:514412:0:1:0:0:0:1;CIPOS=0,4;HOMLEN=4;HOMSEQ=TGCA;BND_DEPTH=300;MATE_BND_DEPTH=213;CSQ=G[chr21:8854301[|transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True||||21-8811598-G-G[chr21:8854301[ GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:253:303,0,999:9,0:89,12:.:. 0/1:PASS:999:999,0,999:0,0:99,39:.:. 0/0:HomRef:410:0,360,999:17,0:141,0:.:.
chr21 8813774 MantaINS:514450:0:0:0:1:0 T TATATATACATATATATATATACATATATATATATGTATATATATATATATAC 487 MaxMQ0Frac END=8813774;SVTYPE=INS;SVLEN=52;CIGAR=1M52I;CIPOS=0,7;HOMLEN=7;HOMSEQ=ATATATA;CSQ=ATATATACATATATATATATACATATATATATATGTATATATATATATATAC|intron_variant&non_coding_transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True|NC_000021.9:g.8813781_8813782insCATATATATATATACATATATATATATGTATATATATATATATACATATATA|ENST00000651312.1:n.40-6603_40-6602insGTATATATATATATATACATATATATATATGTATATATATATATGTATATAT||21-8813774-T-TATATATACATATATATATATACATATATATATATGTATATATATATATATAC GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:29:128,0,26:0,0:8,4:.:. 1/1:PASS:6:335,8,0:0,0:6,8:.:. 0/1:PASS:21:176,0,18:0,0:3,6:.:.
- - +
Version: 3.25 (unreleased)

Illumina Connected Annotations VCF File Format

Overview

JSON is the default output format, but VCF output is also supported. VCF output mode can be enabled with --output-format vcf, as shown below:

dotnet Annotator.dll \
-c Data/Cache \
--output-format vcf \
-r Data/References/Homo_sapiens.GRCh38.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000.out
# HiSeq.10000.out.vcf.gz file should be produced after processing.

VCF Output Format

The output VCF file should have headers similar to those below, which indicate the Illumina Connected Annotations version, file creation time, assembly, and data sources used to produce the output:

##fileformat=VCFv4.2
##IlluminaConnectedAnnotations="3.24.0" time="2024-03-22 07:02:13" assembly="GRCh38" Ensembl="110" RefSeq="GCF_000001405.40-RS_2023_03"
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230110
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
...
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Illumina Connected Annotations. Format: Allele|Consequence|SYMBOL|Feature_type|Feature|CANONICAL|HGVSg|HGVSc|HGVSp|vid">
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Novaseq_TSPF450-NA12878-1-HFHWJDMXX_S1_L001 Novaseq_TSPF450-NA12891-1-HFHWJDMXX_S3_L001

VCF Lines

In VCF mode, core annotation for overlapping transcripts is enabled and no supplementary annotation is added. A CSQ field is added to the INFO column with the following format:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Illumina Connected Annotations. Format: Allele|Consequence|SYMBOL|Feature_type|Feature|CANONICAL|HGVSg|HGVSc|HGVSp|vid">
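
The CSQ layout can be unpacked with plain string splitting. The following is a minimal Python sketch, not part of the annotator: `parse_csq` and `CSQ_FIELDS` are hypothetical names, the field order is copied from the header line above, and the code assumes that individual values contain no `,`, `;`, or `|` characters. Each transcript entry in CSQ is comma-separated, and its fields are pipe-separated.

```python
# Field order copied from the CSQ header description.
CSQ_FIELDS = (
    "Allele|Consequence|SYMBOL|Feature_type|Feature|"
    "CANONICAL|HGVSg|HGVSc|HGVSp|vid"
).split("|")


def parse_csq(info):
    """Parse the CSQ entry of a VCF INFO string into one dict per transcript."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            value = item[len("CSQ="):]
            break
    else:
        return []  # no CSQ entry in this INFO column

    records = []
    for entry in value.split(","):      # one entry per transcript
        parts = entry.split("|")        # one value per CSQ field
        records.append(dict(zip(CSQ_FIELDS, parts)))
    return records


# INFO column of the breakend record from the example VCF lines.
info = (
    "SVTYPE=BND;MATEID=MantaBND:514412:0:1:0:0:0:1;CIPOS=0,4;HOMLEN=4;"
    "HOMSEQ=TGCA;BND_DEPTH=300;MATE_BND_DEPTH=213;"
    "CSQ=G[chr21:8854301[|transcript_variant|ENSG00000286033|Transcript|"
    "ENST00000651312.1|True||||21-8811598-G-G[chr21:8854301["
)

for record in parse_csq(info):
    # Multiple consequences for one transcript are joined with '&'.
    print(record["Feature"], record["Consequence"].split("&"), record["CANONICAL"])
```

Splitting INFO on `;` first matters because other INFO keys, such as `CIPOS=0,4`, contain commas of their own.
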

Multiple transcripts are separated by commas (,). Example VCF lines are shown below:

chr21 5316038  MantaDEL:1:11095:74644:0:4:0  G  <DEL> 999   MaxDepth END=7246574;SVTYPE=DEL;SVLEN=-1930536;SVINSLEN=4;SVINSSEQ=TTCT;CSQ=<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624261.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000624859.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC01670|Transcript|ENST00000623227.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000619252.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623449.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623436.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624627.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624368.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623914.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624516.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624412.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000622939.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623050.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000624444.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000623887.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|
transcript_ablation&transcript_variant|LINC03104|Transcript|ENST00000611026.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000610788.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279784|Transcript|ENST00000623587.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279064|Transcript|ENST00000623723.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000288187|Transcript|ENST00000671789.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000616522.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000621924.4|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000619488.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000617746.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000624446.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623405.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623575.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000275496|Transcript|ENST00000623506.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280019|Transcript|ENST00000624484.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279709|Transcript|ENST00000623377.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688828.2|
False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000688458.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692898.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689306.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692318.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624576.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623738.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701070.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000623989.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000701260.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692046.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000692237.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000689354.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624165.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278903|Transcript|ENST00000624847.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000615262.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablat
ion&transcript_variant|ENSG00000280145|Transcript|ENST00000623047.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623950.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000623225.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280145|Transcript|ENST00000623324.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278878|Transcript|ENST00000624181.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279788|Transcript|ENST00000624266.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279728|Transcript|ENST00000623809.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280164|Transcript|ENST00000623892.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279998|Transcript|ENST00000623678.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|hsa-mir-8069-1|Transcript|ENST00000616627.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279751|Transcript|ENST0000
0623720.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623165.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624519.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000623347.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000280018|Transcript|ENST00000624728.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000279477|Transcript|ENST00000623518.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000278884|Transcript|ENST00000625184.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623095.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000622911.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000621909.4|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000623394.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000624310.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|ENSG00000277067|Transcript|ENST00000615804.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|Y_RNA|Transcript|ENST00000617336.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|CTBP2P10|Transcript|ENST00000624153.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LINC03104|Transcript|NR_170984.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_
variant|LOC102724354|Transcript|NR_136540.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|CH507-42P11.6|Transcript|NR_171776.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724428|Transcript|NM_001320643.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354012.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354009.3|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NR_148682.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354010.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354015.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354014.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001321073.3|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354008.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354007.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724560|Transcript|NM_001354006.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320646.2|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320650.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcript|NM_001320648.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724594|Transcri
pt|NM_001320651.2|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001314050.5|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC106780825|Transcript|NR_133678.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724652|Transcript|NM_001320719.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146656.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146655.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC110091777|Transcript|NR_146657.1|False||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|MIR8069-1|Transcript|NR_107036.1|True||||21-5316038-7246574-G-<DEL>-DEL,<DEL>|transcript_ablation&transcript_variant|LOC102724843|Transcript|NR_170986.1|True||||21-5316038-7246574-G-<DEL>-DEL   GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:999:999,0,999:58,5:69,63:.:.  0/1:PASS:999:999,0,999:59,7:67,71:.:.  0/1:PASS:999:999,0,999:118,4:140,79:.:.
chr21 6639699 MantaDEL:514264:0:0:0:7:0 AGAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG AAA 537 MaxMQ0Frac END=6639804;SVTYPE=DEL;SVLEN=-105;CIGAR=1M2I105D;CSQ=AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623047.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623106.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623106.3:n.223-5036_223-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000625185.3|True|NC_000021.9:g.6639700_6639804delinsAA|ENST00000625185.3:n.232-5036_232-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624846.3|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624846.3:n.130-5036_130-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000623313.1|False|NC_000021.9:g.6639700_6639804delinsAA|ENST00000623313.1:n.312-7367_312-7263delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|upstream_gene_variant|ENSG00000280145|Transcript|ENST00000623950.1|False|NC_000021.9:g.6639700_6639804delinsAA|||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA,AA|intron_variant&non_coding_transcript_variant|ENSG00000280145|Transcript|ENST00000624965.1|F
alse|NC_000021.9:g.6639700_6639804delinsAA|ENST00000624965.1:n.151-5036_151-4932delinsTT||21-6639700-GAAAGAAAGAAAGAGAAAAAAAGAAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAGAAAGAAGAAAGAAAGAAAG-AA GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:8:205,0,4:1,0:11,5:.:. 0/1:PASS:86:431,0,83:0,0:16,13:.:. 0/0:HomRef:61:0,11,66:2,0:7,0:.:.
chr21 8811598 MantaBND:514412:0:1:0:0:0:0 G G[chr21:8854301[ 999 NoPairSupport SVTYPE=BND;MATEID=MantaBND:514412:0:1:0:0:0:1;CIPOS=0,4;HOMLEN=4;HOMSEQ=TGCA;BND_DEPTH=300;MATE_BND_DEPTH=213;CSQ=G[chr21:8854301[|transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True||||21-8811598-G-G[chr21:8854301[ GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:253:303,0,999:9,0:89,12:.:. 0/1:PASS:999:999,0,999:0,0:99,39:.:. 0/0:HomRef:410:0,360,999:17,0:141,0:.:.
chr21 8813774 MantaINS:514450:0:0:0:1:0 T TATATATACATATATATATATACATATATATATATGTATATATATATATATAC 487 MaxMQ0Frac END=8813774;SVTYPE=INS;SVLEN=52;CIGAR=1M52I;CIPOS=0,7;HOMLEN=7;HOMSEQ=ATATATA;CSQ=ATATATACATATATATATATACATATATATATATGTATATATATATATATAC|intron_variant&non_coding_transcript_variant|ENSG00000286033|Transcript|ENST00000651312.1|True|NC_000021.9:g.8813781_8813782insCATATATATATATACATATATATATATGTATATATATATATATACATATATA|ENST00000651312.1:n.40-6603_40-6602insGTATATATATATATATACATATATATATATGTATATATATATATGTATATAT||21-8813774-T-TATATATACATATATATATATACATATATATATATGTATATATATATATATAC GT:FT:GQ:PL:PR:SR:DQ:DN 0/1:PASS:29:128,0,26:0,0:8,4:.:. 1/1:PASS:6:335,8,0:0,0:6,8:.:. 0/1:PASS:21:176,0,18:0,0:3,6:.:.
+ + \ No newline at end of file diff --git a/frequently-asked-questions/Annotator-vs-data-update/index.html b/frequently-asked-questions/Annotator-vs-data-update/index.html index bf0662bd..24f09fcc 100644 --- a/frequently-asked-questions/Annotator-vs-data-update/index.html +++ b/frequently-asked-questions/Annotator-vs-data-update/index.html @@ -6,13 +6,13 @@ Annotation Engine vs Data update | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Annotation Engine vs Data update

Background

Updates to annotations fall into two broad categories:

  • Annotation engine (Annotator) update.
  • Annotation data update.

Understanding the nature of these two types of updates is key when it comes to updating annotations.

Annotator update

The annotator is the engine that contains the logic for core annotations, such as computing variant consequences, HGVS notations, and mapped positions (e.g. cDNA, CDS, and protein positions) and detecting gene fusions. It also performs annotation lookups from external data sources such as dbSNP, gnomAD, ClinVar, and OMIM, which are also known as supplementary annotations (SA). An update to the annotator entails new features or bug fixes to the compute or lookup mechanism. This is completely independent of a data update, such as updating dbSNP from v154 to v155. In other words, the same annotator can annotate with either dbSNP v154 or dbSNP v155 when provided with the appropriate data files.

Data update

The annotator uses data from various sources (listed in Introduction). For example, gene models used for core annotations are obtained from RefSeq and Ensembl. Supplementary annotations come from sources such as dbSNP, gnomAD, ClinVar, and OMIM. Any of these data sources can be updated without updating the annotator, as long as the file formats are compatible.

Update scenarios

Let us look at a few update scenarios.

Requirement | What needs to be updated/added | Suggested action
New transcripts and gene symbols | Cache files from RefSeq and Ensembl | Run Downloader
Update ClinVar | ClinVar SA files | Run Downloader
New external annotation | New SA files required | Submit feature request
New annotation feature | Annotator | Submit feature request
- - +
Version: 3.25 (unreleased)

Annotation Engine vs Data update

Background

Updates to annotations fall into two broad categories:

  • Annotation engine (Annotator) update.
  • Annotation data update.

Understanding the nature of these two types of updates is key when it comes to updating annotations.

Annotator update

The annotator is the engine that contains the logic for core annotations, such as computing variant consequences, HGVS notations, and mapped positions (e.g. cDNA, CDS, and protein positions) and detecting gene fusions. It also performs annotation lookups from external data sources such as dbSNP, gnomAD, ClinVar, and OMIM, which are also known as supplementary annotations (SA). An update to the annotator entails new features or bug fixes to the compute or lookup mechanism. This is completely independent of a data update, such as updating dbSNP from v154 to v155. In other words, the same annotator can annotate with either dbSNP v154 or dbSNP v155 when provided with the appropriate data files.

Data update

The annotator uses data from various sources (listed in Introduction). For example, gene models used for core annotations are obtained from RefSeq and Ensembl. Supplementary annotations come from sources such as dbSNP, gnomAD, ClinVar, and OMIM. Any of these data sources can be updated without updating the annotator, as long as the file formats are compatible.

Update scenarios

Let us look at a few update scenarios.

Requirement | What needs to be updated/added | Suggested action
New transcripts and gene symbols | Cache files from RefSeq and Ensembl | Run Downloader
Update ClinVar | ClinVar SA files | Run Downloader
New external annotation | New SA files required | Submit feature request
New annotation feature | Annotator | Submit feature request
+ + \ No newline at end of file diff --git a/index.html b/index.html index 66fddc67..324a5cbd 100644 --- a/index.html +++ b/index.html @@ -5,17 +5,17 @@ -Introduction | IlluminaConnectedAnnotations - - +Introduction | IlluminaConnectedAnnotations + +
-
Version: 3.24 (unreleased)

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.

The input to Illumina Connected Annotations is one or more VCFs, and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease.

The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript:

The transcript and gene models are obtained from RefSeq and Ensembl. +

Version: 3.25 (unreleased)

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.

The input to Illumina Connected Annotations is one or more VCFs, and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations handles multiple alternate alleles and multiple samples with ease.

The software is being developed under a rigorous SDLC and testing process to ensure accuracy of the results and enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline where millions of variant annotations are monitored against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript:

The transcript and gene models are obtained from RefSeq and Ensembl. The current officially supported versions are:

Data Source | Version | Release Date
RefSeq | GCF_000001405.40-RS_2023_03 | 2023-03-21
Ensembl | 110 | 2023-04-27

In addition, it uses external data sources to provide additional context for each variant. Illumina Connected Annotations provides annotations from the following sources, divided into two tiers: Professional and Basic. The Basic tier can be accessed free of charge. The Professional tier requires a license. Please see Licensed Content for details. For access, please contact annotation_support@illumina.com.

Data Source | Availability | Latest Supported Version
COSMIC | Professional | 99
OMIM | Professional | 20240110
Primate AI-3D | Professional | 1.0
Splice AI | Professional | 1.3
1000 Genomes Project | Basic | Phase 3 v3plus
Cancer Hotspots | Basic | 2017
ClinGen | Basic | 20240110
ClinVar | Basic | 20231230
DANN | Basic | 20200205
dbSNP | Basic | 156
DECIPHER | Basic | 201509
FusionCatcher | Basic | 1.33
GERP | Basic | 20110522
GME Variome | Basic | 20160618
gnomAD | Basic | 3.1.2
MITOMAP | Basic | 20200819
MultiZ 100 way | Basic | 20171006
REVEL | Basic | 20200205
TOPMed | Basic | freeze 5

Download

Please visit Illumina Connected Annotations.

- - + + \ No newline at end of file diff --git a/introduction/dependencies/index.html b/introduction/dependencies/index.html index e1bb0f4b..8174079a 100644 --- a/introduction/dependencies/index.html +++ b/introduction/dependencies/index.html @@ -6,13 +6,13 @@ Dependencies | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Dependencies

All of the following dependencies have been included in this repository.

Name | License | Usage
Amazon.Lambda | Apache | AWS extensions for .NET CLI
AWSSDK | Apache | AWS Lambda, S3, SNS support
Json.NET | MIT | JASIX utility
libdeflate | MIT | BlockCompression library
Moq | BSD | Mocking framework for unit tests
NDesk.Options | MIT/X11 | CommandLine library
xUnit | Apache | Unit testing framework
zlib-ng | zlib | BlockCompression library
zstd | BSD | BlockCompression library
- - +
Version: 3.25 (unreleased)

Dependencies

All of the following dependencies have been included in this repository.

Name | License | Usage
Amazon.Lambda | Apache | AWS extensions for .NET CLI
AWSSDK | Apache | AWS Lambda, S3, SNS support
Json.NET | MIT | JASIX utility
libdeflate | MIT | BlockCompression library
Moq | BSD | Mocking framework for unit tests
NDesk.Options | MIT/X11 | CommandLine library
xUnit | Apache | Unit testing framework
zlib-ng | zlib | BlockCompression library
zstd | BSD | BlockCompression library
+ + \ No newline at end of file diff --git a/introduction/getting-started/index.html b/introduction/getting-started/index.html index 82fa7d65..ee0df419 100644 --- a/introduction/getting-started/index.html +++ b/introduction/getting-started/index.html @@ -6,13 +6,13 @@ Getting Started | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET 6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz) and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

If you want to build your own Docker image, it is easy to do. You just need the Illumina Connected Annotations zip file; then download the Dockerfile and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, run the commands below inside the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine with the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Annotator -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the Downloader again.

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Preserving old data files

By default, when rerun, the Downloader replaces old files with the latest versions. For example, if your SupplementaryAnnotation folder contains ClinVar_20231101.nsa and the latest available version is ClinVar_20231203.nsa, the next time the Downloader is run, ClinVar_20231101.nsa will be replaced with ClinVar_20231203.nsa.

Currently, there is no way to override this behavior. If you do not want to replace or update a particular file, we recommend moving it to a different location, rerunning the Downloader to update the other data files, and then manually restoring the file you did not want updated. Please make sure to remove the newly downloaded version of that file. Note that the Annotator will throw an error if multiple versions of the same data source are present in the SupplementaryAnnotation folder. In other words, the SupplementaryAnnotation folder cannot contain both ClinVar_20231101.nsa and ClinVar_20231203.nsa.

Here is an example of how to proceed if a user doesn't want the latest version of ClinVar.

ls Data/SupplementaryAnnotation/GRCh38
...
ClinGen_disease_validity_curations_20231011.nga
ClinVar_20230930.nsa
ClinVar_20230930.nsa.idx
...
mv Data/SupplementaryAnnotation/GRCh38/ClinVar* <tmp_dir>/GRCh38/

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh38 \
-o Data

rm Data/SupplementaryAnnotation/GRCh38/ClinVar*
mv <tmp_dir>/GRCh38/ClinVar* Data/SupplementaryAnnotation/GRCh38/

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization Time Positions/s
---------------------------------------------------------------------------
Cache 00:00:00.0
SA Position Scan 00:00:00.0 153,634

Reference Preload Annotation Variants/s
---------------------------------------------------------------------------
chr1 00:00:00.2 00:00:00.8 11,873

Summary Time Percent
---------------------------------------------------------------------------
Initialization 00:00:00.0 1.5 %
Preload 00:00:00.2 4.9 %
Annotation 00:00:00.8 18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full list of command-line options can be viewed by using the -h option or by running the Annotator with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

USAGE: dotnet Nirvana.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
--cache, -c <directory>
input cache directory
--in, -i <path> input VCF path
--tsv <path> input VCF path
--out, -o <file path> output file path
--ref, -r <path> input compressed reference sequence path
--sd <directory> input supplementary annotation directory
--sources, -s <VALUE> annotation data sources to be used (comma
separated list of supported tags)
--credentialsFile <VALUE>
File path to user credentials, default is set to ~
/.ilmnAnnotations/credentials.json
--ignoreLicenseError ignore error due to invalid license and skip
related data sources
--force-mt forces to annotate mitochondrial variants
--legacy-vids enables support for legacy VIDs
--enable-dq report DQ from VCF samples field
--enable-bidirectional-fusions
enables support for bidirectional gene fusions
--disable-junction-preservation
disable junction preserving functional annotation
--str <VALUE> user provided STR annotation TSV file
--vcf-info <VALUE> additional vcf info field keys (comma separated)
desired in the output
--vcf-sample-info <VALUE>
additional vcf format field keys (comma separated)
desired in the output
--sa-cutoff <VALUE> Any SV larger than or equal to this value will
not have any supplementary annotations
--output-format <VALUE>
output file format, available options: json, vcf.
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).
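
Because an unknown tag only produces a warning, a typo in the -s list can be easy to miss in a scripted run. A pre-flight check along the following lines could catch it earlier. This is a sketch, not the Annotator's own logic: unknown_tags is a hypothetical helper, the list of available values has to be supplied by you (for example, copied from the warning output above), and the exact, case-sensitive comparison is an assumption.

```shell
# unknown_tags REQUESTED AVAILABLE
# Hypothetical helper (not part of the Annotator). Both arguments are
# comma-separated lists; every tag in REQUESTED that does not appear
# verbatim in AVAILABLE is printed on its own line.
# Assumption: tags are compared exactly and case-sensitively.
unknown_tags() {
    local requested
    local available
    local tag
    local IFS
    requested="$1"
    available=",$2,"          # pad with commas so whole tags can be matched
    IFS=','                   # split the requested list on commas
    for tag in $requested; do
        case "$available" in
            *",$tag,"*) ;;                      # tag is known, nothing to report
            *) printf '%s\n' "$tag" ;;          # tag is unknown
        esac
    done
}

# Example usage with the tags from the command above:
# unknown_tags "omim,gnomad,ense" "omim,gnomad,Ensembl,RefSeq"
```

An empty result means every requested tag was found in the list you supplied.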

- - +
Version: 3.25 (unreleased)

Getting Started

Illumina Connected Annotations is written in C# using .NET Core (an amazing runtime environment that currently runs on Windows, Linux, Mac OS X, and in Docker images). Once .NET Core has been downloaded, all you need to do is grab the source, compile it, and grab the data files.

tip

Illumina Connected Annotations currently uses .NET 6.0. Please make sure that you have the most current runtime from the .NET Core downloads page.

Getting Illumina Connected Annotations

Latest Release

Please visit Illumina Connected Annotations to obtain the latest release.

mkdir -p IlluminaConnectedAnnotations/Data
cd IlluminaConnectedAnnotations
unzip IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0.zip

Quick Start

If you want to get started right away, we've created a script that unzips the Illumina Connected Annotations build, downloads the annotation data, and starts annotating a test file:

bash ./TestIlluminaConnectedAnnotations.sh IlluminaConnectedAnnotationsBuild.zip

We have verified that this script works on Windows (using Git Bash or WSL), Linux, and Mac OS X.

Docker

Obtain the Docker image as a compressed archive (e.g. IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz) and load it as follows:

docker load < IlluminaConnectedAnnotations-3.22.0-0-gc13dcb61-net6.0-docker.tar.gz

If you want to build your own Docker image, it is easy to do. You just need the Illumina Connected Annotations zip file; then download the Dockerfile and this script.

Put both files (create_docker_image.sh and Dockerfile) inside the same folder.

In a terminal, run the commands below inside the folder where you put those files:

chmod +x create_docker_image.sh
./create_docker_image.sh [path to zip file] [image tag]

After you run the script, the Docker image will be available on your local machine with the image name illumina-connected-annotations:[image tag specified].

For Docker, we have special instructions for running the Downloader:

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Downloader --ga GRCh37 -o /scratch

Similarly, we have special instructions for running IlluminaConnectedAnnotations (Here's a toy VCF in case you need it):

docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 Annotator -c /scratch/Cache/ \
-r /scratch/References/Homo_sapiens.GRCh37.Nirvana.dat \
--sd /scratch/SupplementaryAnnotation/GRCh37 \
-i /scratch/HiSeq.10000.vcf.gz -o /scratch/HiSeq
caution

Please note that since our data files are usually accessed through a Docker volume, there is a noticeable performance penalty when running Illumina Connected Annotations in Docker.

tip

For convenience, the user is encouraged to create aliases for the docker commands. For example:

alias IlluminaConnectedAnnotations="docker run --rm -it -v local/data/folder:/scratch illumina-connected-annotations:v3.22.0 IlluminaConnectedAnnotations"

Downloading the data files

To download the latest data sources (or update the ones that you already have), use the following command to automate the download from S3:

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh37 \
-o Data
  • the --ga argument specifies the genome assembly which can be GRCh37, GRCh38, or both.
  • the -o argument specifies the output directory
Glitches in the Matrix

Every once in a while, the download process does not go smoothly. Perhaps the internet connection cut out or you ran out of disk space. The Downloader attempts to detect these situations by checking the file sizes at the very end. If you see that a file was marked truncated, try fixing the root cause and running the Downloader again.
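
As a rough complement to the Downloader's own size check, you could also look for zero-byte files in the data directory after a failed run. The sketch below is not part of the Downloader: list_empty_files is a hypothetical helper name, and it only catches files that are completely empty, not partially written ones.

```shell
# list_empty_files DIR
# Hypothetical helper (not part of the Downloader): prints every zero-byte
# regular file found under DIR, one path per line.
list_empty_files() {
    local dir
    dir="$1"
    # -size 0 matches files that occupy zero blocks, i.e. empty files
    find "$dir" -type f -size 0
}

# Example usage, assuming the data was downloaded into ./Data:
# list_empty_files Data
```

If this prints anything, free up disk space or check the connection before rerunning the Downloader.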

tip

From time to time, you can re-run the Downloader to get the latest annotation files. It will only download the files that changed.

Preserving old data files

By default, when rerun, the Downloader replaces old files with the latest versions. For example, if your SupplementaryAnnotation folder contains ClinVar_20231101.nsa and the latest available version is ClinVar_20231203.nsa, the next time the Downloader is run, ClinVar_20231101.nsa will be replaced with ClinVar_20231203.nsa.

Currently, there is no way to override this behavior. If you do not want to replace or update a particular file, we recommend moving it to a different location, rerunning the Downloader to update the other data files, and then manually restoring the file you did not want updated. Please make sure to remove the newly downloaded version of that file. Note that the Annotator will throw an error if multiple versions of the same data source are present in the SupplementaryAnnotation folder. In other words, the SupplementaryAnnotation folder cannot contain both ClinVar_20231101.nsa and ClinVar_20231203.nsa.

Here is an example of how to proceed if a user doesn't want the latest version of ClinVar.

ls Data/SupplementaryAnnotation/GRCh38
...
ClinGen_disease_validity_curations_20231011.nga
ClinVar_20230930.nsa
ClinVar_20230930.nsa.idx
...
mv Data/SupplementaryAnnotation/GRCh38/ClinVar* <tmp_dir>/GRCh38/

dotnet bin/Release/net6.0/Downloader.dll \
--ga GRCh38 \
-o Data

rm Data/SupplementaryAnnotation/GRCh38/ClinVar*
mv <tmp_dir>/GRCh38/ClinVar* Data/SupplementaryAnnotation/GRCh38/
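
After restoring files by hand, it may be worth confirming that no data source is present in two versions before running the Annotator. The following is a minimal sketch under stated assumptions: find_duplicate_sources is a hypothetical helper, file names are assumed to follow the <Source>_<version>.<ext> pattern seen in the listing above, and index files are assumed to end in .idx. The Annotator's own check may work differently.

```shell
# find_duplicate_sources DIR
# Hypothetical helper (not part of the Annotator or Downloader).
# Prints "<Source>.<ext>" for every data source that appears more than once
# in DIR, e.g. "ClinVar.nsa" if two ClinVar_*.nsa files are present.
# Assumptions: names look like <Source>_<version>.<ext>; *.idx files are
# index files and are skipped.
find_duplicate_sources() {
    local dir
    local f
    local base
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip directories and unmatched globs
        base="${f##*/}"                    # strip the directory part
        case "$base" in
            *.idx) continue ;;             # ignore index files
        esac
        # "${base%_*}" drops the trailing "_<version>.<ext>";
        # "${base##*.}" keeps only the extension
        printf '%s.%s\n' "${base%_*}" "${base##*.}"
    done | sort | uniq -d                  # keep only names seen more than once
}

# Example usage:
# find_duplicate_sources Data/SupplementaryAnnotation/GRCh38
```

No output means each source and extension pair occurs only once under these naming assumptions.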

Download a test VCF file

Here's a toy VCF file you can play around with:

curl -O https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/files/HiSeq.10000.vcf.gz

Running Illumina Connected Annotations

Once you have downloaded the data sets, use the following command to annotate your VCF:

dotnet Annotator.dll \
-c Data/Cache \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000
  • the -c argument specifies the cache directory
  • the --sd argument specifies the supplementary annotation directory
  • the -r argument specifies the compressed reference path
  • the -i argument specifies the input VCF path
  • the -o argument specifies the output filename prefix

When running Illumina Connected Annotations, performance metrics are shown as it evaluates each chromosome in the input VCF file:

---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Initialization                                         Time     Positions/s
---------------------------------------------------------------------------
Cache                                               00:00:00.0
SA Position Scan                                    00:00:00.0      153,634

Reference                                Preload    Annotation   Variants/s
---------------------------------------------------------------------------
chr1                                  00:00:00.2    00:00:00.8       11,873

Summary                                                Time         Percent
---------------------------------------------------------------------------
Initialization                                      00:00:00.0        1.5 %
Preload                                             00:00:00.2        4.9 %
Annotation                                          00:00:00.8       18.5 %

Time: 00:00:04.4

The output will be a JSON file called HiSeq.10000.json.gz. Here's the full JSON file.

The Illumina Connected Annotations command line

The full list of command-line options can be viewed by using the -h option or by running the Annotator with no options:

dotnet Annotator.dll
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

USAGE: dotnet Nirvana.dll -i <vcf path> -c <cache dir> --sd <sa dir> -r <ref path> -o <base output filename>
Annotates a set of variants

OPTIONS:
      --cache, -c <directory>
                             input cache directory
      --in, -i <path>        input VCF path
      --tsv <path>           input VCF path
      --out, -o <file path>  output file path
      --ref, -r <path>       input compressed reference sequence path
      --sd <directory>       input supplementary annotation directory
      --sources, -s <VALUE>  annotation data sources to be used (comma
                               separated list of supported tags)
      --credentialsFile <VALUE>
                             File path to user credentials, default is set to ~
                               /.ilmnAnnotations/credentials.json
      --ignoreLicenseError   ignore error due to invalid license and skip
                               related data sources
      --force-mt             forces to annotate mitochondrial variants
      --legacy-vids          enables support for legacy VIDs
      --enable-dq            report DQ from VCF samples field
      --enable-bidirectional-fusions
                             enables support for bidirectional gene fusions
      --disable-junction-preservation
                             disable junction preserving functional annotation
      --str <VALUE>          user provided STR annotation TSV file
      --vcf-info <VALUE>     additional vcf info field keys (comma separated)
                               desired in the output
      --vcf-sample-info <VALUE>
                             additional vcf format field keys (comma separated)
                               desired in the output
      --sa-cutoff <VALUE>    Any SV larger than or equal to this value will
                               not have any supplementary annotations
      --output-format <VALUE>
                             output file format, available options: json, vcf.
      --help, -h             displays the help menu
      --version, -v          displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

Specifying annotation sources

By default, Illumina Connected Annotations will use all available data sources. However, the user can customize the set of sources using the --sources|-s option. If an unknown source is specified, a warning message will be printed.

dotnet Annotator.dll \
-c Data/Cache/GRCh37 \
--sd Data/SupplementaryAnnotation/GRCh37 \
-r Data/References/Homo_sapiens.GRCh37.Nirvana.dat \
-i HiSeq.10000.vcf.gz \
-o HiSeq.10000 \
-s omim,gnomad,ense
---------------------------------------------------------------------------
Illumina Connected Annotations (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

WARNING: Unknown tag in data-sources: ense.
Available values are: aminoAcidConservation,primateAI,dbsnp,spliceAI,revel,cosmic,clinvar,gnomad,
mitomap,oneKg,gmeVariome,topmed,clingen,decipher,gnomAD-preview,clingenDosageSensitivityMap,
gerpScore,dannScore,omim,clingenGeneValidity,phylopScore,lowComplexityRegion,refMinor,
heteroplasmy,Ensembl,RefSeq

Initialization Time Positions/s
---------------------------------------------------------------------------
SA Position Scan 00:00:00.3 307,966
....
..

The list of available values is compiled from the files provided (using -c and --sd options).
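
When the Annotator is driven from a script, the command line shown above can be assembled programmatically. The following is a minimal Python sketch, assuming only the options documented on this page (-c, --sd, -r, -i, -o and -s); `build_annotator_command` is a hypothetical helper, not part of Illumina Connected Annotations.

```python
def build_annotator_command(cache, sa_dir, ref, vcf, out_prefix,
                            sources=None, annotator_dll="Annotator.dll"):
    """Build the argument list for one Annotator run (hypothetical helper)."""
    cmd = [
        "dotnet", annotator_dll,
        "-c", cache,        # cache directory
        "--sd", sa_dir,     # supplementary annotation directory
        "-r", ref,          # compressed reference path
        "-i", vcf,          # input VCF path
        "-o", out_prefix,   # output filename prefix
    ]
    if sources:
        # --sources|-s takes a comma-separated list of tags
        cmd += ["-s", ",".join(sources)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Omitting `sources` leaves the -s option out, so all available data sources are used.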

+ + \ No newline at end of file diff --git a/introduction/licensedContent/index.html b/introduction/licensedContent/index.html index 0d232f6f..234412f2 100644 --- a/introduction/licensedContent/index.html +++ b/introduction/licensedContent/index.html @@ -6,12 +6,12 @@ Licensed Content | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Licensed Content

Illumina Connected Annotations supports the following content, which is available through a license from Illumina. +

Version: 3.25 (unreleased)

Licensed Content

Illumina Connected Annotations supports the following content, which is available through a license from Illumina. The license file will allow users to download and annotate with these data sources.

  • COSMIC
  • OMIM
  • Primate AI-3D
  • Splice AI
tip

The license may be customized at the time of creation to allow access to one or more of the above.

note

The Annotator packaged with DRAGEN comes with a license for all premium content. That is, if the Annotator is run from within DRAGEN, all premium content will be available. However, this doesn't automatically grant a license for premium content when running the Annotator outside of DRAGEN. @@ -19,7 +19,7 @@ These errors may be skipped by using the --ignoreLicenseError command line argument. After doing this, only basic data sources will be used for annotation. This can also be achieved by deleting the credentials file from the home folder.

- - + + \ No newline at end of file diff --git a/introduction/parsing-json/index.html b/introduction/parsing-json/index.html index eea1c608..c8241b84 100644 --- a/introduction/parsing-json/index.html +++ b/introduction/parsing-json/index.html @@ -6,13 +6,13 @@ Parsing Illumina Connected Annotations JSON | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized similarly to original VCF variants:

Illumina Connected Annotations JSON files can get very large, and we sometimes receive feedback that a bioinformatician tried to read an entire JSON file into Python or R and ran out of available RAM. This happens because those parsers try to load everything into memory at once.

To get around those issues, we play some clever tricks with newlines that enable our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.

Jupyter Notebook

To demonstrate this, we have put together a Jupyter notebook showing how to do this in Python, as well as an R version.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
- - +
Version: 3.25 (unreleased)

Parsing Illumina Connected Annotations JSON

Parsing JSON

Our JSON files are organized similarly to original VCF variants:

Illumina Connected Annotations JSON files can get very large, and we sometimes receive feedback that a bioinformatician tried to read an entire JSON file into Python or R and ran out of available RAM. This happens because those parsers try to load everything into memory at once.

To get around those issues, we play some clever tricks with newlines that enable our users to parse our JSON files quickly and efficiently.

Organization

Our JSON file is arranged as follows:

  • the header section is located on the first line
  • each line after that corresponds to a position (same as a row in a VCF file)
    • until you reach the genes section ],"genes":[
  • each line after that corresponds to a gene
    • until you reach the end ]}

Knowing this, you can load each position line as an independent JSON object and extract the information you need.
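
The line-oriented layout described above can be read with a short generator. The following is a minimal Python sketch, not the code from the notebook. It assumes that the first line holds the header, that each position line is a single JSON object optionally followed by a trailing comma, and that the genes section starts on a line containing exactly `],"genes":[`. The function names are hypothetical.

```python
import gzip
import json

GENES_MARKER = '],"genes":['  # line that opens the genes section
END_MARKER = ']}'             # line that closes the file


def iter_positions(lines):
    """Yield one parsed position per line, keeping only one line in memory at a time."""
    lines = iter(lines)
    next(lines, None)  # skip the header line
    for line in lines:
        text = line.strip()
        if text in (GENES_MARKER, END_MARKER):
            break  # the positions section is over
        if text:
            # assumption: position lines may end with a separating comma
            yield json.loads(text.rstrip(","))


def iter_positions_from_file(path):
    """Stream positions from a gzip-compressed annotation JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_positions(handle)
```

For example, `for position in iter_positions_from_file("HiSeq.10000.json.gz")` visits the positions one by one without loading the whole file.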

Jupyter Notebook

To demonstrate this, we have put together a Jupyter notebook showing how to do this in Python, as well as an R version.

JASIX

One of the tools that we really like in the VCF ecosystem is tabix. Unfortunately, tabix only works for tab-delimited file formats. As a result, we created a similar tool for Illumina Connected Annotations JSON files called JASIX.

Here's an example of how you might use JASIX:

dotnet bin/Release/net6.0/Jasix.dll -i dragen.json.gz -q chr1:942450-942455
  • the -i argument specifies the Illumina Connected Annotations JSON path
  • the -q argument specifies a genomic range (you can use as many of these as you want)

JASIX also includes additional options for showing the Illumina Connected Annotations header or for extracting different sections (like the genes section).

The output from JASIX is a compliant JSON object, shown here in pretty-printed form:

{"positions":[
{
"chromosome": "chr1",
"position": 942451,
"refAllele": "T",
"altAlleles": [
"C"
],
"quality": 484.23,
"filters": [
"PASS"
],
"cytogeneticBand": "1p36.33",
"samples": [
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 21,
"genotypeQuality": 60,
"alleleDepths": [
0,
21
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 32,
"genotypeQuality": 93,
"alleleDepths": [
0,
32
]
},
{
"genotype": "1/1",
"variantFrequencies": [
1
],
"totalDepth": 36,
"genotypeQuality": 105,
"alleleDepths": [
0,
36
]
}
],
"variants": [
{
"vid": "1-942451-T-C",
"chromosome": "chr1",
"begin": 942451,
"end": 942451,
"refAllele": "T",
"altAllele": "C",
"variantType": "SNV",
"hgvsg": "NC_000001.11:g.942451T>C",
"phylopScore": -0.1,
"clinvar": [
{
"id": "VCV000836156.1",
"reviewStatus": "criteria provided, single submitter",
"significance": [
"uncertain significance"
],
"refAllele": "T",
"altAllele": "T",
"lastUpdatedDate": "2020-08-20"
},
{
"id": "RCV001037211.1",
"variationId": 836156,
"reviewStatus": "criteria provided, single submitter",
"alleleOrigins": [
"germline"
],
"refAllele": "T",
"altAllele": "T",
"phenotypes": [
"not provided"
],
"medGenIds": [
"CN517202"
],
"significance": [
"uncertain significance"
],
"lastUpdatedDate": "2020-08-20",
"pubMedIds": [
"28492532"
]
}
],
"dbsnp": [
"rs6672356"
],
"gnomad": {
"coverage": 25,
"allAf": 0.999855,
"allAn": 123742,
"allAc": 123724,
"allHc": 61853,
"afrAf": 0.999416,
"afrAn": 10278,
"afrAc": 10272,
"afrHc": 5133,
"amrAf": 0.99995,
"amrAn": 20008,
"amrAc": 20007,
"amrHc": 10003,
"easAf": 1,
"easAn": 6054,
"easAc": 6054,
"easHc": 3027,
"finAf": 1,
"finAn": 8696,
"finAc": 8696,
"finHc": 4348,
"nfeAf": 0.999899,
"nfeAn": 49590,
"nfeAc": 49585,
"nfeHc": 24790,
"asjAf": 1,
"asjAn": 7208,
"asjAc": 7208,
"asjHc": 3604,
"sasAf": 0.99967,
"sasAn": 18160,
"sasAc": 18154,
"sasHc": 9074,
"othAf": 1,
"othAn": 3748,
"othAc": 3748,
"othHc": 1874,
"maleAf": 0.9999,
"maleAn": 69780,
"maleAc": 69773,
"maleHc": 34883,
"femaleAf": 0.999796,
"femaleAn": 53962,
"femaleAc": 53951,
"femaleHc": 26970,
"controlsAllAf": 0.999815,
"controlsAllAn": 48654,
"controlsAllAc": 48645
},
"oneKg": {
"allAf": 1,
"afrAf": 1,
"amrAf": 1,
"easAf": 1,
"eurAf": 1,
"sasAf": 1,
"allAn": 5008,
"afrAn": 1322,
"amrAn": 694,
"easAn": 1008,
"eurAn": 1006,
"sasAn": 978,
"allAc": 5008,
"afrAc": 1322,
"amrAc": 694,
"easAc": 1008,
"eurAc": 1006,
"sasAc": 978
},
"primateAI": [
{
"hgnc": "SAMD11",
"scorePercentile": 0.87
}
],
"revel": {
"score": 0.145
},
"topmed": {
"allAf": 0.999809,
"allAn": 125568,
"allAc": 125544,
"allHc": 62760
},
"transcripts": [
{
"transcript": "ENST00000420190.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
],
"proteinId": "ENSP00000411579.2"
},
{
"transcript": "ENST00000342066.7",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000342066.7:c.1027T>C",
"hgvsp": "ENSP00000342313.3:p.(Trp343Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000342313.3",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618181.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "732",
"cdsPos": "652",
"exons": "7/11",
"proteinPos": "218",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618181.4:c.652T>C",
"hgvsp": "ENSP00000480870.1:p.(Trp218Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000480870.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000622503.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "1110",
"cdsPos": "1030",
"exons": "10/14",
"proteinPos": "344",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000622503.4:c.1030T>C",
"hgvsp": "ENSP00000482138.1:p.(Trp344Arg)",
"isCanonical": true,
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482138.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000618323.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "712",
"cdsPos": "632",
"exons": "8/12",
"proteinPos": "211",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618323.4:c.632T>C",
"hgvsp": "ENSP00000480678.1:p.(Leu211Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000480678.1",
"siftScore": 0.03,
"siftPrediction": "deleterious - low confidence"
},
{
"transcript": "ENST00000616016.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "ccT/ccC",
"aminoAcids": "P",
"cdnaPos": "944",
"cdsPos": "864",
"exons": "9/13",
"proteinPos": "288",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "ENST00000616016.4:c.864T>C",
"hgvsp": "ENST00000616016.4:c.864T>C(p.(Pro288=))",
"proteinId": "ENSP00000478421.1"
},
{
"transcript": "ENST00000618779.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "921",
"cdsPos": "841",
"exons": "9/13",
"proteinPos": "281",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000618779.4:c.841T>C",
"hgvsp": "ENSP00000484256.1:p.(Trp281Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484256.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000616125.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "783",
"cdsPos": "703",
"exons": "8/12",
"proteinPos": "235",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000616125.4:c.703T>C",
"hgvsp": "ENSP00000484643.1:p.(Trp235Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000484643.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000620200.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "cTg/cCg",
"aminoAcids": "L/P",
"cdnaPos": "427",
"cdsPos": "347",
"exons": "5/9",
"proteinPos": "116",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000620200.4:c.347T>C",
"hgvsp": "ENSP00000484820.1:p.(Leu116Pro)",
"polyPhenScore": 0,
"polyPhenPrediction": "unknown",
"proteinId": "ENSP00000484820.1",
"siftScore": 0.16,
"siftPrediction": "tolerated - low confidence"
},
{
"transcript": "ENST00000617307.4",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "867",
"cdsPos": "787",
"exons": "9/13",
"proteinPos": "263",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000617307.4:c.787T>C",
"hgvsp": "ENSP00000482090.1:p.(Trp263Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000482090.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "NM_152486.2",
"source": "RefSeq",
"bioType": "protein_coding",
"codons": "Cgg/Cgg",
"aminoAcids": "R",
"cdnaPos": "1107",
"cdsPos": "1027",
"exons": "10/14",
"proteinPos": "343",
"geneId": "148398",
"hgnc": "SAMD11",
"consequence": [
"synonymous_variant"
],
"hgvsc": "NM_152486.2:c.1027T>C",
"hgvsp": "NM_152486.2:c.1027T>C(p.(Arg343=))",
"isCanonical": true,
"proteinId": "NP_689699.2"
},
{
"transcript": "ENST00000341065.8",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "750",
"cdsPos": "751",
"exons": "8/12",
"proteinPos": "251",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000341065.8:c.750T>C",
"hgvsp": "ENSP00000349216.4:p.(Trp251Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000349216.4",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000455979.1",
"source": "Ensembl",
"bioType": "protein_coding",
"codons": "Tgg/Cgg",
"aminoAcids": "W/R",
"cdnaPos": "507",
"cdsPos": "508",
"exons": "4/7",
"proteinPos": "170",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"missense_variant"
],
"hgvsc": "ENST00000455979.1:c.507T>C",
"hgvsp": "ENSP00000412228.1:p.(Trp170Arg)",
"polyPhenScore": 0,
"polyPhenPrediction": "benign",
"proteinId": "ENSP00000412228.1",
"siftScore": 1,
"siftPrediction": "tolerated"
},
{
"transcript": "ENST00000478729.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000474461.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "389",
"exons": "3/4",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000474461.1:n.389T>C"
},
{
"transcript": "ENST00000466827.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "191",
"exons": "2/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000466827.1:n.191T>C"
},
{
"transcript": "ENST00000464948.1",
"source": "Ensembl",
"bioType": "retained_intron",
"cdnaPos": "286",
"exons": "1/2",
"geneId": "ENSG00000187634",
"hgnc": "SAMD11",
"consequence": [
"non_coding_transcript_exon_variant"
],
"hgvsc": "ENST00000464948.1:n.286T>C"
},
{
"transcript": "NM_015658.3",
"source": "RefSeq",
"bioType": "protein_coding",
"geneId": "26155",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "NP_056473.2"
},
{
"transcript": "ENST00000483767.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000327044.6",
"source": "Ensembl",
"bioType": "protein_coding",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
],
"isCanonical": true,
"proteinId": "ENSP00000317992.6"
},
{
"transcript": "ENST00000477976.5",
"source": "Ensembl",
"bioType": "retained_intron",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
},
{
"transcript": "ENST00000496938.1",
"source": "Ensembl",
"bioType": "processed_transcript",
"geneId": "ENSG00000188976",
"hgnc": "NOC2L",
"consequence": [
"downstream_gene_variant"
]
}
]
}
]
}
]}
+ + \ No newline at end of file diff --git a/search/index.html b/search/index.html index 055a820e..09405c0f 100644 --- a/search/index.html +++ b/search/index.html @@ -6,13 +6,13 @@ Search the documentation | IlluminaConnectedAnnotations - - + +
-

Search the documentation

- - +

Search the documentation

+ + \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 5879ec8d..86e9aa8c 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -1 +1 @@ -https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/blog/archiveweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/searchweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/versionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-gene-validity-json
weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvarweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.
22/data-sources/gme-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomadweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylopweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocume
ntation/3.22/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/jasixweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/sautilsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-
sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-gene-validity-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvarweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dec
ipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomadweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data
-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopprimate-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/licensedContentweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/jasixweekly0.5https://illumina.gith
ub.io/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/sautilsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preservingweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-gene-validity-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvarweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sou
rces/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-previewweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/decipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomadweekly0.5https://illumina.github.io/Illumina
ConnectedAnnotationsDocumentation/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad40-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopprimate-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAn
notationsDocumentation/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-updateweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContentweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/utilities/jasixweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/utilities/sautilsweekly0.5 \ No newline at end of file 
+https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/blog/archiveweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/searchweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/versionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-gene-validity-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clingen-jsonweekly0.5https://illumina.github.io/Illumin
aConnectedAnnotationsDocumentation/3.22/data-sources/clinvarweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/decipher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gme-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomadweekly0.5https://illumina
.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylopweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/splice-ai-jsonweekly
0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/jasixweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.22/utilities/sautilsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservationweekl
y0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-gene-validity-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clingen-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvarweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/decipher-jsonweekly0.5https://illumina.github.io/I
lluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gme-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomadweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopweekly0.5https://illumina.githu
b.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/phylopprimate-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/splice-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/licensedContentweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/jasixweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.23/utilities/sautilsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.2
4/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/iscn-notationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/junction-preservingweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-gene-validity-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clingen-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvarweekly0.5https://illumina.github.io/I
lluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-previewweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/clinvar-preview-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/decipher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gme-j
sonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomadweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad4.0-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad4.0-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/gnomad40-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylopweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/phylopprimate-jsonweekly0.5https://illumina.github.
io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/splice-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/file-formats/illumina-annotator-vcf-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/frequently-asked-questions/Annotator-vs-data-updateweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/licensedContentweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/jasixweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/3.24/utilities/sautilsweekly0.5https://illumina.github.io/Illu
minaConnectedAnnotationsDocumentation/weekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/canonical-transcriptsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/gene-fusionsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/iscn-notationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/junction-preservingweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/transcript-consequence-impactsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/core-functionality/variant-idsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-snv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/1000Genomes-sv-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservationweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/amino-acid-conservation-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cancer-hotspotsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingenweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-dosage-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-gene-validity-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clingen-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvarweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sou
rces/clinvar-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-previewweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/clinvar-preview-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmicweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-cancer-gene-censusweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-gene-fusion-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/cosmic-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dannweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dann-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/dbsnp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/decipherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/decipher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcherweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/fusioncatcher-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gerpweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gerp-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gmeweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gme-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomadweekly0.5https://illumina.github.io/Illumina
ConnectedAnnotationsDocumentation/data-sources/gnomad-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-data_descriptionweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-lof-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad4.0-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/gnomad40-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mito-heteroplasmyweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomapweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-small-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/mitomap-structural-variants-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/omimweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/omim-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/phylop-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/phylopprimate-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/primate-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAn
notationsDocumentation/data-sources/revelweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/revel-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-aiweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/splice-ai-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/topmedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/data-sources/topmed-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/file-formats/custom-annotationsweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-json-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/file-formats/illumina-annotator-vcf-file-formatweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/frequently-asked-questions/Annotator-vs-data-updateweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/dependenciesweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/getting-startedweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/licensedContentweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/introduction/parsing-jsonweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/utilities/jasixweekly0.5https://illumina.github.io/IlluminaConnectedAnnotationsDocumentation/utilities/sautilsweekly0.5 \ No newline at end of file diff --git a/utilities/jasix/index.html b/utilities/jasix/index.html index 925c8261..50fdad07 100644 --- a/utilities/jasix/index.html +++ b/utilities/jasix/index.html @@ -6,13 +6,13 @@ Jasix | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

Jasix

Overview

The Jasix index provides tabix-like indexing capabilities for the Illumina Connected Annotations JSON output.

Creating the Jasix index

The Jasix index (a .jsi file) is generated on the fly alongside the Illumina Connected Annotations output. It can also be generated independently by running the Jasix command line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and the index-generation functionality of Jasix.

Example

dotnet Jasix.dll -h
USAGE: dotnet Jasix.dll -i in.json.gz [options]
Indexes a Illumina Connected Annotations annotated JSON file

OPTIONS:
--header, -t print also the header lines
--only-header, -H print only the header lines
--chromosomes, -l list chromosome names
--index, -c create index
--in, -i <VALUE> input
--out, -o <VALUE> compressed output file name (default:console)
--query, -q <VALUE> query range
--section, -s <VALUE> complete section (positions or genes) to output
--help, -h displays the help menu
--version, -v displays the version
dotnet Jasix.dll --index -i input.json.gz
---------------------------------------------------------------------------
Jasix (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Ref Sequence chrM indexed in 00:00:00.2
Ref Sequence chr1 indexed in 00:00:05.8
Ref Sequence chr2 indexed in 00:00:06.0
.
.
.
Peak memory usage: 28.5 MB
Time: 00:01:14.8

Querying the index

The Jasix query format is chr:start-end. If end is not provided, Jasix assumes end=start. If only chr is provided, all entries for that chromosome are returned.

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000
{
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
}
]
}

The default output stream is the console. However, if an output filename is provided, Jasix writes the results to that file in bgzip-compressed format. The output is always valid JSON. If requested (via the -t option), the header of the indexed file is included. Multiple queries can be submitted in the same command, and the output will contain their results within the same "positions" block in the order the queries were submitted (Warning: if the queries are out of order or overlapping, the output will also be out of order and intersecting).

dotnet Jasix.dll -i input.json.gz  -q chrM:5000-7000 -q chrM:8500-9500 -t
{
"header":{
"annotator":"Illumina Annotation Engine 1.6.2.0",
"creationTime":"2017-08-30 11:42:57",
"genomeAssembly":"GRCh37",
"schemaVersion":6,
"dataVersion":"84.24.39",
"dataSources":[
{
"name":"VEP",
"version":"84",
"description":"Ensembl",
"releaseDate":"2017-01-16"
}
],
"samples":[
"Mother"
]
},
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":8702,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":0.9987,
"totalDepth":1534,
"genotypeQuality":1,
"alleleDepths":[
2,
1532
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":8702,
"chromosome":"chrM",
"end":8702,
"variantType":"SNV",
"vid":"MT:8702:A"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":9378,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1018,
"genotypeQuality":1,
"alleleDepths":[
0,
1018
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":9378,
"chromosome":"chrM",
"end":9378,
"variantType":"SNV",
"vid":"MT:9378:A"
}
]
}
]
}

Extracting a section

The Illumina Connected Annotations JSON file has three sections: header, positions and genes. The header can be printed using the -H option. To output only the positions or genes section, use the -s (--section) option.

dotnet Jasix.dll -i input.json.gz  -s genes
[
{
"name": "ABCB10",
"omim": [
{
"mimNumber": 605454,
"geneName": "ATP-binding cassette, subfamily B, member 10"
}
]
},
{
"name": "ABCD3",
"omim": [
{
"mimNumber": 170995,
"geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",
"description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",
"phenotypes": [
{
"mimNumber": 616278,
"phenotype": "?Bile acid synthesis defect, congenital, 5",
"mapping": "molecular basis of the disorder is known",
"inheritances": [
"Autosomal recessive"
],
"comments": [
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
}
]
- - +
Version: 3.25 (unreleased)

Jasix

Overview

The Jasix index provides tabix-like indexing capabilities for the Illumina Connected Annotations JSON output.

Creating the Jasix index

The Jasix index (a .jsi file) is generated on the fly alongside the Illumina Connected Annotations output. It can also be generated independently by running the Jasix command line utility on the JSON output file. Please note that the Jasix utility can only consume JSON files that follow the Illumina Connected Annotations JSON output format. The following code blocks demonstrate the help menu and the index-generation functionality of Jasix.

Example

dotnet Jasix.dll -h
USAGE: dotnet Jasix.dll -i in.json.gz [options]
Indexes a Illumina Connected Annotations annotated JSON file

OPTIONS:
--header, -t print also the header lines
--only-header, -H print only the header lines
--chromosomes, -l list chromosome names
--index, -c create index
--in, -i <VALUE> input
--out, -o <VALUE> compressed output file name (default:console)
--query, -q <VALUE> query range
--section, -s <VALUE> complete section (positions or genes) to output
--help, -h displays the help menu
--version, -v displays the version
dotnet Jasix.dll --index -i input.json.gz
---------------------------------------------------------------------------
Jasix (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Ref Sequence chrM indexed in 00:00:00.2
Ref Sequence chr1 indexed in 00:00:05.8
Ref Sequence chr2 indexed in 00:00:06.0
.
.
.
Peak memory usage: 28.5 MB
Time: 00:01:14.8

Querying the index

The Jasix query format is chr:start-end. If end is not provided, Jasix assumes end=start. If only chr is provided, all entries for that chromosome are returned.

dotnet Jasix.dll -i input.json.gz -q chrM:5000-7000
{
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
}
]
}
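
The query format described above (chr:start-end, with end defaulting to start, and a bare chr meaning the whole chromosome) can be illustrated with a small Python sketch. This is not part of Jasix; the names `JasixQuery` and `parse_query` are hypothetical, and the sketch assumes chromosome names that do not themselves contain a colon.

```python
from typing import NamedTuple, Optional


class JasixQuery(NamedTuple):
    """Hypothetical container for a parsed query string (not a Jasix type)."""
    chromosome: str
    start: Optional[int]  # None means "whole chromosome"
    end: Optional[int]    # None means "whole chromosome"


def parse_query(query: str) -> JasixQuery:
    """Parse 'chr', 'chr:start' or 'chr:start-end' as described in the text.

    Assumption: the chromosome name contains no ':' character.
    """
    chrom, colon, span = query.partition(":")
    if not chrom:
        raise ValueError(f"missing chromosome in query: {query!r}")
    if not colon:
        # Only the chromosome was given: all entries for it are requested.
        return JasixQuery(chrom, None, None)
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    # When no end is given, the range collapses to a single position.
    end = int(end_text) if dash else start
    if end < start:
        raise ValueError(f"end precedes start in query: {query!r}")
    return JasixQuery(chrom, start, end)


if __name__ == "__main__":
    for text in ("chrM:5000-7000", "chrM:5581", "chr1"):
        print(text, "->", parse_query(text))
```

Validating a query string this way before invoking Jasix can catch malformed ranges early; how Jasix itself reports such input is not covered here.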

The default output stream is the console. However, if an output filename is provided, Jasix writes the results to that file in bgzip-compressed format. The output is always valid JSON. If requested (via the -t option), the header of the indexed file is included. Multiple queries can be submitted in the same command, and the output will contain their results within the same "positions" block in the order the queries were submitted (Warning: if the queries are out of order or overlapping, the output will also be out of order and intersecting).

dotnet Jasix.dll -i input.json.gz  -q chrM:5000-7000 -q chrM:8500-9500 -t
{
"header":{
"annotator":"Illumina Annotation Engine 1.6.2.0",
"creationTime":"2017-08-30 11:42:57",
"genomeAssembly":"GRCh37",
"schemaVersion":6,
"dataVersion":"84.24.39",
"dataSources":[
{
"name":"VEP",
"version":"84",
"description":"Ensembl",
"releaseDate":"2017-01-16"
}
],
"samples":[
"Mother"
]
},
"positions":[
{
"chromosome":"chrM",
"refAllele":"C",
"position":5581,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"T"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1625,
"genotypeQuality":1,
"alleleDepths":[
0,
1625
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"T",
"refAllele":"C",
"begin":5581,
"chromosome":"chrM",
"end":5581,
"variantType":"SNV",
"vid":"MT:5581:T"
}
]
},
{
"chromosome":"chrM",
"refAllele":"A",
"position":6267,
"quality":1637.00,
"filters":[
"LowGQXHetSNP"
],
"altAlleles":[
"G"
],
"samples":[
{
"variantFreq":0.6873,
"totalDepth":323,
"genotypeQuality":1,
"alleleDepths":[
101,
222
],
"genotype":"0/1"
}
],
"variants":[
{
"altAllele":"G",
"refAllele":"A",
"begin":6267,
"chromosome":"chrM",
"end":6267,
"variantType":"SNV",
"vid":"MT:6267:G"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":8702,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":0.9987,
"totalDepth":1534,
"genotypeQuality":1,
"alleleDepths":[
2,
1532
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":8702,
"chromosome":"chrM",
"end":8702,
"variantType":"SNV",
"vid":"MT:8702:A"
}
]
},
{
"chromosome":"chrM",
"refAllele":"G",
"position":9378,
"quality":3070.00,
"filters":[
"LowGQXHomSNP"
],
"altAlleles":[
"A"
],
"samples":[
{
"variantFreq":1,
"totalDepth":1018,
"genotypeQuality":1,
"alleleDepths":[
0,
1018
],
"genotype":"1/1"
}
],
"variants":[
{
"altAllele":"A",
"refAllele":"G",
"begin":9378,
"chromosome":"chrM",
"end":9378,
"variantType":"SNV",
"vid":"MT:9378:A"
}
]
}
]
}
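
Because out-of-order or overlapping queries lead to out-of-order or intersecting output, one option is to sort and merge the ranges before building the command line. The following Python sketch shows that idea under stated assumptions: `normalize_queries` and `to_query_args` are hypothetical helpers, chromosomes are kept in the order they first appear in the input (which may differ from the order in the indexed file), and only the -q option from the help menu above is used.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end), with start <= end
Range = Tuple[str, int, int]


def normalize_queries(ranges: List[Range]) -> List[Range]:
    """Sort ranges per chromosome and merge those that overlap.

    Assumption: chromosomes are emitted in order of first appearance.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in ranges:
        by_chrom.setdefault(chrom, []).append((start, end))

    merged: List[Range] = []
    for chrom, spans in by_chrom.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the current range: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def to_query_args(ranges: List[Range]) -> List[str]:
    """Turn ranges into repeated '-q chr:start-end' arguments."""
    args: List[str] = []
    for chrom, start, end in ranges:
        args += ["-q", f"{chrom}:{start}-{end}"]
    return args


if __name__ == "__main__":
    raw = [("chrM", 8500, 9500), ("chrM", 5000, 7000), ("chrM", 6500, 7200)]
    print(" ".join(to_query_args(normalize_queries(raw))))
```

The resulting argument list can be appended to `dotnet Jasix.dll -i input.json.gz`, for example through `subprocess.run`, so that each position is requested once and in ascending order.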

Extracting a section

The Illumina Connected Annotations JSON file has three sections: header, positions and genes. The header can be printed using the -H option. To output only the positions or genes section, use the -s (--section) option.

dotnet Jasix.dll -i input.json.gz  -s genes
[
{
"name": "ABCB10",
"omim": [
{
"mimNumber": 605454,
"geneName": "ATP-binding cassette, subfamily B, member 10"
}
]
},
{
"name": "ABCD3",
"omim": [
{
"mimNumber": 170995,
"geneName": "ATP-binding cassette, subfamily D, member 3 (peroxisomal membrane protein 1, 70kD)",
"description": "The ABCD3 gene encodes a peroxisomal membrane transporter involved in the transport of branched-chain fatty acids and C27 bile acids into the peroxisome; the latter function is a crucial step in bile acid biosynthesis (summary by Ferdinandusse et al., 2015).",
"phenotypes": [
{
"mimNumber": 616278,
"phenotype": "?Bile acid synthesis defect, congenital, 5",
"mapping": "molecular basis of the disorder is known",
"inheritances": [
"Autosomal recessive"
],
"comments": [
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
}
]
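The genes section printed above is a plain JSON array, so it can be post-processed with any JSON parser. The following is a minimal sketch, assuming the output has been captured as text; the function name and the trimmed sample data are illustrative and not part of Jasix.

```python
import json

# Trimmed copy of the genes section shown above (illustrative sample only).
genes_json = """
[
  {"name": "ABCB10", "omim": [{"mimNumber": 605454}]},
  {"name": "ABCD3", "omim": [{"mimNumber": 170995,
      "phenotypes": [{"mimNumber": 616278,
                      "phenotype": "?Bile acid synthesis defect, congenital, 5"}]}]}
]
"""


def omim_phenotypes_by_gene(genes):
    """Map each gene name to the OMIM phenotype names attached to it."""
    result = {}
    for gene in genes:
        phenotypes = []
        # Both "omim" and "phenotypes" are optional in the output above.
        for entry in gene.get("omim", []):
            for phenotype in entry.get("phenotypes", []):
                phenotypes.append(phenotype["phenotype"])
        result[gene["name"]] = phenotypes
    return result


genes = json.loads(genes_json)
phenotypes_by_gene = omim_phenotypes_by_gene(genes)
print(phenotypes_by_gene)
```

Genes without OMIM phenotypes, such as ABCB10 in the sample, map to an empty list.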
+ + \ No newline at end of file diff --git a/utilities/sautils/index.html b/utilities/sautils/index.html index 4335479c..dc2aafd8 100644 --- a/utilities/sautils/index.html +++ b/utilities/sautils/index.html @@ -6,17 +6,17 @@ SAUtils | IlluminaConnectedAnnotations - - + +
-
Version: 3.24 (unreleased)

SAUtils

Overview

SAUtils is a utility tool that creates binary supplementary annotation files (.nsa, .gsa, .npd, .nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations Annotation engine to provide supplementary annotations in the output.

The SAUtils Menu

SAUtils supports building binary files for many data sources. The help menu lists them out in the form of sub-commands.

dotnet SAUtils.dll
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Utilities focused on supplementary annotation

USAGE: dotnet SAUtils.dll <command> [options]

COMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen
AaCon create AA conservation database
ancestralAllele create Ancestral allele database from 1000Genomes data
ClinGen create ClinGen database
Downloader download ClinGen database
clinvar create ClinVar database
concat merge multiple NSA files for the same data source having non-overlapping regions
Cosmic create COSMIC database
CosmicSv create COSMIC SV database
CosmicFusion create COSMIC gene fusion database
CosmicCGC create COSMIC cancer gene census database
CustomGene create custom gene annotation database
CustomVar create custom variant annotation database
Dann create DANN database
Dbsnp create dbSNP database
Dgv create DGV database
DiseaseValidity create disease validity database
DosageMapRegions create dosage map regions
DosageSensitivity create dosage sensitivity database
DownloadOmim download OMIM database
ExtractMiniSA extracts mini SA
ExtractMiniXml extracts mini XML (ClinVar)
FilterSpliceNetTsv filter SpliceNet predictions
FusionCatcher create FusionCatcher database
Gerp create GERP conservation database
GlobalMinor create global minor allele database
Gnomad create gnomAD database
Gnomad-lcr create gnomAD low complexity region database
GnomadGeneScores create gnomAD gene scores database
GnomadSV create gnomAD structural variant database
Index edit an index file
MitoHet create mitochondrial Heteroplasmy database
MitomapSvDb create MITOMAP structural variants database
MitomapVarDb create MITOMAP small variants database
Omim create OMIM database
OneKGen create 1000 Genome small variants database
OneKGenSv create 1000 Genomes structural variants database
OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file
PhyloP create PhyloP database
PrimateAi create PrimateAI database
RefMinor create Reference Minor database from 1000 Genome
RemapWithDbsnp remap a VCF file given source and destination rsID mappings
Revel create REVEL database
SpliceAi create SpliceAI database
TopMed create TOPMed database
Gme create GME Variome database
Decipher create Decipher database

You can get further detailed help for each sub-command by typing in the subcommand. For example:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

More detailed instructions about each sub-command can be found in the documentation of the respective data sources.

Output File Formats

The format of the binary file SAUtils produces depends on the type of annotation data represented in that file (e.g. small variants vs. structural variants vs. genes).

File Extension    Description
.nsa              Small variant annotations (e.g. SNV, insertions, deletions, etc.)
.gsa              Compact variant annotations (e.g. SNV, insertions, deletions, etc.)
.idx              Index file
.nsi              Interval annotations (e.g. SV, CNVs, intervals)
.nga              Gene annotations
.npd              Conservation scores
.rma              Reference Minor allele
.gfs              Gene fusions source
.gfj              Gene fusions JSON
.schema           JSON schema

SAUtils AutoDownloadGenerate

To make generating supplementary annotation files easier, we provide a single command that can be used instead of the more granular subcommands. +

Version: 3.25 (unreleased)

SAUtils

Overview

SAUtils is a utility tool that creates binary supplementary annotation files (.nsa, .gsa, .npd, .nsi, etc.) from original data files (e.g. VCFs, TSVs, XML, HTML, etc.) for various data sources (e.g. ClinVar, dbSNP, gnomAD, etc.). These binary files can be fed into the Illumina Connected Annotations Annotation engine to provide supplementary annotations in the output.

The SAUtils Menu

SAUtils supports building binary files for many data sources. The help menu lists them out in the form of sub-commands.

dotnet SAUtils.dll
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

Utilities focused on supplementary annotation

USAGE: dotnet SAUtils.dll <command> [options]

COMMAND: AutoDownloadGenerate auto download and generate Omim, Clinvar, Clingen
AaCon create AA conservation database
ancestralAllele create Ancestral allele database from 1000Genomes data
ClinGen create ClinGen database
Downloader download ClinGen database
clinvar create ClinVar database
concat merge multiple NSA files for the same data source having non-overlapping regions
Cosmic create COSMIC database
CosmicSv create COSMIC SV database
CosmicFusion create COSMIC gene fusion database
CosmicCGC create COSMIC cancer gene census database
CustomGene create custom gene annotation database
CustomVar create custom variant annotation database
Dann create DANN database
Dbsnp create dbSNP database
Dgv create DGV database
DiseaseValidity create disease validity database
DosageMapRegions create dosage map regions
DosageSensitivity create dosage sensitivity database
DownloadOmim download OMIM database
ExtractMiniSA extracts mini SA
ExtractMiniXml extracts mini XML (ClinVar)
FilterSpliceNetTsv filter SpliceNet predictions
FusionCatcher create FusionCatcher database
Gerp create GERP conservation database
GlobalMinor create global minor allele database
Gnomad create gnomAD database
Gnomad-lcr create gnomAD low complexity region database
GnomadGeneScores create gnomAD gene scores database
GnomadSV create gnomAD structural variant database
Index edit an index file
MitoHet create mitochondrial Heteroplasmy database
MitomapSvDb create MITOMAP structural variants database
MitomapVarDb create MITOMAP small variants database
Omim create OMIM database
OneKGen create 1000 Genome small variants database
OneKGenSv create 1000 Genomes structural variants database
OneKGenSvVcfToBed convert 1000 Genomes structural variants VCF file into a BED-like file
PhyloP create PhyloP database
PrimateAi create PrimateAI database
RefMinor create Reference Minor database from 1000 Genome
RemapWithDbsnp remap a VCF file given source and destination rsID mappings
Revel create REVEL database
SpliceAi create SpliceAI database
TopMed create TOPMed database
Gme create GME Variome database
Decipher create Decipher database

You can get further detailed help for each sub-command by typing in the subcommand. For example:

dotnet SAUtils.dll clinvar
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
3.22.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll clinvar [options]
Creates a supplementary database with ClinVar annotations

OPTIONS:
--ref, -r <VALUE> compressed reference sequence file
--rcv, -i <VALUE> ClinVar Full release XML file
--vcv, -c <VALUE> ClinVar Variation release XML file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version

More detailed instructions about each sub-command can be found in the documentation of the respective data sources.

Output File Formats

The format of the binary file SAUtils produces depends on the type of annotation data represented in that file (e.g. small variants vs. structural variants vs. genes).

File Extension    Description
.nsa              Small variant annotations (e.g. SNV, insertions, deletions, etc.)
.gsa              Compact variant annotations (e.g. SNV, insertions, deletions, etc.)
.idx              Index file
.nsi              Interval annotations (e.g. SV, CNVs, intervals)
.nga              Gene annotations
.npd              Conservation scores
.rma              Reference Minor allele
.gfs              Gene fusions source
.gfj              Gene fusions JSON
.schema           JSON schema
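When inspecting an output directory, the extension table above can be turned into a small lookup. This is only a sketch: the helper name and the example file names are hypothetical, and it assumes the final extension of a file name is the one that identifies its type.

```python
from pathlib import Path

# Extension-to-description mapping copied from the table above.
SA_FILE_TYPES = {
    ".nsa": "Small variant annotations",
    ".gsa": "Compact variant annotations",
    ".idx": "Index file",
    ".nsi": "Interval annotations",
    ".nga": "Gene annotations",
    ".npd": "Conservation scores",
    ".rma": "Reference Minor allele",
    ".gfs": "Gene fusions source",
    ".gfj": "Gene fusions JSON",
    ".schema": "JSON schema",
}


def describe_sa_file(filename):
    """Return the table description for a file, based on its last extension."""
    suffix = Path(filename).suffix.lower()
    return SA_FILE_TYPES.get(suffix, "unknown")


# Hypothetical file names, used only to exercise the lookup.
for name in ["example.nsa", "example.nsa.idx", "notes.txt"]:
    print(name, "->", describe_sa_file(name))
```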

SAUtils AutoDownloadGenerate

To make generating supplementary annotation files easier, we provide a single command that can be used instead of the more granular subcommands. This subcommand combines the download and generate steps. Currently, it supports the following data sources:

  • ClinVar
  • ClinGen
  • dbSNP
  • OMIM
  • COSMIC
dotnet SAUtils.dll AutoDownloadGenerate
---------------------------------------------------------------------------
SAUtils (c) 2024 Illumina, Inc.
3.23.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll autodownloadgenerate [options]
Downloads and generates the Supplementary Database for Omim, ClinGen, ClinVar, dbSNP, and COSMIC

OPTIONS:
--sources, -s <VALUE> comma separated list of external data sources
--inputJson, -j <path> input JSON path
--downloadBaseFolder, -b <directory>
base directory path external datasources
downloaded to
--downloadDate, -d <directory>
date directory name that external datasources
downloaded to. Default is today's date in yyyy-
MM-dd format (e.g. 2023-01-30).
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output SA directory
--actions, -a <VALUE> comma separated list of action(s) to perform.
action options: download, generate.
--help, -h displays the help menu
--version, -v displays the version

You can download only, generate only, or both download and generate supplementary files. To use this subcommand, you have to prepare a JSON file that describes the data sources. The sections below show how to use this subcommand for each data source.
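For scripted runs, the command line can be assembled programmatically. The sketch below uses only the short options listed in the help menu above; the function name and all paths are hypothetical, and it assumes every option is passed even for a download-only or generate-only run, which the help text does not confirm.

```python
ALLOWED_ACTIONS = {"download", "generate"}


def build_auto_download_generate_command(sources, actions, json_path, cache_dir,
                                         reference, download_dir, output_dir):
    """Build the argument list for a SAUtils AutoDownloadGenerate invocation."""
    unknown = set(actions) - ALLOWED_ACTIONS
    if unknown:
        raise ValueError(f"unsupported action(s): {sorted(unknown)}")
    return [
        "dotnet", "SAUtils.dll", "AutoDownloadGenerate",
        "-s", ",".join(sources),   # comma separated list of data sources
        "-a", ",".join(actions),   # download, generate, or both
        "-j", json_path,           # JSON file describing the data sources
        "-c", cache_dir,           # input cache directory
        "-r", reference,           # input reference
        "-b", download_dir,        # base directory for downloaded files
        "-o", output_dir,          # output SA directory
    ]


# Hypothetical paths, shown only to illustrate the resulting command.
command = build_auto_download_generate_command(
    sources=["ClinVar", "ClinGen"],
    actions=["download", "generate"],
    json_path="sources.json",
    cache_dir="Cache",
    reference="References",
    download_dir="Downloads",
    output_dir="SupplementaryAnnotation",
)
print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` to execute it.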

AutoDownloadGenerate ClinVar

Below is the command to use AutoDownloadGenerate for ClinVar to download and generate supplementary files.

dotnet SAUtils.dll AutoDownloadGenerate -s ClinVar -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for ClinVar should be formatted like below:

{
"clinvar": {
"baseDirectory": "ClinVar",
"sourceFiles": [
{
"name": "ClinVar",
"description": "A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"files": [
{
"localFileName": "ClinVarFullRelease_00-latest.xml.gz",
"downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz",
"md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/ClinVarFullRelease_00-latest.xml.gz.md5"
},
{
"localFileName": "ClinVarVariationRelease_00-latest.xml.gz",
"downloadUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz",
"md5Url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/ClinVarVariationRelease_00-latest.xml.gz.md5"
}
]
}
]
}
}

There is no need to modify the JSON entry for ClinVar; you can use it as is.
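Each ClinVar file entry above carries an md5Url next to its downloadUrl. The documentation does not describe how SAUtils uses it, but a downloaded file can be checked by hand along these lines. This sketch assumes the .md5 file starts with the hexadecimal digest (the usual md5sum layout); the function names are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_line(data_path, md5_text):
    """Compare a file against the text of its .md5 companion.

    Assumes the first whitespace-separated token of md5_text is the digest.
    """
    expected = md5_text.split()[0].lower()
    return md5_of_file(data_path) == expected
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte files such as the ClinVar XML releases.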

AutoDownloadGenerate ClinGen

dotnet SAUtils.dll AutoDownloadGenerate -s ClinGen -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for ClinGen should be formatted like below:

{
"clingen": {
"baseDirectory": "ClinGen",
"sourceFiles": [
{
"name": "ClinGen Dosage Sensitivity Map",
"subDirectory": "DosageSensitivity",
"description": "Dosage sensitivity map from ClinGen (dbVar)",
"files": [
{
"localFileName": "ClinGen_gene_curation_list_GRCh37.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh37.tsv"
},
{
"localFileName": "ClinGen_gene_curation_list_GRCh38.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_gene_curation_list_GRCh38.tsv"
},
{
"localFileName": "ClinGen_region_curation_list_GRCh37.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh37.tsv"
},
{
"localFileName": "ClinGen_region_curation_list_GRCh38.tsv",
"downloadUrl": "https://ftp.clinicalgenome.org/ClinGen_region_curation_list_GRCh38.tsv"
}
]
},
{
"name": "ClinGen disease validity curations",
"subDirectory": "GeneDiseaseValidity",
"description": "Disease validity curations from ClinGen (dbVar)",
"files": [
{
"localFileName": "Clingen-Gene-Disease-Summary.csv",
"downloadUrl": "https://search.clinicalgenome.org/kb/gene-validity/download"
}
]
}
]
}
}

There is no need to modify the JSON entry for ClinGen; you can use it as is.

AutoDownloadGenerate dbSNP

dotnet SAUtils.dll AutoDownloadGenerate -s dbSNP -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for dbSNP should be formatted like below:

{
"dbsnp": {
"baseDirectory": "dbSNP",
"sourceFiles": [
{
"name": "dbSNP",
"description": "Identifiers for observed variants",
"version": "156",
"subDirectory": "GRCh37",
"files": [
{
"localFileName": "GCF_000001405.25.gz.tbi",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi.md5"
},
{
"localFileName": "GCF_000001405.25.gz",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.md5"
}
]
},
{
"name": "dbSNP",
"description": "Identifiers for observed variants",
"version": "156",
"subDirectory": "GRCh38",
"files": [
{
"localFileName": "GCF_000001405.40.gz.tbi",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi.md5"
},
{
"localFileName": "GCF_000001405.40.gz",
"downloadUrl": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz",
"md5Url": "https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.md5"
}
]
}
]
}
}

The JSON above is an example for dbSNP version 156. To use a different version, adjust the version number and update every entry in files to point to the actual URLs. If you only want to generate GRCh38, remove the GRCh37 entry from sourceFiles.
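Removing one assembly from sourceFiles can also be done programmatically instead of editing the file by hand. The following is a minimal sketch that keeps only the entries whose subDirectory matches the requested assembly; the function name is hypothetical and the sample configuration is abbreviated (the files lists are left empty).

```python
import copy
import json


def keep_assembly(config, source_key, assembly):
    """Return a copy of config with only the sourceFiles for one assembly."""
    trimmed = copy.deepcopy(config)
    entries = trimmed[source_key]["sourceFiles"]
    trimmed[source_key]["sourceFiles"] = [
        entry for entry in entries if entry.get("subDirectory") == assembly
    ]
    return trimmed


# Abbreviated version of the dbSNP configuration shown above.
dbsnp_config = {
    "dbsnp": {
        "baseDirectory": "dbSNP",
        "sourceFiles": [
            {"name": "dbSNP", "version": "156", "subDirectory": "GRCh37", "files": []},
            {"name": "dbSNP", "version": "156", "subDirectory": "GRCh38", "files": []},
        ],
    }
}

grch38_only = keep_assembly(dbsnp_config, "dbsnp", "GRCh38")
print(json.dumps(grch38_only, indent=2))
```

The original configuration is left untouched, so the same input can be reused to produce a GRCh37-only file.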

AutoDownloadGenerate OMIM

dotnet SAUtils.dll AutoDownloadGenerate -s OMIM -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for OMIM should be formatted like below:

{
"omim": {
"baseDirectory": "omim",
"sourceFiles": [
{
"name": "OMIM",
"description": "An Online Catalog of Human Genes and Genetic Disorders"
}
]
}
}

There is no need to modify the JSON entry for OMIM; you can use it as is. You have to export your OMIM API key as an environment variable named OmimApiKey.

AutoDownloadGenerate COSMIC

dotnet SAUtils.dll AutoDownloadGenerate -s COSMIC -a download,generate -j [path to json] -c [path to cache folder] -r [path to reference folder] -b [path to folder to store downloaded files] -o [path to output folder]

The json file for COSMIC should be formatted like below:

{
"Cosmic": {
"baseDirectory": "COSMIC",
"sourceFiles": [
{
"name": "COSMIC",
"version": "99",
"description": "the Catalogue Of Somatic Mutations In Cancer"
}
]
}
}

You have to adjust the version entry to match the COSMIC version you want. You also need COSMIC credentials, exported as environment variables named Cosmic_Username and Cosmic_Password.
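Because the OMIM and COSMIC sources depend on environment variables, a wrapper script may want to check them before starting a long download. The sketch below is one way to do that; the variable names come from the text above, while the function name and the mapping are illustrative.

```python
import os

# Environment variables named in this document for each data source.
REQUIRED_ENV_VARS = {
    "OMIM": ["OmimApiKey"],
    "COSMIC": ["Cosmic_Username", "Cosmic_Password"],
}


def missing_env_vars(sources, environ=None):
    """List required environment variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    missing = []
    for source in sources:
        for name in REQUIRED_ENV_VARS.get(source, []):
            if not environ.get(name):
                missing.append(name)
    return missing


missing = missing_env_vars(["OMIM", "COSMIC"])
if missing:
    print("Please export: " + ", ".join(missing))
```

Passing a dictionary as `environ` makes the check easy to test without touching the real environment.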

- - + + \ No newline at end of file diff --git a/versions/index.html b/versions/index.html index cc72a6ff..f6fc5331 100644 --- a/versions/index.html +++ b/versions/index.html @@ -6,13 +6,13 @@ Versions | IlluminaConnectedAnnotations - - + +
-

Illumina Connected Annotations documentation versions

Current version (Stable)

Here you can find the documentation for the current released version.

3.23    Documentation

Next version (Unreleased)

Here you can find the documentation for the unreleased version currently in development.

3.24 (unreleased)    Documentation

Past versions

Here you can find documentation for previous versions.

3.22    Documentation
- - +

Illumina Connected Annotations documentation versions

Current version (Stable)

Here you can find the documentation for the current released version.

3.24    Documentation

Next version (Unreleased)

Here you can find the documentation for the unreleased version currently in development.

3.25 (unreleased)    Documentation

Past versions

Here you can find documentation for previous versions.

3.23    Documentation
3.22    Documentation
+ + \ No newline at end of file