Update CNV-annotation-standards.md
Clean VRS 1.3
mbaudis committed Nov 22, 2023
1 parent f8e3f02 commit 7f239d8
Showing 1 changed file with 22 additions and 31 deletions.
53 changes: 22 additions & 31 deletions docs/resources/CNV-annotation-standards.md
@@ -1,24 +1,19 @@
---
title: CNV Annotation Formats
description: Some Information about CNV Annotation Standards
template: post.html
date: 2023-02-01
date: 2023-11-22
authors:
- "@mbaudis"
---

With the "dual origin" in cytogenetics ("chromosome based") and genomics ("sequencing
based") analyses the annotation of copy number variants has evolved starting from
different directions. This page summarizes some of the common annotation schemes,
terminologies and file formats which have some application to genomic copy number
variations.<!--more-->

## Cytogenetics vs. Molecular Biology...

From the cytogenetic side the use of cytogenetic bands as
With the "dual origin" in cytogenetics ("chromosome based") and genomics ("sequencing
based") analyses the annotation of copy number variants has evolved starting from
different directions. From the cytogenetic side the use of cytogenetic bands as
coordinate system has been amended by the increasing use of mapping positions (i.e.
for molecular-cytogenetic or hybrid analyses with known probe positions) while
for array and sequencing based CNV detection an increasing focus lies in the
for array and sequencing based CNV detection <!--more-->an increasing focus lies in the
determination of discrete allelic copy number counts and the assignment of a limited
set of CNV classes reflecting common use concepts.

@@ -28,19 +23,19 @@ set of CNV classes reflecting common use concepts.
This table is maintained in parallel with the [Beacon v2 documentation](http://docs.genomebeacons.org/variant-queries/#term-use-comparison).
<!--more-->

| [EFO](http://www.ebi.ac.uk/efo/EFO_0030063) | Beacon | [VCF](https://samtools.github.io/hts-specs/) | SO | GA4GH [VRS](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr;<br/>[VRS proposal](https://github.com/ga4gh/vrs/issues/404)[^1] | Notes |
| [EFO](http://www.ebi.ac.uk/efo/EFO_0030063) | Beacon | [VCF](https://samtools.github.io/hts-specs/) | SO | GA4GH [VRS](https://vrs.ga4gh.org/en/latest/terms_and_model.html#copynumberchange)[^1] | Notes |
| ------------------------------------------- | ------------------------------------------------------------------------------ | -------------------------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
| <nobr>[`EFO:0030070`](http://www.ebi.ac.uk/efo/EFO_0030070)</nobr> copy number gain | `DUP`[^2] or<br/><nobr>[`EFO:0030070`](http://www.ebi.ac.uk/efo/EFO_0030070)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | [`low-level gain`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) (implicit) &rArr; [`EFO:0030070`](http://www.ebi.ac.uk/efo/EFO_0030070) copy&nbsp;number&nbsp;gain | a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence |
| [`EFO:0030071`](http://www.ebi.ac.uk/efo/EFO_0030071) low-level copy number gain| `DUP`[^2] or<br/><nobr>[`EFO:0030071`](http://www.ebi.ac.uk/efo/EFO_0030071)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | [`low-level gain`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr; [`EFO:0030071`](http://www.ebi.ac.uk/efo/EFO_0030071) low-level copy number gain | |
| [`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072) high-level copy number gain | `DUP`[^2] or<br/><nobr>[`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | [`high-level gain`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr; [`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072) high-level copy number gain | commonly but not consistently used for >=5 copies on a bi-allelic genome region |
| [`EFO:0030073`](http://www.ebi.ac.uk/efo/EFO_0030073) focal genome amplification | `DUP`[^2] or<br/><nobr>[`EFO:0030073`](http://www.ebi.ac.uk/efo/EFO_0030073)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | [`high-level gain`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr; [`EFO:0030073`](http://www.ebi.ac.uk/efo/EFO_0030073) focal genome amplification | commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) |
| [`EFO:0030067`](http://www.ebi.ac.uk/efo/EFO_0030067) copy number loss | `DEL`[^2] or<br/><nobr>[`EFO:0030067`](http://www.ebi.ac.uk/efo/EFO_0030067)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | [`partial loss`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) (implicit) &rArr; [`EFO:0030067`](http://www.ebi.ac.uk/efo/EFO_0030067) copy number loss | a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence |
| [`EFO:0030068`](http://www.ebi.ac.uk/efo/EFO_0030068) low-level copy number loss | `DEL`[^2] or<br/><nobr>[`EFO:0030068`](http://www.ebi.ac.uk/efo/EFO_0030068)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | [`partial loss`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr; [`EFO:0030068`](http://www.ebi.ac.uk/efo/EFO_0030068) low-level copy number loss | |
| [`EFO:0020073`](http://www.ebi.ac.uk/efo/EFO_0020073) high-level copy number loss | `DEL`[^2] or<br/><nobr>[`EFO:0020073`](https://github.com/EBISPOT/efo/issues/1941)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | [`partial loss`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr; [`EFO:0020073`](https://github.com/EBISPOT/efo/issues/1941) high-level copy number loss | a loss of several copies; also used in cases where a complete genomic deletion cannot be asserted |
| [`EFO:0030069`](http://www.ebi.ac.uk/efo/EFO_0030069) complete genomic deletion | `DEL`[^2] or<br/><nobr>[`EFO:0030069`](http://www.ebi.ac.uk/efo/EFO_0030069)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | [`complete loss`](https://vrs.ga4gh.org/en/latest/terms_and_model.html#relativecopynumber) &rArr; [`EFO:0030069`](http://www.ebi.ac.uk/efo/EFO_0030069) complete genomic deletion | complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region) |

##### Last updated 2023-03-22 by @mbaudis (EFO:0020073)
##### updated 2023-03-20 by @mbaudis (VRS proposal)
| <nobr>[`EFO:0030070`](http://www.ebi.ac.uk/efo/EFO_0030070)</nobr> copy number gain | `DUP`[^2] or<br/><nobr>[`EFO:0030070`](http://www.ebi.ac.uk/efo/EFO_0030070)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | <nobr>[`EFO:0030070`](http://www.ebi.ac.uk/efo/EFO_0030070)</nobr> gain | a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence |
| [`EFO:0030071`](http://www.ebi.ac.uk/efo/EFO_0030071) low-level copy number gain| `DUP`[^2] or<br/><nobr>[`EFO:0030071`](http://www.ebi.ac.uk/efo/EFO_0030071)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | <nobr>[`EFO:0030071`](http://www.ebi.ac.uk/efo/EFO_0030071)</nobr> low-level gain | |
| [`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072) high-level copy number gain | `DUP`[^2] or<br/><nobr>[`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | <nobr>[`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072)</nobr> high-level gain | commonly but not consistently used for >=5 copies on a bi-allelic genome region |
| [`EFO:0030073`](http://www.ebi.ac.uk/efo/EFO_0030073) focal genome amplification | `DUP`[^2] or<br/><nobr>[`EFO:0030073`](http://www.ebi.ac.uk/efo/EFO_0030073)</nobr> | `DUP`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001742`](http://www.sequenceontology.org/browser/current_release/term/SO:0001742) copy_number_gain | <nobr>[`EFO:0030072`](http://www.ebi.ac.uk/efo/EFO_0030072)</nobr> high-level gain[^4] | commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) |
| [`EFO:0030067`](http://www.ebi.ac.uk/efo/EFO_0030067) copy number loss | `DEL`[^2] or<br/><nobr>[`EFO:0030067`](http://www.ebi.ac.uk/efo/EFO_0030067)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | <nobr>[`EFO:0030067`](http://www.ebi.ac.uk/efo/EFO_0030067)</nobr> loss | a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence |
| [`EFO:0030068`](http://www.ebi.ac.uk/efo/EFO_0030068) low-level copy number loss | `DEL`[^2] or<br/><nobr>[`EFO:0030068`](http://www.ebi.ac.uk/efo/EFO_0030068)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | <nobr>[`EFO:0030068`](http://www.ebi.ac.uk/efo/EFO_0030068)</nobr> low-level loss | |
| [`EFO:0020073`](http://www.ebi.ac.uk/efo/EFO_0020073) high-level copy number loss | `DEL`[^2] or<br/><nobr>[`EFO:0020073`](https://github.com/EBISPOT/efo/issues/1941)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | <nobr>[`EFO:0020073`](https://github.com/EBISPOT/efo/issues/1941)</nobr> high-level loss | a loss of several copies; also used in cases where a complete genomic deletion cannot be asserted |
| [`EFO:0030069`](http://www.ebi.ac.uk/efo/EFO_0030069) complete genomic deletion | `DEL`[^2] or<br/><nobr>[`EFO:0030069`](http://www.ebi.ac.uk/efo/EFO_0030069)</nobr> | `DEL`<br/><nobr>`SVCLAIM=D`[^3]</nobr> | [`SO:0001743`](http://www.sequenceontology.org/browser/current_release/term/SO:0001743) copy_number_loss | <nobr>[`EFO:0030069`](http://www.ebi.ac.uk/efo/EFO_0030069)</nobr> complete genomic loss | complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region) |

##### Last updated 2023-03-22 by @mbaudis (VRS 1.3 adjustment)
##### updated 2023-03-22 by @mbaudis (EFO:0020073) & 2023-03-20 by @mbaudis (VRS proposal)
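
The correspondences in the table can be held in a small lookup structure, for instance to expand a legacy `DUP` or `DEL` value into the EFO terms it may stand for. The following is a minimal Python sketch under that assumption; the names `CNV_TERMS`, `to_vcf_type` and `expand_legacy_term` are hypothetical and this is not the Beacon or _bycon_ implementation. The values are copied from the table rows above (VRS column as of the v1.3 adjustment).

```python
# Term correspondences copied from the comparison table.
# All names in this block are illustrative, not part of any published API.
CNV_TERMS = {
    "EFO:0030070": {"label": "copy number gain", "vcf": "DUP", "so": "SO:0001742", "vrs": "EFO:0030070"},
    "EFO:0030071": {"label": "low-level copy number gain", "vcf": "DUP", "so": "SO:0001742", "vrs": "EFO:0030071"},
    "EFO:0030072": {"label": "high-level copy number gain", "vcf": "DUP", "so": "SO:0001742", "vrs": "EFO:0030072"},
    # VRS has no "amplification" term, so the table maps it to high-level gain.
    "EFO:0030073": {"label": "focal genome amplification", "vcf": "DUP", "so": "SO:0001742", "vrs": "EFO:0030072"},
    "EFO:0030067": {"label": "copy number loss", "vcf": "DEL", "so": "SO:0001743", "vrs": "EFO:0030067"},
    "EFO:0030068": {"label": "low-level copy number loss", "vcf": "DEL", "so": "SO:0001743", "vrs": "EFO:0030068"},
    "EFO:0020073": {"label": "high-level copy number loss", "vcf": "DEL", "so": "SO:0001743", "vrs": "EFO:0020073"},
    "EFO:0030069": {"label": "complete genomic deletion", "vcf": "DEL", "so": "SO:0001743", "vrs": "EFO:0030069"},
}


def to_vcf_type(efo_id):
    """Return the VCF symbolic type (DUP or DEL) listed for an EFO term."""
    return CNV_TERMS[efo_id]["vcf"]


def expand_legacy_term(vcf_type):
    """Return every EFO id in the table that a legacy DUP/DEL value could match."""
    return sorted(k for k, v in CNV_TERMS.items() if v["vcf"] == vcf_type)
```

Because `DUP` and `DEL` are less specific than the EFO terms, the expansion returns several candidates, while the reverse lookup through `to_vcf_type` is unambiguous.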

## ISCN

@@ -71,11 +66,10 @@ microarrays and DNA sequencing.

While VCF is a file format, originally developed (and optimised) for the
representation of possibly recurring variants across a set of analyses, it also
allows for the storage & representation of CNV events[^3].
allows for the storage & representation of CNV events.

### Links

* current VCF specification [v4.4 PDF](https://samtools.github.io/hts-specs/VCFv4.4.pdf)
* VCF specification [v4.2 PDF](https://samtools.github.io/hts-specs/VCFv4.2.pdf)


@@ -144,12 +138,9 @@ The _Progenetix_ data serves as the repository behind the

* schema in _progenetix/bycon_ [code repository](https://github.com/progenetix/bycon/blob/master/schemas/src/progenetix-database-schemas/pgxVariant.yaml)



[^1]: The VRS annotations refer to the status at v1.2 (2022). The GA4GH VRS team
is currently (Spring 2023) preparing an updated specification which will introduce
the new class `CopyNumberChange` ([discussion...](https://github.com/ga4gh/vrs/issues/404#issuecomment-1472599849)) with the use of the EFO terms (including a new term
for `high level deletion (EFO:0020073)` in the April 2023 EFO release).
[^1]: The VRS annotations refer to the status as of v1.3 (2022), which introduced
the new class `CopyNumberChange` ([discussion...](https://github.com/ga4gh/vrs/issues/404#issuecomment-1472599849))
with the use of the EFO terms.
[^2]: While the use of VCF derived (`DUP`, `DEL`) values had been introduced with
beacon v1, usage of these terms has always been a _recommendation_ rather than an integral part
of the API. We now encourage the support of more specific terms (particularly EFO)
@@ -158,5 +149,5 @@ provides an internal term expansion for legacy `DUP`, `DEL` support.
[^3]: VCFv4.4 introduces an `SVCLAIM` field to disambiguate between _in situ_ events (such as
tandem duplications; known _adjacency_/ _break junction_: `SVCLAIM=J`) and events where e.g. only the
change in _abundance_ / _read depth_ (`SVCLAIM=D`) has been determined. Both **J** and **D** flags can be combined.

[^4]: VRS did not adopt the "amplification" term due to possible inconsistencies
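
The `SVCLAIM` values described in footnote 3 can be read from the INFO column of a VCF record with plain string handling. The sketch below is illustrative only: the function name `parse_svclaim` and the returned keys are assumptions, and the code does not validate the record against the VCF specification.

```python
def parse_svclaim(info):
    """Report which claims an SVCLAIM value in a VCF INFO string makes.

    "D" marks a change in abundance / read depth, "J" a known
    adjacency / break junction; both letters may be combined.
    """
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        # Flag-type INFO keys have no "=value" part.
        key, _, value = item.partition("=")
        fields[key] = value
    claim = fields.get("SVCLAIM", "")
    return {"abundance": "D" in claim, "adjacency": "J" in claim}
```

For example, `parse_svclaim("SVTYPE=DUP;SVCLAIM=D")` reports an abundance claim only, which matches the `SVCLAIM=D` annotation used in the VCF column of the table.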
