version bump and adding some docs
will-rowe committed Aug 20, 2020
1 parent b9b30ad commit d6af063
Showing 4 changed files with 105 additions and 32 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -34,7 +34,7 @@ project(ARTIC)
set(ARTIC_PROG_NAME artic-tools)
set(ARTIC_VERSION_MAJOR 0)
set(ARTIC_VERSION_MINOR 2)
set(ARTIC_VERSION_PATCH 0)
set(ARTIC_VERSION_PATCH 1)
configure_file (
"${PROJECT_SOURCE_DIR}/artic/version.hpp.in"
"${PROJECT_SOURCE_DIR}/artic/version.hpp"
71 changes: 67 additions & 4 deletions docs/commands.md
@@ -1,6 +1,6 @@
# Commands

***
---

## align_trim

@@ -22,12 +22,75 @@ Example usage:
artic-tools validate_scheme primerscheme.bed
```

It reports some basic stats and can also be used to produce a multifasta of all your primer sequences. Example output looks like this:

```
primer scheme file: myscheme.bed
primer scheme version: 3
reference sequence ID: MN908947.3
number of pools: 2
number of primers: 218 (includes 22 alts)
number of amplicons: 98
mean amplicon size: 393
scheme ref. span: 30-29866
scheme overlaps: 29.1326%
primer sequences: primers.fasta
```

## check_vcf

The `check_vcf` command is used to check a VCF file and filter variants into PASS and FAIL VCF files.
The `check_vcf` command is used to check a VCF file and to (optionally) filter variants into a PASS VCF file.

Example usage:

```
artic-tools check_vcf primerscheme.bed in.vcf
```
artic-tools check_vcf --dropPrimerVars --dropOverlapFails -o pass.vcf primerscheme.bed in.vcf
```

Example output:

```
[14:13:50] artic-tools::vcfchecker: starting VCF checker
[14:13:50] artic-tools::vcfchecker: filtering variants: true
[14:13:50] artic-tools::vcfchecker: output file: pass.vcf
[14:13:50] artic-tools::vcfchecker: discard primer site vars: true
[14:13:50] artic-tools::vcfchecker: discard overlap fail vars: true
[14:13:50] artic-tools::vcfchecker: variant at pos 241: C->T
[14:13:50] artic-tools::vcfchecker: variant at pos 3037: C->T
[14:13:50] artic-tools::vcfchecker: variant at pos 12733: C->T
[14:13:50] artic-tools::vcfchecker: located within an amplicon overlap region
[14:13:50] artic-tools::vcfchecker: nothing seen at position yet, holding var
[14:13:50] artic-tools::vcfchecker: variant at pos 12733: C->T
[14:13:50] artic-tools::vcfchecker: located within an amplicon overlap region
[14:13:50] artic-tools::vcfchecker: multiple copies of var found at pos 12733 in overlap region
[14:13:50] artic-tools::vcfchecker: variant at pos 14408: C->T
[14:13:50] artic-tools::vcfchecker: variant at pos 22863: TA->T
[14:13:50] artic-tools::vcfchecker: located within an amplicon overlap region
[14:13:50] artic-tools::vcfchecker: nothing seen at position yet, holding var
[14:13:50] artic-tools::vcfchecker: variant at pos 22868: TG->T
[14:13:50] artic-tools::vcfchecker: located within an amplicon overlap region
[14:13:50] artic-tools::vcfchecker: var pos does not match with that of previously identified overlap, holding var (and dropping held var at 22862)
[14:13:50] artic-tools::vcfchecker: variant at pos 22896: T->TTGG
[14:13:50] artic-tools::vcfchecker: located within an amplicon overlap region
[14:13:50] artic-tools::vcfchecker: var pos does not match with that of previously identified overlap, holding var (and dropping held var at 22867)
[14:13:50] artic-tools::vcfchecker: variant at pos 22909: TA->T
[14:13:50] artic-tools::vcfchecker: variant at pos 22913: T->C
[14:13:50] artic-tools::vcfchecker: variant at pos 22916: CT->TC
[14:13:50] artic-tools::vcfchecker: variant at pos 22926: T->TAA
[14:13:50] artic-tools::vcfchecker: variant at pos 22948: ACC->A
[14:13:50] artic-tools::vcfchecker: variant at pos 22995: C->CA
[14:13:50] artic-tools::vcfchecker: variant at pos 22997: C->T
[14:13:50] artic-tools::vcfchecker: variant at pos 23009: G->GT
[14:13:50] artic-tools::vcfchecker: variant at pos 23057: C->A
[14:13:50] artic-tools::vcfchecker: variant at pos 23098: AC->GT
[14:13:50] artic-tools::vcfchecker: variant at pos 23183: T->TC
[14:13:50] artic-tools::vcfchecker: located within an amplicon overlap region
[14:13:50] artic-tools::vcfchecker: var pos does not match with that of previously identified overlap, holding var (and dropping held var at 22895)
[14:13:50] artic-tools::vcfchecker: variant at pos 23403: A->G
[14:13:50] artic-tools::vcfchecker: variant at pos 27752: C->T
[14:13:50] artic-tools::vcfchecker: variant at pos 28881: GGG->AAC
[14:13:50] artic-tools::vcfchecker: finished checking
[14:13:50] artic-tools::vcfchecker: dropped var at 23182 which is in an amplicon overlap region but was only found once
[14:13:50] artic-tools::vcfchecker: 22 variant records processed
[14:13:50] artic-tools::vcfchecker: 18 variant records passed checks
```
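
One way to read the log above is that a variant inside an amplicon overlap region is held until a second copy turns up at the same position, and is dropped if a different overlap variant (or the end of the file) arrives first. The sketch below illustrates that reading only; it is not the `artic-tools` implementation. The function name `filter_overlap_variants` and the `(pos, in_overlap)` record shape are hypothetical, and comparing on position alone is an assumption.

```python
def filter_overlap_variants(records):
    """Split variant positions into (passed, dropped) lists.

    records: iterable of (pos, in_overlap) tuples in VCF order.
    Assumption: an overlap variant passes only if it is seen twice in a row.
    """
    passed, dropped = [], []
    held = None  # position of the overlap variant currently waiting for a match
    for pos, in_overlap in records:
        if not in_overlap:
            passed.append(pos)  # outside overlap regions: nothing to confirm
        elif held is None:
            held = pos  # first sighting in an overlap region, hold it
        elif held == pos:
            passed.extend([held, pos])  # second copy confirms the held variant
            held = None
        else:
            dropped.append(held)  # a different overlap variant arrived first
            held = pos
    if held is not None:
        dropped.append(held)  # still unconfirmed when the input ends
    return passed, dropped


records = [(241, False), (12733, True), (12733, True),
           (22863, True), (22868, True), (23403, False)]
passed, dropped = filter_overlap_variants(records)
```

Under this reading, variants outside overlap regions never affect a held variant, which matches the log, where the variant held before position 23183 is only dropped once 23183 itself is reached.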
7 changes: 4 additions & 3 deletions docs/index.md
@@ -2,12 +2,13 @@

A set of tools for working with the ARTIC pipeline.

***
---

## Introduction

The ARTIC pipeline (available [here](https://github.com/artic-network/fieldbioinformatics)) is a bioinformatics pipeline for working with virus sequencing data produced using nanopore. We've been working on `artic-tools` as a complementary set of utilities for helping with amplicon sequencing workflows, and we plan to incorporate them into the ARTIC pipeline.

## Further Reading

* [Installation](./installation.md)
* [Commands](./commands.md)
- [Installation](./installation.md)
- [Commands](./commands.md)
57 changes: 33 additions & 24 deletions docs/primerscheme.md
@@ -1,20 +1,28 @@
# Primer Scheme
# Primer Schemes

Supported primer schemes are found in the [ARTIC repos](https://github.com/artic-network).
Supported primer schemes are found in the [ARTIC repos](https://github.com/artic-network). They are in a derivative BED format where the first 4 columns are true to format and the 5th column is hijacked for providing primer pool information.

Primer schemes are in BED format and `0-based, half-open`!
This means that ARTIC primer schemes are in 4-column BED format (`0-based, half-open`) plus an extra column for primer pool!

This doc page is a work in progress...
Example:

```
MN908947.3 30 54 nCoV-2019_1_LEFT nCoV-2019_1
MN908947.3 385 410 nCoV-2019_1_RIGHT nCoV-2019_1
MN908947.3 320 342 nCoV-2019_2_LEFT nCoV-2019_2
MN908947.3 704 726 nCoV-2019_2_RIGHT nCoV-2019_2
....
```
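
As a rough illustration of the format, the sketch below parses one scheme row into its five fields and uses the `0-based, half-open` convention to get the primer length as `end - start`. The `Primer` type and `parse_scheme_line` function are hypothetical names, not part of `artic-tools`, and splitting on any whitespace (rather than strictly on tabs) is an assumption made so the example row above can be pasted in directly.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    ref_id: str     # reference sequence ID (column 1)
    start: int      # 0-based, inclusive (column 2)
    end: int        # 0-based, exclusive, i.e. half-open (column 3)
    primer_id: str  # primer ID including its tags (column 4)
    pool: str       # primer pool (column 5)

    @property
    def length(self) -> int:
        # Half-open coordinates make the length a plain difference.
        return self.end - self.start


def parse_scheme_line(line: str) -> Primer:
    """Parse one row of a 5-column primer scheme file."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, found {len(fields)}")
    ref_id, start, end, primer_id, pool = fields
    primer = Primer(ref_id, int(start), int(end), primer_id, pool)
    if primer.start < 0 or primer.end <= primer.start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return primer


primer = parse_scheme_line("MN908947.3 30 54 nCoV-2019_1_LEFT nCoV-2019_1")
```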

## Primer processing

The following tags are required to exist at the end of the primer IDs:

| tag | meaning |
| ------ | -------------------- |
| _LEFT | the left primer |
| _RIGHT | the right primer |
| _alt | the primer is an alt |
| tag | meaning |
| ------- | -------------------- |
| \_LEFT | the left primer |
| \_RIGHT | the right primer |
| \_alt | the primer is an alt |

For example:

@@ -32,29 +40,30 @@ A canonical primer ID is an ID where all tags have been removed. So in the above

TODO: more info on primer processing logic
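
Until that information is written up, here is a minimal sketch of how a canonical primer ID could be derived by stripping the tags from the end of an ID. The function name `canonical_id` is hypothetical, and allowing digits after `_alt` (as in an ID like `nCoV-2019_7_LEFT_alt0`) is an assumption rather than documented behaviour.

```python
import re

# Tags recognised at the end of a primer ID. The optional digits after
# "_alt" are an assumption, not something the tag table states.
_TAG_AT_END = re.compile(r"(_LEFT|_RIGHT|_alt\d*)$")


def canonical_id(primer_id: str) -> str:
    """Remove trailing tags one at a time until none are left."""
    while True:
        stripped = _TAG_AT_END.sub("", primer_id)
        if stripped == primer_id:
            return stripped
        primer_id = stripped


left = canonical_id("nCoV-2019_1_LEFT")
right_alt = canonical_id("nCoV-2019_7_RIGHT_alt")
```

Stripping in a loop means an ID carrying both a direction tag and an alt tag reduces to the same canonical ID as its partner primer.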


## Scheme Validation
## Scheme validation

The primer schemes are read from file and validated.

On reading from file, the following must be true:

* file must exist and be readable
* a recognised primer scheme version must be provided
* must be TSV with the correct column count for the scheme version
* must not contain multiple reference sequence IDs
* each row must encode a primer (problem rows are flagged and validation fails after all rows are tried)
- file must exist and be readable
- a recognised primer scheme version must be provided
- must be TSV with the correct column count for the scheme version
- must not contain multiple reference sequence IDs
- each row must encode a primer (problem rows are flagged and validation fails after all rows are tried)

Once the file has been processed for primers, the following checks are made:

* check there are primers in the scheme
* check the number of forward primers matches the number of reverse primers (this is **after** merging alts)
* check forward and reverse primers make proper amplicons (based on shared canonical primer IDs)
* check there are no gaps in the scheme
- check there are primers in the scheme
- check the number of forward primers matches the number of reverse primers (this is **after** merging alts)
- check forward and reverse primers make proper amplicons (based on shared canonical primer IDs)
- check there are no gaps in the scheme
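
The checks above could be sketched roughly as follows. This is an illustration under stated assumptions, not the `artic-tools` code: `check_scheme` is a hypothetical name, an alt is assumed to widen the interval of the primer it is merged into, an amplicon is assumed to run from the start of its left primer to the end of its right primer, and a "gap" is taken to mean a stretch between consecutive amplicons that no amplicon covers.

```python
def check_scheme(primers):
    """Run scheme-level checks on a list of (primer_id, start, end) tuples.

    Returns the amplicons as sorted (start, end) pairs, or raises ValueError.
    """
    if not primers:
        raise ValueError("no primers in scheme")

    forward, reverse = {}, {}
    for primer_id, start, end in primers:
        base = primer_id.split("_alt")[0]  # ignore the alt tag when grouping
        if base.endswith("_LEFT"):
            group, name = forward, base[: -len("_LEFT")]
        elif base.endswith("_RIGHT"):
            group, name = reverse, base[: -len("_RIGHT")]
        else:
            raise ValueError(f"primer ID lacks _LEFT/_RIGHT tag: {primer_id}")
        # Assumed merge rule: an alt widens the interval of its canonical primer.
        old_start, old_end = group.get(name, (start, end))
        group[name] = (min(old_start, start), max(old_end, end))

    if len(forward) != len(reverse):
        raise ValueError("forward and reverse primer counts differ")
    if forward.keys() != reverse.keys():
        raise ValueError("some primers have no partner with the same canonical ID")

    # Assumed amplicon definition: left primer start to right primer end.
    amplicons = sorted((forward[n][0], reverse[n][1]) for n in forward)
    covered_to = amplicons[0][1]
    for start, end in amplicons[1:]:
        if start > covered_to:  # half-open: start == covered_to is contiguous
            raise ValueError(f"gap in scheme between {covered_to} and {start}")
        covered_to = max(covered_to, end)
    return amplicons


scheme = [
    ("nCoV-2019_1_LEFT", 30, 54),
    ("nCoV-2019_1_RIGHT", 385, 410),
    ("nCoV-2019_2_LEFT", 320, 342),
    ("nCoV-2019_2_RIGHT", 704, 726),
]
amplicons = check_scheme(scheme)
```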

The following information can be reported from the scheme:

* number of pools, primers, amplicons etc.
* mean amplicon size
* the scheme span, with respect to the reference co-ordinates
* the scheme amplicon overlaps (i.e. the proportion of the scheme span with >1 amplicon coverage)
- number of pools, primers, amplicons etc.
- mean amplicon size
- the scheme span, with respect to the reference co-ordinates
- the scheme amplicon overlaps (i.e. the proportion of the scheme span with >1 amplicon coverage)
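
To make these statistics concrete, the sketch below computes them from a list of half-open `(start, end)` amplicon intervals. The function name `scheme_stats` is hypothetical, and measuring amplicon size with the primers included is an assumption; `artic-tools` may define these quantities differently.

```python
def scheme_stats(amplicons):
    """Return (mean_size, span, overlap_fraction) for half-open intervals."""
    sizes = [end - start for start, end in amplicons]
    mean_size = sum(sizes) / len(sizes)

    span = (min(s for s, _ in amplicons), max(e for _, e in amplicons))

    # Sweep over interval boundaries, counting how many amplicons cover
    # each stretch. Ends sort before starts at the same position, which
    # is the right order for half-open intervals.
    events = sorted([(s, 1) for s, _ in amplicons] + [(e, -1) for _, e in amplicons])
    depth, previous, overlapped = 0, None, 0
    for position, delta in events:
        if depth > 1:
            overlapped += position - previous  # stretch covered by >1 amplicon
        depth += delta
        previous = position

    overlap_fraction = overlapped / (span[1] - span[0])
    return mean_size, span, overlap_fraction


mean_size, span, overlap_fraction = scheme_stats([(30, 410), (320, 726)])
```

For these two amplicons, the shared stretch is positions 320 to 410, so the overlap is 90 bases out of a 696-base span.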

## Creating amplicons
