Home
During development of this pipeline, the performance of the implemented assembly tools was evaluated on several datasets. The evaluation focused on a low- to medium-coverage scenario (8-10x Nanopore, 20-40x Illumina) and will be extended to other scenarios.
The choice of assembler also depends on the type of genome being analyzed. Some tools may perform better on microbial genomes, while others perform better on large eukaryotic genomes.
All statistics were calculated using QUAST. The assemblies were created using the tools from this pipeline.
The following assemblies were made during the test phase of the pipeline to find out which assembler performs best at the following coverages:
- Short-reads: 20-40x
- Long-reads: 5-10x
Because the plan was to assemble the genomes of multiple samples from several species (mammals, human), generating higher coverage was not feasible. For this test, only Nanopore long reads were used. Data from different species were downloaded and subsampled to 10x long-read and 25x short-read coverage.
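The subsampling step comes down to simple arithmetic: coverage is the total number of sequenced bases divided by the genome size, and the fraction of reads to keep is the target coverage divided by the current coverage. The following is a minimal sketch of that calculation, not the procedure actually used for these tests. The function names, the genome size, and the base count are hypothetical values chosen for illustration.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Return the average sequencing depth (x-fold coverage)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def subsample_fraction(total_bases: int, genome_size: int, target_coverage: float) -> float:
    """Return the fraction of reads to keep to reach target_coverage.

    The result is capped at 1.0: a dataset cannot be subsampled
    to a higher coverage than it already has.
    """
    current = coverage(total_bases, genome_size)
    if current <= 0:
        raise ValueError("dataset contains no bases")
    return min(1.0, target_coverage / current)


# Hypothetical example: a 4.6 Mb genome with 230 Mb of long reads.
genome_size = 4_600_000
long_read_bases = 230_000_000

current_cov = coverage(long_read_bases, genome_size)
fraction = subsample_fraction(long_read_bases, genome_size, target_coverage=10)
print(f"current coverage: {current_cov:.1f}x, keep fraction: {fraction:.2f}")
```

The resulting fraction can then be passed to whichever read subsampling tool is preferred. Because read lengths vary, especially for Nanopore data, sampling a fraction of reads only approximates the target coverage.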
Metric | MaSuRCA | SPAdes | Canu |
---|---|---|---|
Genome statistics | | | |
Genome fraction (%) | 95.593 | 98.883 | - |
Duplication ratio | 1.01 | 1.001 | - |
Largest alignment | 350,299 | 539,289 | - |
Total aligned length | 4,478,886 | 4,590,963 | - |
NG50 | 125,891 | 236,291 | - |
NA50 | 116,095 | 193,949 | - |
NA50 | 130,492 | 236,291 | - |
Misassemblies | 6 | 2 | - |
Scaffolds | 55 | 39 | - |
Runtime | 2m 37s | 7m 95s | |
Memory | 2.7 GB | 5.0 GB | |
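
To help read the contiguity metrics in the table, the sketch below shows how N50 and NG50 are commonly defined. N50 is the length of the contig at which the cumulative length, counted from the longest contig down, reaches half of the total assembly length. NG50 uses half of the reference genome length instead. NA50 follows the same idea but is computed on aligned blocks rather than whole contigs. This is an illustrative reimplementation under those definitions, not QUAST's own code; the function name `nx50` and the contig lengths are made up for the example.

```python
from typing import Optional, Sequence


def nx50(lengths: Sequence[int], reference_length: Optional[int] = None) -> Optional[int]:
    """Return N50, or NG50 when reference_length is given.

    Returns None if the contigs never cover half of the chosen total,
    which can happen for NG50 when the assembly is much smaller than
    the reference.
    """
    total = reference_length if reference_length is not None else sum(lengths)
    threshold = total / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


# Made-up contig lengths, for illustration only.
contigs = [50, 40, 30, 20, 10]

print(nx50(contigs))                        # N50 relative to assembly length (150)
print(nx50(contigs, reference_length=200))  # NG50 relative to a 200-base reference
```

With these made-up contigs, N50 is 40 and NG50 is 30. NG50 is lower because the assembly is shorter than the reference, so more contigs are needed to cover half of it.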