We have sequenced the CEPH1463 (NA12878/GM12878, Ceph/Utah pedigree) human genome reference standard on the Oxford Nanopore MinION, using direct RNA sequencing kits (30 flowcells) and the 1D ligation kit (SQK-LSK108), on flowcells with R9.4 chemistry (FLO-MIN106). RNA was extracted from the cultured GM12878 human cell line.
This data is released under the Creative Commons CC-BY license, and we encourage its reuse in your own analyses and publications. If you use it, we would be grateful if you would cite the reference below.
Rachael E. Workman, Alison D. Tang, Paul S. Tang, Miten Jain, John R. Tyson, Roham Razaghi, Philip C. Zuzarte, Timothy Gilpatrick, Alexander Payne, Joshua Quick, Norah Sadowski, Nadine Holmes, Jaqueline Goes de Jesus, Karen L. Jones, Cameron M. Soulette, Terrance P. Snutch, Nicholas Loman, Benedict Paten, Matthew Loose, Jared T. Simpson, Hugh E. Olsen, Angela N. Brooks, Mark Akeson & Winston Timp. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nature Methods. doi:10.1038/s41592-019-0617-2
Full Native RNA dataset (30 runs), full cDNA dataset (12 runs), and IVT RNA dataset (2 runs). The data were rebasecalled using the Guppy 4.2.2 flip-flop high-accuracy (hac) models.
FileType | # runs | Link |
---|---|---|
Native RNA | 30 | FASTQ, Summary File (gzip), Multi_FAST5 |
cDNA | 12 | FASTQ, Summary File (gzip), Multi_FAST5 |
IVT RNA | 2 | FASTQ, Summary File (gzip), Multi_FAST5 |
- Nick Loman, Josh Quick, Andrew Beggs, Jaqueline Goes de Jesus (University of Birmingham)
- Matt Loose, Nadine Holmes, Matthew Carlile (University of Nottingham)
- Winston Timp, Roham Razaghi, Timothy Gilpatrick, Norah Sadowski, Rachael E. Workman (JHU)
- Jared Simpson, Phil Zuzarte, Paul Tang (OICR)
- Terry Snutch, John Tyson (UBC)
- Mark Akeson, Angela N. Brooks, Hugh E. Olsen, Benedict Paten, Alison Tang, Miten Jain (UCSC)
Full Native RNA dataset (30 runs) and full cDNA dataset (12 runs).
FileType | # runs | # reads | Mean (b) | Read N50 (b) | Link |
---|---|---|---|---|---|
Native RNA Pass | 30 | 10302647 | 1030.24 | 1334 | FASTQ |
Native RNA Fail | 30 | 2686736 | 430.96 | 840 | FASTQ |
cDNA Pass | 12 | 15152101 | 932.86 | 1072 | FASTQ |
cDNA Fail | 12 | 9129338 | 661.90 | 841 | FASTQ |
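
The "Mean (b)" and "Read N50 (b)" columns above are summary statistics over read lengths. As a minimal sketch, assuming the usual definition of N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases), they could be recomputed from a list of read lengths as shown below. The function names are illustrative and this is not necessarily the tool the consortium used to produce the table.

```python
from typing import Iterable, List


def read_n50(lengths: Iterable[int]) -> int:
    """Return the read N50 under the usual definition (an assumption here):
    the length L such that reads of length >= L hold at least half the bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("cannot compute N50 of an empty set of reads")
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    # Unreachable for non-empty input of non-negative lengths.
    raise ValueError("N50 could not be determined")


def mean_length(lengths: Iterable[int]) -> float:
    """Return the arithmetic mean read length in bases."""
    values = list(lengths)
    if not values:
        raise ValueError("cannot compute the mean of an empty set of reads")
    return sum(values) / len(values)


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy data, not from the dataset
    print(read_n50(toy_lengths), mean_length(toy_lengths))
```

For real data, the read lengths would come from the FASTQ records or from the sequencing summary files listed above.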
FASTQ and FAST5 files for the dataset (split by centre and sample) can be found here. The continuous Bulk FAST5 files can be visualized using bulkvis.
All alignments were performed using minimap2.
FileType | Reference | Params | BAM | BAI |
---|---|---|---|---|
Native RNA Pass | GRCh38_full_analysis_set_plus_decoy_hla.fa | -ax splice -uf -k14 | hg38 BAM | hg38 BAI |
Native RNA Pass | SIRVome_isoforms_ERCCs_170612a.fasta | -ax splice --splice-flank=no | SIRVome BAM | SIRVome BAI |
Native RNA Fail | GRCh38_full_analysis_set_plus_decoy_hla.fa | -ax splice -uf -k14 | hg38 BAM | hg38 BAI |
Native RNA Fail | SIRVome_isoforms_ERCCs_170612a.fasta | -ax splice --splice-flank=no | SIRVome BAM | SIRVome BAI |
cDNA Pass | GRCh38_full_analysis_set_plus_decoy_hla.fa | -ax splice -uf -k14 | hg38 BAM | hg38 BAI |
cDNA Pass | SIRVome_isoforms_ERCCs_170612a.fasta | -ax splice --splice-flank=no | SIRVome BAM | SIRVome BAI |
cDNA Fail | GRCh38_full_analysis_set_plus_decoy_hla.fa | -ax splice -uf -k14 | hg38 BAM | hg38 BAI |
cDNA Fail | SIRVome_isoforms_ERCCs_170612a.fasta | -ax splice --splice-flank=no | SIRVome BAM | SIRVome BAI |
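
To show how the "Params" column maps onto an actual invocation, here is a hedged sketch that assembles a minimap2 command line from a reference, a reads file, and a parameter string taken from the table. The file names, the thread count, and the `samtools sort` / `samtools index` steps used to produce the BAM and BAI are assumptions for illustration; the exact commands used by the consortium are not given here.

```python
import shlex
from typing import List


def minimap2_args(reference: str, reads: str, params: str, threads: int = 4) -> List[str]:
    """Build the minimap2 argument list: options first, then target, then query."""
    return ["minimap2", *shlex.split(params), "-t", str(threads), reference, reads]


def sorted_bam_pipeline(reference: str, reads: str, params: str,
                        out_bam: str, threads: int = 4) -> str:
    """Return a shell pipeline that aligns, coordinate-sorts, and indexes.

    The samtools steps are an assumed way of producing the BAM/BAI pair.
    """
    align = shlex.join(minimap2_args(reference, reads, params, threads))
    sort = shlex.join(["samtools", "sort", "-o", out_bam, "-"])
    index = shlex.join(["samtools", "index", out_bam])
    return f"{align} | {sort} && {index}"


if __name__ == "__main__":
    # Hypothetical local file names; parameters copied from the table above.
    print(sorted_bam_pipeline(
        "GRCh38_full_analysis_set_plus_decoy_hla.fa",
        "native_rna_pass.fastq",
        "-ax splice -uf -k14",
        "native_rna_pass.hg38.bam",
    ))
```

The function only builds the command string so it can be inspected before running; passing it to a shell requires minimap2 and samtools to be installed.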
Various analyses from the consortium work and the associated files can be found here.
Details on the reference files used for analyses, and their download links, can be found here.
Heng Li has made a custom track for the UCSC genome browser from the direct RNA dataset. Thanks Heng! [1]
[1] Li, H Twitter link
We are most grateful to Daniel Garalde, Daniel Jachimowicz, Andy Heron, and Rosemary Dokos at Oxford Nanopore Technologies for technical and logistical assistance. We are grateful to Angel Pizarro and Jed Sundwall at Amazon Web Services for hosting this dataset as an AWS Open Data set.