Here we share a comprehensive transcriptomics dataset generated using Oxford Nanopore Technologies direct RNA (dRNA) and cDNA sequencing chemistries.
The dataset includes duplicate sequencing of Universal Human Reference RNA (UHRR) spiked with Lexogen SIRV Set-4 RNA spike-in controls, and of HG002-derived human RNA. It demonstrates that Nanopore long reads support accurate transcript quantification at isoform resolution, while the SIRV spike-ins provide ground-truth transcripts for benchmarking transcript detection and quantification.
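To make the benchmarking idea concrete, here is a minimal sketch of how spike-in ground truth can be used, assuming you already have, for each SIRV transcript, a known relative abundance and an estimated count from a quantification step. Detection recall (the fraction of spike-in transcripts with a non-zero estimate) and Spearman rank correlation are assumed, commonly used metrics, not necessarily the ones used by the authors. The function names and the toy transcripts tx1 to tx4 are hypothetical, and the numbers are not results from this dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (sx * sy)


def benchmark_spike_ins(expected, observed):
    """Compare estimated counts with known spike-in abundances.

    expected: transcript ID -> known relative abundance (ground truth)
    observed: transcript ID -> estimated count; missing IDs count as 0
    """
    ids = sorted(expected)
    exp = [expected[t] for t in ids]
    obs = [observed.get(t, 0.0) for t in ids]
    detected = sum(1 for v in obs if v > 0)
    return {"recall": detected / len(ids), "spearman": spearman(exp, obs)}


# Hypothetical toy values for illustration only.
expected = {"tx1": 1, "tx2": 4, "tx3": 16, "tx4": 64}
observed = {"tx2": 30, "tx3": 120, "tx4": 500}
print(benchmark_spike_ins(expected, observed))
```

With these toy values, recall is 0.75 (tx1 is missed) and the rank correlation is 1 up to floating-point rounding, because the missed transcript (counted as 0) and the three detected ones are ranked in the same order as their expected abundances. A rank-based metric is used here because it ignores the scale difference between relative abundances and read counts.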
For each sample, we provide both cDNA and dRNA outputs to enable cross-method comparison on matched material. The two data types are complementary: cDNA supports high-throughput transcript characterization, while direct RNA enables detection of RNA modifications.
The dataset is accompanied by analysis outputs from the updated EPI2ME wf-transcriptomes v2.0.0 workflow, designed to make transcriptome analysis faster and more accessible. The new workflow introduces a quantification-only analysis option and refreshed tooling for extracting transcript-level insights from cDNA and direct RNA sequencing data.
The dataset contains two RNA sources designed for comparative transcriptomics and benchmarking.
| Sample | Sample Type | Organism | Molecule Types | Flow Cell Replicates | Total Flow Cells |
|---|---|---|---|---|---|
| HG002 | Extracted RNA | Human | cDNA, dRNA | 2 | 8 |
| UHRR (+ Lexogen SIRV Set-4) | Extracted RNA | Human, synthetic controls | cDNA, dRNA | 2 | 8 |
Sample preparation details are grouped by sample and molecule type to keep the table concise.
| Sample | Molecule | Kits | RNA Preparation | Poly(A) Enrichment | Library Prep Protocol |
|---|---|---|---|---|---|
| HG002 | cDNA | PCS114, PCB114-24 | RNA extraction from human cells | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-PCS114, SQK-PCB114-24 |
| HG002 | dRNA | RNA004, DRB004-24 | RNA extraction from human cells | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-RNA004, SQK-DRB004-24 |
| UHRR | cDNA | PCS114, PCB114-24 | UHRR preparation protocol | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-PCS114, SQK-PCB114-24 |
| UHRR | dRNA | RNA004, DRB004-24 | UHRR preparation protocol | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-RNA004, SQK-DRB004-24 |
Further preparation information, such as sample storage suggestions, can be found on the Oxford Nanopore website.
Sequence data was generated using the following configurations:
| Molecule | Flow Cell | Device | Chemistry | MinKNOW Version |
|---|---|---|---|---|
| cDNA | FLO-PRO114M | PromethION | R10.4.1 | 25.11.2 |
| dRNA | FLO-PRO004RA | PromethION | RNA | 25.11.2 |
All flow cells were re-basecalled post-run using standalone Dorado. The dRNA flow cells were basecalled twice, once with the high accuracy (HAC) model and once with the super accurate (SUP) model, to provide both standard and extended modified-base outputs, resulting in 24 basecall sets in total. The additional dRNA modified-base calls (the 2'-O-methyl modifications listed for SUP below) are only available with SUP basecalling.
| Molecule | Basecall Model | Dorado Version | Modified Bases |
|---|---|---|---|
| cDNA | HAC v6.0.0 | v2.0.0 | N/A |
| dRNA | HAC v6.0.0 | v2.0.0 | m5C, inosine_m6A, pseU |
| dRNA | SUP v6.0.0 | v2.0.0 | m5C_2OmeC, inosine_m6A_2OmeA, pseU_2OmeU, 2OmeG |
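The stated count of 24 basecall sets can be checked with a little arithmetic. With 16 flow cells in total (8 per sample), c cDNA flow cells basecalled once and 16 - c dRNA flow cells basecalled twice give c + 2(16 - c) = 32 - c sets, so 24 sets implies 8 cDNA and 8 dRNA flow cells. The sketch below simply enumerates the possible splits; the constant and function names are hypothetical.

```python
# Total flow cells per sample, from the sample table.
FLOW_CELLS_PER_SAMPLE = {"HG002": 8, "UHRR": 8}

# Basecalling passes per flow cell: cDNA was basecalled with HAC only,
# dRNA with both HAC and SUP.
PASSES = {"cDNA": 1, "dRNA": 2}


def basecall_sets(cdna_flow_cells, drna_flow_cells):
    """Number of basecall sets produced by a given split of flow cells."""
    return cdna_flow_cells * PASSES["cDNA"] + drna_flow_cells * PASSES["dRNA"]


def consistent_splits(total_flow_cells, total_sets):
    """All (cDNA, dRNA) flow-cell splits that yield the stated number of basecall sets."""
    return [
        (c, total_flow_cells - c)
        for c in range(total_flow_cells + 1)
        if basecall_sets(c, total_flow_cells - c) == total_sets
    ]


total = sum(FLOW_CELLS_PER_SAMPLE.values())
print(total, consistent_splits(total, 24))  # prints: 16 [(8, 8)]
```

Only one split is consistent with the numbers in the tables, so the 24 basecall sets correspond to 8 cDNA HAC, 8 dRNA HAC and 8 dRNA SUP sets.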
The dataset can be downloaded anonymously (no login required) from a public Amazon Web Services (AWS) S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data. The data can be downloaded with the following AWS CLI command:
```shell
aws s3 sync --no-sign-request s3://ont-open-data/UHRR_HG002_2026.06 UHRR_HG002_2026.06
```
See the tutorials page for information on downloading the dataset.
You can also browse and download the files in your web browser courtesy of 42basepairs.
| Folder name | Size | Description |
|---|---|---|
| raw | 16 TB | POD5 files |
| basecalls | 1.7 TB | BAM files |
| analysis | 436 GB | Workflow outputs |
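Because the raw POD5 folder is by far the largest, it can help to check folder sizes before syncing everything. The following is a minimal sketch using boto3 with unsigned (anonymous) requests, the Python counterpart of the CLI's `--no-sign-request` option. The bucket and prefix come from the sync command; `list_objects`, `summarise_by_folder` and `format_bytes` are hypothetical helper names, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
from collections import defaultdict

BUCKET = "ont-open-data"
PREFIX = "UHRR_HG002_2026.06/"


def list_objects(bucket=BUCKET, prefix=PREFIX):
    """Yield (key, size in bytes) for every object under prefix, using anonymous access."""
    import boto3  # imported here so the helpers below work without boto3 installed
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


def summarise_by_folder(objects, prefix=PREFIX):
    """Total bytes per top-level folder (e.g. raw, basecalls, analysis) under prefix."""
    totals = defaultdict(int)
    for key, size in objects:
        relative = key[len(prefix):] if key.startswith(prefix) else key
        folder = relative.split("/", 1)[0]
        totals[folder] += size
    return dict(totals)


def format_bytes(n):
    """Format a byte count with decimal (SI) units, e.g. '1.7 TB'."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if n < 1000 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1000


# Usage (requires boto3 and network access):
# for folder, size in sorted(summarise_by_folder(list_objects()).items()):
#     print(folder, format_bytes(size))
```

From the shell, `aws s3 ls --no-sign-request --recursive --summarize --human-readable s3://ont-open-data/UHRR_HG002_2026.06/` lists the same objects and prints a total at the end.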
wf-transcriptomes was run on all 24 basecall sets. It performs reference-guided long-read transcriptome analysis: it runs splice-aware alignment if necessary, builds and quantifies transcript models with bambu, and classifies isoforms with SQANTI3. When modified-base tags are present (as in the direct RNA basecalls), the workflow also produces per-sample modification summaries.
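The per-sample modification summaries are produced by the workflow itself, and the sketch below is not its implementation. It only illustrates what such a summary counts. Dorado records modified-base calls in the standard SAM `MM` and `ML` tags; the hypothetical `count_mod_calls` function follows the tag layout described in the SAM optional fields specification and tallies, for one read, the positions whose modification probability reaches an arbitrary 0.8 threshold. `summarise_reads` (also hypothetical) aggregates those counts over many reads.

```python
from collections import Counter


def count_mod_calls(mm, ml, threshold=0.8):
    """Count confident modification calls in one read's MM/ML tags.

    mm: MM tag value, e.g. "C+m?,5,12,0;A+a?,1,3;"
    ml: ML tag values (integers 0-255), one per listed position and modification,
        in the same order as the MM entries.
    Returns a Counter keyed by (canonical base, strand, modification code).
    """
    counts = Counter()
    offset = 0  # index of the first ML value belonging to the current MM entry
    for entry in mm.split(";"):
        if not entry:
            continue  # the trailing ';' produces an empty string
        head, *skips = entry.split(",")
        base, strand, codes = head[0], head[1], head[2:]
        if codes and codes[-1] in ".?":
            codes = codes[:-1]  # drop the optional '.'/'?' mode flag
        # A ChEBI code is a single number; single-letter codes may be combined ("mh").
        mods = [codes] if codes.isdigit() else list(codes)
        for i in range(len(skips)):
            for j, mod in enumerate(mods):
                # For combined codes, ML holds one value per modification per position.
                value = ml[offset + i * len(mods) + j]
                # Integer N covers probabilities N/256 to (N+1)/256; use the midpoint.
                if (value + 0.5) / 256 >= threshold:
                    counts[(base, strand, mod)] += 1
        offset += len(skips) * len(mods)
    return counts


def summarise_reads(tag_pairs, threshold=0.8):
    """Aggregate count_mod_calls over many (MM, ML) pairs, e.g. all reads of a sample."""
    total = Counter()
    for mm, ml in tag_pairs:
        total.update(count_mod_calls(mm, ml, threshold))
    return total


# Usage with pysam (file name is hypothetical):
# import pysam
# with pysam.AlignmentFile("reads.bam", "rb", check_sq=False) as bam:
#     pairs = ((r.get_tag("MM"), list(r.get_tag("ML")))
#              for r in bam if r.has_tag("MM") and r.has_tag("ML"))
#     print(summarise_reads(pairs))
```

A fixed probability threshold is the simplest possible filter; dedicated tools typically offer more careful thresholding and per-site aggregation.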
The workflow HTML reports include run and alignment QC summaries, transcript discovery and quantification summaries (gene and transcript counts), SQANTI3-based isoform classification and quality metrics, and a modified-base report.
The analysis results are located in the S3 bucket under the prefix:
```
s3://ont-open-data/UHRR_HG002_2026.06/analysis
```
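If only the workflow outputs are needed, the analysis prefix can be synced on its own, for example with `aws s3 sync --no-sign-request s3://ont-open-data/UHRR_HG002_2026.06/analysis UHRR_HG002_2026.06/analysis`. The Python sketch below goes one step further and, assuming boto3 is installed, downloads only the HTML reports by filtering keys on the `.html` suffix. The helper names and the local directory name are hypothetical, and no particular report file names are assumed.

```python
import os

BUCKET = "ont-open-data"
ANALYSIS_PREFIX = "UHRR_HG002_2026.06/analysis/"


def anonymous_client():
    """S3 client that sends unsigned requests (no AWS credentials needed)."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_keys(client, bucket=BUCKET, prefix=ANALYSIS_PREFIX):
    """Yield every object key under prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def plan_downloads(keys, prefix=ANALYSIS_PREFIX, dest="analysis_reports", suffix=".html"):
    """Pair each key ending in suffix with a local path mirroring its layout under prefix."""
    plan = []
    for key in keys:
        if key.startswith(prefix) and key.endswith(suffix):
            relative = key[len(prefix):]
            plan.append((key, os.path.join(dest, *relative.split("/"))))
    return plan


def download(client, plan, bucket=BUCKET):
    """Download each planned object, creating local directories as needed."""
    for key, path in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        client.download_file(bucket, key, path)


# Usage (requires boto3 and network access):
# s3 = anonymous_client()
# download(s3, plan_downloads(list_keys(s3)))
```

The AWS CLI can apply the same filter to the sync command with `--exclude "*" --include "*.html"`, since filters that appear later in the command take precedence.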