This dataset comprises duplicate sequencing of two RNA sources: Universal Human Reference RNA (UHRR) with Lexogen SIRV Set-4 RNA spike-ins, and HG002-derived human RNA. It demonstrates that Nanopore long reads support accurate transcript quantification at isoform resolution, while the SIRV spike-ins provide ground-truth transcripts for benchmarking transcript detection and quantification.

For each sample, we provide both cDNA and direct RNA (dRNA) outputs to enable cross-method comparison on matched material. The paired datasets are complementary: cDNA supports high-throughput transcript characterization, while direct RNA enables detection of RNA modifications.

The dataset is accompanied by analysis outputs from the updated EPI2ME wf-transcriptomes v2.0.0 workflow, designed to make transcriptome analysis faster and more accessible. The new workflow introduces a quantification-only analysis option and refreshed tooling for extracting transcript-level insights from cDNA and direct RNA sequencing data.

Sample

The dataset contains two RNA sources designed for comparative transcriptomics and benchmarking.

Sample	Sample Type	Organism	Molecule Types	Flow Cell Replicates	Total Flow Cells
HG002	Extracted RNA	Human	cDNA, dRNA	2	8
UHRR (+ Lexogen SIRV Set-4)	Extracted RNA	Human, synthetic controls	cDNA, dRNA	2	8

Preparation

Sample preparation is grouped by sample and molecule type to keep the information concise.

Sample	Molecule	Kits	RNA Preparation	Poly(A) Enrichment	Library Prep Protocol
HG002	cDNA	PCS114, PCB114-24	RNA extraction from human cells	NEBNext High Input Poly(A) mRNA Isolation Module	SQK-PCS114, SQK-PCB114-24
HG002	dRNA	RNA004, DRB004-24	RNA extraction from human cells	NEBNext High Input Poly(A) mRNA Isolation Module	SQK-RNA004, SQK-DRB004-24
UHRR	cDNA	PCS114, PCB114-24	UHRR preparation protocol	NEBNext High Input Poly(A) mRNA Isolation Module	SQK-PCS114, SQK-PCB114-24
UHRR	dRNA	RNA004, DRB004-24	UHRR preparation protocol	NEBNext High Input Poly(A) mRNA Isolation Module	SQK-RNA004, SQK-DRB004-24

Further preparation information, such as sample storage suggestions, can be found on the Oxford Nanopore website.

Sequencing

Sequence data was generated using the following configurations:

Molecule	Flow Cell	Device	Chemistry	MinKNOW Version
cDNA	FLO-PRO114M	PromethION	R10.4.1	25.11.2
dRNA	FLO-PRO004RA	PromethION	RNA	25.11.2

Basecalling

All flow cells were re-basecalled post-run using standalone Dorado. dRNA flow cells were basecalled twice, once with the HAC model and once with the SUP model, to provide both standard and extended modified-base outputs, giving 24 basecall sets in total. The additional 2'-O-methyl calls (2OmeC, 2OmeA, 2OmeU, 2OmeG) are only available with SUP basecalling.

Molecule	Basecall Model	Dorado	Modified Bases
cDNA	HAC v6.0.0	v2.0.0	N/A
dRNA	HAC v6.0.0	v2.0.0	m5C, inosine_m6A, pseU
dRNA	SUP v6.0.0	v2.0.0	m5C_2OmeC, inosine_m6A_2OmeA, pseU_2OmeU, 2OmeG
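
The figure of 24 basecall sets and the list of SUP-only modifications can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the dataset tooling. It assumes that each sample's 8 flow cells are split evenly between cDNA and dRNA (which is consistent with the stated total of 24), and that an underscore separates individual modification names within a label such as `m5C_2OmeC`. The names `FLOW_CELLS`, `MODELS`, `DRNA_MODS`, `basecall_sets` and `split_labels` are illustrative only.

```python
# Basecall models applied to each molecule type (from the Basecalling table).
MODELS = {"cDNA": ["HAC"], "dRNA": ["HAC", "SUP"]}

# Flow cells per molecule type. Assumption: each sample's 8 flow cells are
# split evenly between cDNA and dRNA (4 + 4), across the 2 samples.
FLOW_CELLS = {"cDNA": 2 * 4, "dRNA": 2 * 4}

# Modified-base labels per dRNA model, copied from the Basecalling table.
DRNA_MODS = {
    "HAC": ["m5C", "inosine_m6A", "pseU"],
    "SUP": ["m5C_2OmeC", "inosine_m6A_2OmeA", "pseU_2OmeU", "2OmeG"],
}


def basecall_sets(flow_cells, models):
    """Count basecall sets: one per flow cell per model applied to it."""
    return sum(flow_cells[mol] * len(models[mol]) for mol in flow_cells)


def split_labels(labels):
    """Split combined labels into individual modification names.

    Assumption: '_' separates the names inside a combined label.
    """
    return {part for label in labels for part in label.split("_")}


total_sets = basecall_sets(FLOW_CELLS, MODELS)
sup_only = sorted(split_labels(DRNA_MODS["SUP"]) - split_labels(DRNA_MODS["HAC"]))

print(total_sets)  # 24
print(sup_only)    # ['2OmeA', '2OmeC', '2OmeG', '2OmeU']
```

Under these assumptions, the 8 cDNA flow cells contribute one basecall set each and the 8 dRNA flow cells contribute two each, and the modifications gained by moving from HAC to SUP are the four 2'-O-methyl calls.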

Data Download

The dataset is available for anonymous download, without login, from a public Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data. The data can be downloaded with the following AWS CLI command:

aws s3 sync --no-sign-request s3://ont-open-data/UHRR_HG002_2026.06 UHRR_HG002_2026.06

See the tutorials page for information on downloading the dataset.

You can also browse and download the files in your web browser courtesy of 42basepairs.

Folder name	Size	Description
raw	16 TB	POD5 files
basecalls	1.7 TB	BAM files
analysis	436 GB	Workflow outputs
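
Because the raw POD5 files account for most of the dataset size, it may be preferable to download a single folder rather than the whole dataset. The sketch below builds the corresponding `aws s3 sync` command for one folder. It assumes that the folder names in the table map directly to prefixes under the dataset prefix. This is confirmed in this document only for `analysis`. The helper `sync_command` and its parameters are hypothetical and not part of any Oxford Nanopore tool.

```python
import shlex

# Dataset prefix, taken from the download command above.
BUCKET_PREFIX = "s3://ont-open-data/UHRR_HG002_2026.06"

# Folder names from the table. Assumption: each is a prefix directly
# under BUCKET_PREFIX (only 'analysis' is stated explicitly in the text).
FOLDERS = ("raw", "basecalls", "analysis")


def sync_command(folder, dest_root="UHRR_HG002_2026.06", dry_run=False):
    """Return the argument list for an anonymous 'aws s3 sync' of one folder."""
    if folder not in FOLDERS:
        raise ValueError(f"unknown folder: {folder!r}")
    cmd = ["aws", "s3", "sync", "--no-sign-request"]
    if dry_run:
        # --dryrun lists the operations without transferring any data.
        cmd.append("--dryrun")
    cmd += [f"{BUCKET_PREFIX}/{folder}", f"{dest_root}/{folder}"]
    return cmd


# Print the command for the analysis folder so it can be inspected or pasted.
command_line = shlex.join(sync_command("analysis"))
print(command_line)
```

The printed command can be pasted into a shell, or the argument list can be passed to `subprocess.run(cmd, check=True)` on a machine with the AWS CLI installed. Passing `dry_run=True` first is a cautious way to preview what would be transferred.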

Analysis

wf-transcriptomes was run on all 24 basecall sets. It performs reference-guided long-read transcriptome analysis by running a splice-aware alignment if necessary, building and quantifying transcript models with bambu, and classifying isoforms with SQANTI3. When modified-base tags are present (as for direct RNA sequencing), the workflow also produces per-sample modification summaries.

In the workflow HTML reports, users can find run and alignment QC summaries, transcript discovery and quantification summaries (gene and transcript counts), SQANTI3-based isoform classification and quality metrics, and a modified-base report.

The analysis results are located in the S3 bucket under the prefix:

s3://ont-open-data/UHRR_HG002_2026.06/analysis

Application Note: Long-read cDNA sequencing for isoform-resolution transcriptome analysis - Technical overview of cDNA library preparation and transcript isoform discovery using long-read sequencing.
Direct RNA Sequencing Kits Flyer - Overview of direct RNA sequencing chemistry and kit options for RNA analysis.
EPI2ME 26.06-01 Release Notes - Release notes for the EPI2ME update accompanying this dataset.