A comprehensive matched cDNA and direct RNA dataset of UHRR and HG002

Published in Data Releases
June 29, 2026

Here we share a comprehensive transcriptomics dataset generated using Oxford Nanopore Technologies direct RNA and cDNA sequencing chemistries.

This dataset includes duplicate sequencing of Universal Human Reference RNA (UHRR), spiked with Lexogen SIRV Set-4 RNA controls, and of HG002-derived human RNA. It demonstrates that Nanopore long reads support accurate transcript quantification at isoform resolution, while the SIRV spike-ins provide ground-truth transcripts for benchmarking transcript detection and quantification.
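
As a hedged illustration of how a known truth set such as the SIRV isoforms can be used for benchmarking, the sketch below scores transcript detection with precision, recall and F1. The function name and the toy transcript IDs are hypothetical. In practice, the truth set would come from the SIRV Set-4 annotation and the detected set from the workflow's transcript models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionScore:
    """Precision, recall and F1 of detected transcripts against a truth set."""
    precision: float
    recall: float
    f1: float


def score_detection(detected: set[str], truth: set[str]) -> DetectionScore:
    """Compare detected transcript IDs with ground-truth IDs (e.g. SIRV isoforms)."""
    true_positives = len(detected & truth)
    # Guard against empty sets so every metric stays defined.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return DetectionScore(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical IDs for illustration only; these are not real SIRV names or results.
    truth = {"SIRV_A1", "SIRV_A2", "SIRV_B1", "SIRV_B2"}
    detected = {"SIRV_A1", "SIRV_A2", "SIRV_B1", "NOVEL_1"}
    print(score_detection(detected, truth))
```

Precision penalizes spurious isoforms, and recall penalizes missed ones. This is why a spike-in mix with a fully known isoform set is useful: both error types can be counted directly.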

For each sample, we provide both cDNA and direct RNA (dRNA) outputs to enable cross-method comparison on matched material. The paired datasets are complementary: cDNA supports high-throughput transcript characterization, while direct RNA enables detection of RNA modifications.

The dataset is accompanied by analysis outputs from the updated EPI2ME wf-transcriptomes v2.0.0 workflow, designed to make transcriptome analysis faster and more accessible. The new workflow introduces a quantification-only analysis option and refreshed tooling for extracting transcript-level insights from cDNA and direct RNA sequencing data.

Sample

The dataset contains two RNA sources designed for comparative transcriptomics and benchmarking.

| Sample | Sample Type | Organism | Molecule Types | Flow Cell Replicates | Total Flow Cells |
| --- | --- | --- | --- | --- | --- |
| HG002 | Extracted RNA | Human | cDNA, dRNA | 2 | 8 |
| UHRR (+ Lexogen SIRV Set-4) | Extracted RNA | Human, synthetic controls | cDNA, dRNA | 2 | 8 |

Preparation

Sample preparation is summarized below by sample and molecule type.

| Sample | Molecule | Kits | RNA Preparation | Poly(A) Enrichment | Library Prep Protocol |
| --- | --- | --- | --- | --- | --- |
| HG002 | cDNA | PCS114, PCB114-24 | RNA extraction from human cells | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-PCS114, SQK-PCB114-24 |
| HG002 | dRNA | RNA004, DRB004-24 | RNA extraction from human cells | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-RNA004, SQK-DRB004-24 |
| UHRR | cDNA | PCS114, PCB114-24 | UHRR preparation protocol | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-PCS114, SQK-PCB114-24 |
| UHRR | dRNA | RNA004, DRB004-24 | UHRR preparation protocol | NEBNext High Input Poly(A) mRNA Isolation Module | SQK-RNA004, SQK-DRB004-24 |

Further preparation information, such as sample storage suggestions, can be found on the Oxford Nanopore website.

Sequencing

Sequence data was generated using the following configurations:

| Molecule | Flow Cell | Device | Chemistry | MinKNOW Version |
| --- | --- | --- | --- | --- |
| cDNA | FLO-PRO114M | PromethION | R10.4.1 | 25.11.2 |
| dRNA | FLO-PRO004RA | PromethION | RNA | 25.11.2 |

Basecalling

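All flow cells were re-basecalled post-run using standalone Dorado. Direct RNA flow cells were basecalled twice, once with the high-accuracy (HAC) model and once with the super-accuracy (SUP) model, to provide both standard and extended modified-base outputs. Together with one HAC basecall per cDNA flow cell, this gives 24 basecall sets in total (8 cDNA and 16 dRNA). The additional dRNA modified-base calls, which include 2'-O-methylation, are only available with SUP basecalling.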

| Molecule | Basecall Model | Dorado Version | Modified Bases |
| --- | --- | --- | --- |
| cDNA | HAC v6.0.0 | v2.0.0 | N/A |
| dRNA | HAC v6.0.0 | v2.0.0 | m5C, inosine_m6A, pseU |
| dRNA | SUP v6.0.0 | v2.0.0 | m5C_2OmeC, inosine_m6A_2OmeA, pseU_2OmeU, 2OmeG |
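
The 24-set total follows from the Sample section (two samples with eight flow cells each) and the basecalling scheme (cDNA flow cells basecalled once, dRNA flow cells twice). The short sketch below solves the two linear equations for the cDNA/dRNA split. The function and constant names are illustrative only.

```python
# Figures stated in the Sample and Basecalling sections.
SAMPLES = 2
FLOW_CELLS_PER_SAMPLE = 8
BASECALL_SETS = 24
PASSES_CDNA, PASSES_DRNA = 1, 2  # cDNA: HAC only; dRNA: HAC + SUP


def split_flow_cells(total_flow_cells: int, basecall_sets: int) -> tuple[int, int]:
    """Solve cdna + drna = total and cdna*PASSES_CDNA + drna*PASSES_DRNA = sets."""
    drna = (basecall_sets - PASSES_CDNA * total_flow_cells) // (PASSES_DRNA - PASSES_CDNA)
    cdna = total_flow_cells - drna
    return cdna, drna


total = SAMPLES * FLOW_CELLS_PER_SAMPLE
cdna, drna = split_flow_cells(total, BASECALL_SETS)
print(f"{total} flow cells: {cdna} cDNA, {drna} dRNA")
```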

Data Download

The dataset is available for anonymous download (no login required) from a public Amazon Web Services (AWS) S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data. The data can be downloaded with the following AWS CLI command:

aws s3 sync --no-sign-request s3://ont-open-data/UHRR_HG002_2026.06 UHRR_HG002_2026.06

See the tutorials page for information on downloading the dataset.

You can also browse and download the files in your web browser courtesy of 42basepairs.

| Folder name | Size | Description |
| --- | --- | --- |
| raw | 16 TB | POD5 files |
| basecalls | 1.7 TB | BAM files |
| analysis | 436 GB | Workflow outputs |
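
Because the raw POD5 folder accounts for most of the dataset's size, it may be convenient to sync only one top-level folder. The sketch below assumes that the three folders in the table sit directly under the dataset prefix, as the analysis prefix given in the Analysis section suggests. It builds the corresponding anonymous `aws s3 sync` call, and can optionally add the CLI's `--dryrun` flag to preview transfers. The helper name is hypothetical, and running the command requires the AWS CLI on your PATH.

```python
import shlex

BUCKET_PREFIX = "s3://ont-open-data/UHRR_HG002_2026.06"
FOLDERS = ("raw", "basecalls", "analysis")  # top-level folders listed in the table


def sync_command(folder: str, dest_root: str = "UHRR_HG002_2026.06",
                 dry_run: bool = False) -> list[str]:
    """Build an anonymous `aws s3 sync` argv for one top-level folder."""
    if folder not in FOLDERS:
        raise ValueError(f"unknown folder {folder!r}; expected one of {FOLDERS}")
    cmd = [
        "aws", "s3", "sync", "--no-sign-request",
        f"{BUCKET_PREFIX}/{folder}",
        f"{dest_root}/{folder}",
    ]
    if dry_run:
        cmd.append("--dryrun")  # list what would be copied without downloading
    return cmd


if __name__ == "__main__":
    # The analysis folder is far smaller than the full dataset.
    print(shlex.join(sync_command("analysis")))
    # To download, pass the list to subprocess.run(cmd, check=True).
```

Building an argument list rather than a single shell string avoids quoting issues if the destination path contains spaces.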

Analysis

wf-transcriptomes was run on all 24 basecall sets. It performs reference-guided long-read transcriptome analysis: it runs a splice-aware alignment where needed, builds and quantifies transcript models with bambu, and classifies isoforms with SQANTI3. When modified-base tags are present (as in the direct RNA basecalls), the workflow also produces per-sample modification summaries.

The workflow HTML reports include run and alignment QC summaries, transcript discovery and quantification summaries (gene and transcript counts), SQANTI3-based isoform classification and quality metrics, and a modified-base report.
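
For a quick look at isoform classification outside the HTML report, the sketch below counts isoforms in each structural category of a SQANTI3 classification table (tab-separated, one row per isoform). It assumes that the workflow outputs include this table and that it uses SQANTI3's standard `structural_category` column. The function names are hypothetical, so check the analysis folder for the actual file name.

```python
from __future__ import annotations

import csv
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def count_structural_categories(lines: Iterable[str]) -> Counter[str]:
    """Tally isoforms per SQANTI3 structural_category from tab-separated lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["structural_category"] for row in reader)


def category_fractions(counts: Counter[str]) -> list[tuple[str, int, float]]:
    """Return (category, count, fraction of all isoforms), most common first."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [(category, n, n / total) for category, n in counts.most_common()]


def summarize_file(path: str | Path) -> list[tuple[str, int, float]]:
    """Read a SQANTI3 classification file from disk and summarize it."""
    with open(path, newline="") as handle:
        return category_fractions(count_structural_categories(handle))
```

For example, `for category, n, frac in summarize_file("path/to/classification.txt"): print(f"{category}\t{n}\t{frac:.1%}")` prints one line per category. The path here is a placeholder. Accepting any iterable of lines keeps the counting logic independent of how the file is opened.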

The analysis results are located in the S3 bucket under the prefix:

s3://ont-open-data/UHRR_HG002_2026.06/analysis

Related Materials

  • Application Note: Long-read cDNA sequencing for isoform-resolution transcriptome analysis - Technical overview of cDNA library preparation and transcript isoform discovery using long-read sequencing.
  • Direct RNA Sequencing Kits Flyer - Overview of direct RNA sequencing chemistry and kit options for RNA analysis.
  • EPI2ME 26.06-01 Release Notes - Release notes for the EPI2ME update accompanying this dataset.

© 2008 - 2026 Oxford Nanopore Technologies plc. All rights reserved. Registered Office: Gosling Building, Edmund Halley Road, Oxford Science Park, OX4 4DQ, UK | Registered No. 05386273 | VAT No 336942382. Oxford Nanopore Technologies, the Wheel icon, AmPORE-TB, EPI2ME, GridION, MinION, MinKNOW, PromethION, P2 Solo, and P2 are registered trademarks or the subject of trademark applications of Oxford Nanopore Technologies plc in various countries. Information contained herein may be protected by copyright, patents or patents pending of Oxford Nanopore Technologies plc. All other brands and names contained are the property of their respective owners. Oxford Nanopore Technologies products are RUO. Products labelled/branded as Oxford Nanopore Diagnostics may be RUO or may be regulated as in‐vitro diagnostic devices in some jurisdictions, please check individual product labelling.