We are excited to release a new addition to our open data initiative: a dataset demonstrating the end-to-end workflow for sequencing pharmacogenomic (PGx) targets using adaptive sampling. The dataset contains sequencing data from eight human DNA samples, prepared with the Native Barcoding Expansion 114 (NBD114) library preparation kit and sequenced on a PromethION™ device with R10.4.1 chemistry. It includes raw reads in POD5 format, basecalled reads in BAM format, and downstream analysis outputs from the EPI2ME™ wf-pgx workflow tailored to pharmacogenomic studies. Like the rest of our open data initiative, this high-quality sequencing data is freely accessible, and we hope it empowers the global research community working in personalised medicine.

The following cell line samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: HG00276, HG01190, NA11832, NA19207, NA19226, NA18518, NA19174 and NA07348.

Sample

Detail	Description
Sample Name	HG00276, HG01190, NA11832, NA19207, NA19226, NA18518, NA19174, NA07348
Organism	Human
Molecule Type	DNA
Sample Type	Cell culture
Biological replicates	1
Flow Cell replicates	1
Sample Provider	NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research

Preparation

DNA was extracted using the Qiagen Puregene Cell extraction protocol, and libraries were generated with NBD114.24, the Oxford Nanopore Technologies native barcoding kit. Adaptive sampling was configured in MinKNOW™ (v25.03.7) to enrich for 375 pharmacogenes (approximately 26.5 Mbp in total). With adaptive sampling, enrichment happens in real time during sequencing: the start of each molecule is basecalled and aligned as it passes through the pore, and off-target molecules are unblocked (ejected) within a fraction of a second, freeing the pore for on-target fragments. Because no capture probes are involved, this approach eliminates probe design, shortens wet-lab turnaround, and allows the target list to be updated without changing the library preparation.
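MinKNOW's adaptive sampling can take its targets as a BED file of regions alongside the reference. As a rough way to check the size of such a target set, like the ~26.5 Mbp quoted above, the shell sketch below sorts the intervals, merges overlapping ones so that shared bases are counted once, and prints the total in base pairs. The function name `bed_total_bp` and the file name `pgx_targets.bed` are hypothetical, and the sketch assumes a tab-separated BED file with 0-based, half-open coordinates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total span (bp) of a BED file of target regions.
# Overlapping or touching intervals on the same chromosome are merged
# first, so bases covered by more than one region are counted once.
# Assumes tab-separated BED with 0-based, half-open [start, end) coordinates.
bed_total_bp() {
  # Sort by chromosome, then numerically by start; reads files or stdin.
  sort -k1,1 -k2,2n "$@" | awk -F'\t' '
    # Skip BED header lines.
    /^(#|track|browser)/ { next }
    {
      s = $2 + 0; e = $3 + 0
      if ($1 != chrom || s > end) {
        # New merged interval: add the previous one to the running total.
        total += end - start
        chrom = $1; start = s; end = e
      } else if (e > end) {
        # Overlaps the current interval and extends it.
        end = e
      }
    }
    # Add the final interval and print the total in base pairs.
    END { print total + end - start }'
}
```

For example, `bed_total_bp pgx_targets.bed` prints a single integer; dividing it by 1,000,000 gives the target size in Mbp.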

Detail	Description
Extraction	Qiagen Puregene Cell extraction
Library Prep	NBD114.24
Kit	NBD114.24

Further preparation information, such as sample storage suggestions, can be found on the Oxford Nanopore website.

Sequencing

Sequence data was generated using the following configuration:

Detail	Description
Flow Cell	FLO-PRO114M
Device	PromethION
Chemistry	R10.4.1
Basecall Model	v5.0.0 HAC
MinKNOW Version	25.03.7
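To confirm which basecall model produced a given BAM file, you can inspect its header. Recent Dorado versions record the model in the `@RG` read-group line as a `basecall_model=...` entry inside the `DS` field; the sketch below assumes that convention and extracts the model name(s) from a SAM header read on standard input. The function name `basecall_models` and the file name `sample.bam` are hypothetical, and `samtools view -H` is used only to print the header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the basecall model(s) named in a SAM/BAM header.
# Assumes Dorado-style @RG lines carrying "DS:basecall_model=<model> ...".
basecall_models() {
  grep '^@RG' \
    | grep -o 'basecall_model=[^[:space:]]*' \
    | sed 's/^basecall_model=//' \
    | sort -u
}
```

Usage: `samtools view -H sample.bam | basecall_models`. For this dataset, one would expect a HAC model name ending in `@v5.0.0`.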

Data Download

The dataset is available for anonymous download, without login, from a public Amazon Web Services (AWS) S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data. The full dataset can be downloaded with the following AWS CLI command:

aws s3 sync --no-sign-request s3://ont-open-data/pgx_as_2025.07 pgx_as_2025.07

See the tutorials page for information on downloading the dataset. You can also browse and download the files in your web browser courtesy of 42basepairs.

Folder name	Size	Description
RAW	1852 GB	POD5 files
Basecalls	31 GB	BAM files
Analysis	30 GB	Workflow outputs
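Because the POD5 folder accounts for about 1852 GB of the roughly 1.9 TB total, it can be convenient to fetch only the folders you need. The sketch below wraps the same `aws s3 sync --no-sign-request` call around a single folder, with a helper that first lists the top-level prefixes using `aws s3 ls`, since the exact folder names in the bucket may differ in case from the table (the analysis outputs, for instance, live under `analysis`). The function names `pgx_list` and `pgx_sync` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for fetching part of the pgx_as_2025.07 dataset.
# Requires the AWS CLI; --no-sign-request allows anonymous access.
PGX_PREFIX="s3://ont-open-data/pgx_as_2025.07"

# List the top-level folders to confirm their exact names before syncing.
pgx_list() {
  aws s3 ls --no-sign-request "${PGX_PREFIX}/"
}

# Sync one folder (e.g. "analysis") into pgx_as_2025.07/<folder>,
# or into the directory given as the second argument.
pgx_sync() {
  local folder="${1:?usage: pgx_sync <folder> [dest_dir]}"
  local dest="${2:-pgx_as_2025.07/${folder}}"
  aws s3 sync --no-sign-request "${PGX_PREFIX}/${folder}" "${dest}"
}
```

For example, running `pgx_list` and then `pgx_sync analysis` downloads only the ~30 GB of workflow outputs. Because `aws s3 sync` copies only new and updated files, re-running the same command skips files that are already present and unchanged.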

Analysis

Demultiplexing and barcode trimming were performed with Dorado™ v1.0.2. The demultiplexed BAM files were then processed with wf-pgx v0.1.7 to perform alignment, variant calling, and star-allele calling.
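For readers reproducing the demultiplexing step on their own data, a minimal sketch of a `dorado demux` call is shown below. The kit name `SQK-NBD114-24` is our assumption for the NBD114.24 kit, and the function name `demux_reads` and the paths are hypothetical; check `dorado demux --help` for the options available in your Dorado version. In recent Dorado releases, `demux` trims barcodes by default, which matches the trimming described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split basecalled reads into per-barcode BAM files.
# Assumption: "SQK-NBD114-24" is the Dorado kit name for NBD114.24.
DEMUX_KIT="SQK-NBD114-24"

demux_reads() {
  local reads="${1:?usage: demux_reads <basecalls.bam> [out_dir]}"
  local out_dir="${2:-demux}"
  # Barcodes are trimmed by default; add --no-trim to keep them.
  dorado demux --kit-name "${DEMUX_KIT}" --output-dir "${out_dir}" "${reads}"
}
```

Usage: `demux_reads calls.bam demux` writes one BAM file per detected barcode into `demux/`.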

The analysis outputs, including VCFs, haplotagged BAM files, coverage statistics, and HTML reports, are available in the S3 bucket under the following prefix:

s3://ont-open-data/pgx_as_2025.07/analysis
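To see which VCFs are available before downloading anything, the recursive listing of the analysis prefix can be filtered by file extension. This sketch assumes the VCFs end in `.vcf` or `.vcf.gz`, which we have not confirmed for this dataset, and relies on `aws s3 ls --recursive` printing one object per line as date, time, size in bytes and key. The function name `list_analysis_vcfs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the S3 keys of VCF files under the analysis prefix.
# Assumes VCFs end in .vcf or .vcf.gz and that keys contain no spaces.
ANALYSIS_PREFIX="s3://ont-open-data/pgx_as_2025.07/analysis/"

list_analysis_vcfs() {
  # Each line of `aws s3 ls --recursive` is: date, time, size (bytes), key.
  aws s3 ls --no-sign-request --recursive "${ANALYSIS_PREFIX}" \
    | awk '$4 ~ /\.vcf(\.gz)?$/ { print $4 }'
}
```

Each printed key is relative to the bucket, so a single file can then be fetched with `aws s3 cp --no-sign-request "s3://ont-open-data/<key>" .`.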

The EPI2ME workflow used to generate these analysis outputs is available upon request.

Further information

If you have questions, feedback or requests for early access to future adaptive sampling workflows, please contact support@nanoporetech.com or register your interest directly. We hope this dataset accelerates the evaluation of adaptive sampling approaches for pharmacogenomics and inspires new real-time enrichment applications.