We are excited to release a new addition to our open data initiative: a dataset showcasing the end-to-end workflow for sequencing pharmacogenomic (PGx) targets using adaptive sampling. Prepared using the Native Barcoding Expansion 114 (NBD114) library preparation kit and sequenced on a PromethION™ device with R10.4.1 chemistry, this dataset includes sequencing data from eight human DNA samples. Analysed with the EPI2ME™ wf-pgx workflow, the dataset features raw reads in POD5 format, basecalled data in BAM files, and downstream analysis outputs tailored for pharmacogenomic studies. The data is freely accessible and is intended to support the global research community working on personalised medicine.
The following cell line samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: HG00276, HG01190, NA11832, NA19207, NA19226, NA18518, NA19174 and NA07348.
Detail | Description |
---|---|
Sample Name | HG00276, HG01190, NA11832, NA19207, NA19226, NA18518, NA19174, NA07348 |
Organism | Human |
Molecule Type | DNA |
Sample Type | Cell culture |
Biological replicates | 1 |
Flow Cell replicates | 1 |
Sample Provider | NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research |
DNA was extracted following the Qiagen Puregene Cell extraction protocol, and libraries were generated with NBD114.24, the Oxford Nanopore Technologies native barcoding kit. Adaptive sampling was configured in MinKNOW™ (v25.03.7) to enrich for 375 pharmacogenes (approximately 26.5 Mbp in total). Enrichment occurs in real time: off-target molecules are unblocked within fractions of a second, freeing pores for on-target fragments. This removes the need for probe design, shortens wet-lab turnaround, and allows the target list to be updated flexibly.
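To put the target size in context, 26.5 Mbp is a little under 1% of a roughly 3.1 Gbp human genome. Adaptive sampling targets are typically supplied as a BED file of regions, so the total target size can be checked by summing interval lengths. The sketch below does this for a small, made-up BED file; the file name, coordinates, and region names are placeholders for illustration, not the panel used for this dataset.

```shell
# Write a tiny example BED file (tab-separated: chrom, start, end, name).
# These intervals are invented placeholders, not the real 375-gene panel.
printf 'chrA\t1000\t6000\tregion_1\n' > targets_example.bed
printf 'chrB\t20000\t72000\tregion_2\n' >> targets_example.bed

# BED intervals are 0-based and half-open, so each length is end - start.
total_bp=$(awk -F'\t' '{ sum += $3 - $2 } END { print sum }' targets_example.bed)
echo "Total target size: ${total_bp} bp"

# Share of a ~3,100 Mbp human genome covered by the 26.5 Mbp stated above.
target_pct=$(awk 'BEGIN { printf "%.2f", 26.5 / 3100 * 100 }')
echo "Targeted fraction: ${target_pct}%"
```

Running the same `awk` sum over a real target BED file would give the panel size in base pairs, provided the intervals do not overlap.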
Detail | Description |
---|---|
Extraction | Qiagen Puregene Cell extraction |
Library Prep | NBD114.24 |
Kit | NBD114.24 |
Further preparation information such as sample storage suggestions can be found on the Oxford Nanopore website.
Sequence data was generated using the following configuration:
Detail | Description |
---|---|
Flow Cell | FLO-PRO114M |
Device | PromethION |
Chemistry | R10.4.1 |
Basecall Model | v5.0.0 HAC |
MinKNOW Version | 25.03.7 |
The dataset is available for anonymous download, without login, from a public Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project enabling sharing and analysis of a wide range of data. The data can be downloaded with the AWS CLI command:
aws s3 sync --no-sign-request s3://ont-open-data/pgx_as_2025.07 pgx_as_2025.07
See the tutorials page for information on downloading the dataset. You can also browse and download the files in your web browser courtesy of 42basepairs.
Folder name | Size | Description |
---|---|---|
RAW | 1852 GB | POD5 files |
Basecalls | 31 GB | BAM files |
Analysis | 30 GB | Workflow outputs |
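Since the raw POD5 files make up most of the roughly 1.9 TB total, it can be useful to inspect the bucket and fetch one prefix at a time. The helper functions below are a sketch of this, using only standard AWS CLI options (`--no-sign-request`, `--recursive`, `--human-readable`, `--summarize`, `--dryrun`). The function names are hypothetical, and S3 keys are case-sensitive, so the folder names in the table are worth confirming with a listing before syncing.

```shell
# Dataset root, taken from the download command above.
DATASET="s3://ont-open-data/pgx_as_2025.07"

# List the top-level prefixes so exact folder names can be confirmed.
pgx_list() {
    aws s3 ls --no-sign-request "${DATASET}/"
}

# Report object count and total size under one prefix without downloading.
pgx_size() {
    aws s3 ls --no-sign-request --recursive --human-readable --summarize "${DATASET}/$1/"
}

# Preview which files a sync of one prefix would transfer (nothing is copied).
pgx_preview() {
    aws s3 sync --no-sign-request --dryrun "${DATASET}/$1" "pgx_as_2025.07/$1"
}

# Download a single prefix into a matching local directory.
pgx_fetch() {
    aws s3 sync --no-sign-request "${DATASET}/$1" "pgx_as_2025.07/$1"
}
```

For example, `pgx_preview analysis` followed by `pgx_fetch analysis` would retrieve only the workflow outputs under the `analysis` prefix listed further down this page, leaving the POD5 files on the server.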
Demultiplexing and barcode trimming were performed with Dorado™ v1.0.2. The resulting BAM files were then processed with wf‑pgx v0.1.7 to perform alignment, variant calling, and star-allele calling.
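For readers reprocessing the raw data, a demultiplexing step might look like the sketch below. It assumes Dorado is installed, that basecalling has already produced an unclassified BAM, and that the Dorado kit identifier for NBD114.24 is `SQK-NBD114-24`. The function name and file paths are placeholders, and this is not necessarily the exact invocation used for this dataset.

```shell
# Hypothetical wrapper: demultiplex a basecalled BAM into per-barcode BAMs.
#   $1 = input BAM (placeholder path)
#   $2 = output directory (placeholder path)
# The kit name is an assumption; Dorado trims barcodes during
# demultiplexing by default.
demux_nbd114() {
    dorado demux --kit-name SQK-NBD114-24 --output-dir "$2" "$1"
}
```

A call such as `demux_nbd114 calls.bam demuxed/` would then write one BAM per detected barcode into `demuxed/`.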
The resulting analysis outputs, including VCFs, haplotagged BAM files, coverage statistics, and HTML reports, are available in the S3 bucket at the following prefix:
s3://ont-open-data/pgx_as_2025.07/analysis
The EPI2ME workflow used to generate these analysis outputs is available upon request.
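Once the analysis outputs are downloaded, the VCF files can be inspected with standard command-line tools. The sketch below counts variant records per chromosome and the number of records with a `PASS` filter, using a tiny inline VCF so that it runs on its own. The file name and records are invented, and the real file names under the analysis prefix are not assumed here.

```shell
# Create a minimal example VCF (invented records, for illustration only).
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chrA\t100\t.\tA\tG\t50\tPASS\t.\n'
    printf 'chrA\t250\t.\tC\tT\t12\tLowQual\t.\n'
    printf 'chrB\t900\t.\tG\tA\t60\tPASS\t.\n'
} > example.vcf

# Skip header lines (starting with '#') and count records per chromosome.
per_chrom=$(awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c, n[c] }' example.vcf | sort)
echo "$per_chrom"

# Count records whose FILTER column (field 7) is PASS.
pass_count=$(awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' example.vcf)
echo "PASS records: ${pass_count}"
```

For gzip-compressed VCFs, the same `awk` programs can read from `gzip -dc file.vcf.gz` through a pipe instead of a file argument.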
If you have questions, feedback or requests for early access to future adaptive sampling workflows, please contact support@nanoporetech.com or directly register your interest here. We hope this dataset accelerates evaluation of adaptive sampling approaches for pharmacogenomics and inspires new real‑time enrichment applications.