Oxford Nanopore Technologies’ rapid barcoding kit (RBK114) may be used to prepare multiplexed sequencing libraries from laboratory cloning plasmids. The EPI2ME bioinformatics workflow, wf-clone-validation, is a widely used tool that assembles complete plasmid sequences from RBK114-derived sequencing data (generated on MinION, GridION or even PromethION devices). The assembled plasmids can be used to verify that the correct insert has been cloned and can provide additional information on the integrity of the plasmid backbone.
The wf-clone-validation workflow ships with only a minimal FASTQ dataset to demonstrate a successful bioinformatics analysis, and users have requested more illustrative examples.
The dataset provided in this release contains sequencing data from 96 different plasmids that have been designed to address questions frequently raised when discussing plasmid sequencing and the associated bioinformatics.
The POD5 signal data is provided along with both HAC and SUP basecalls. The reference information for the plasmids and their inserts is also provided.
This data collection thus represents a variety of plasmids of varying difficulty to assemble. Did you know that the Canu assembly method provided in wf-clone-validation is better at assembling shorter plasmids than the default Flye method? Flye is the default assembler in wf-clone-validation because of its continued support and its performance in most use cases, but users working with small plasmids (typically <= 3 kb) may find the alternative Canu assembler more successful.
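The size-based guidance above can be sketched as a small shell helper. This is an illustration only: the 3000 bp cut-off is a rule of thumb taken from the "typically <= 3 kb" wording, the function names and the `fastq_pass` directory are hypothetical, and the `--assembly_tool` and `--approx_size` parameter names are assumptions that should be checked against the wf-clone-validation documentation for your installed version. The functions only print a command line; they do not run the workflow.

```shell
#!/usr/bin/env bash
# Sketch: choose an assembler from the expected plasmid size (in bp).
# The 3000 bp threshold is a rule of thumb, not a hard limit.
choose_assembler() {
    local size_bp="$1"
    if [ "$size_bp" -le 3000 ]; then
        echo "canu"
    else
        echo "flye"
    fi
}

# Print (not run) a workflow command line.
# ASSUMPTION: the parameter names --assembly_tool and --approx_size;
# verify them with the workflow's --help output before use.
workflow_cmd() {
    local size_bp="$1"
    local fastq_dir="$2"
    printf 'nextflow run epi2me-labs/wf-clone-validation --fastq %s --approx_size %s --assembly_tool %s\n' \
        "$fastq_dir" "$size_bp" "$(choose_assembler "$size_bp")"
}

workflow_cmd 2700 fastq_pass
workflow_cmd 7000 fastq_pass
```

Printing the command first makes it easy to review the chosen assembler per sample before committing compute time; where the two assemblers disagree, running both and comparing the assemblies against the provided references is the safer check.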
The dataset also contains laboratory artefacts - can you find sequence reads in any samples that do not appear to belong?
The dataset is available for anonymous download, without login, from a public Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data. The data can be downloaded with the AWS CLI command:
aws s3 sync --no-sign-request s3://ont-open-data/plasmid_2025.04 plasmid_2025.04
See the tutorials page for information on downloading the dataset. You can also browse and download the files in your web browser courtesy of 42basepairs.
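Because the full dataset is several hundred gigabytes, it can help to inspect the bucket and fetch one prefix at a time. The sketch below is a hedged example: the `run`, `list_dataset` and `sync_prefix` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and only the `analysis` prefix is known from this page, so other prefix names should be confirmed by listing the bucket first. By default the helpers only print the commands.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# executing it; set DRY_RUN=0 to actually contact S3.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

BUCKET="s3://ont-open-data/plasmid_2025.04"

# List the top-level prefixes so exact folder names can be checked.
list_dataset() {
    run aws s3 ls --no-sign-request "$BUCKET/"
}

# Sync a single prefix rather than the whole dataset.
sync_prefix() {
    local prefix="$1"
    run aws s3 sync --no-sign-request "$BUCKET/$prefix" "plasmid_2025.04/$prefix"
}

list_dataset
sync_prefix analysis
```

The AWS CLI also accepts `--dryrun` on `aws s3 sync`, which reports the transfers it would make without copying any files.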
Folder name | Size | Description |
---|---|---|
RAW | 254 GB | POD5/Flowcell files |
Basecalls | 90 GB | BAM files |
Analysis | 1.6 GB | Workflow outputs |
Attribute | Value |
---|---|
Sample Name | Sample01-96 |
Organism | synthetic construct |
Molecule Type | DNA |
Sample Type | glycerol stock |
Biological replicates | 2 |
Flow Cell replicates | 2 |
Link to sample source | Not publicly available |
Sample preparation was performed according to protocols published on the main Oxford Nanopore Technologies website.
Attribute | Value |
---|---|
Extraction | Plasmid Extraction |
Library Prep | SQK-RBK114 |
Kit | SQK-RBK114 |
Further preparation information such as sample storage suggestions can be found at https://nanoporetech.com/documentation/prepare.
Sequence data were generated using the following configuration:
Attribute | Value |
---|---|
Flow Cell | FLO-MIN114 |
Device | GridION |
Chemistry | R10.4.1 |
Basecall model | dna_r10.4.1_e8.2_400bps_sup@v5.0.0, dna_r10.4.1_e8.2_400bps_hac@v5.0.0 |
MinKNOW version | 6.2.6 |
Dorado version (for rebasecalling) | 0.9.1 |
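For readers who want to rebasecall the POD5 data themselves, the configuration above might translate into Dorado commands along the following lines. This is a sketch under stated assumptions, not the exact commands used to produce the released basecalls: the `pod5` input directory, the output file names, the `basecall_cmd` helper, and the 96-barcode kit name `SQK-RBK114-96` passed to `--kit-name` are all assumptions introduced for illustration. The helper prints the command lines rather than running them.

```shell
#!/usr/bin/env bash
# ASSUMPTIONS (not stated in the dataset description): the POD5 directory,
# the output file names, and the 96-barcode variant of the kit name.
POD5_DIR="pod5"
KIT="SQK-RBK114-96"

# Print a dorado command for one model tier ("sup" or "hac") at v5.0.0.
# --kit-name asks dorado to classify barcodes during basecalling.
basecall_cmd() {
    local tier="$1"
    printf 'dorado basecaller %s@v5.0.0 %s --kit-name %s > %s_calls.bam\n' \
        "$tier" "$POD5_DIR" "$KIT" "$tier"
}

basecall_cmd sup
basecall_cmd hac
```

The short model names `sup@v5.0.0` and `hac@v5.0.0` let Dorado select the model that matches the data's chemistry, which for this dataset should correspond to the two models listed in the table.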
Analysis outputs are available in our S3 bucket and can be downloaded with the following command:
aws s3 sync --no-sign-request s3://ont-open-data/plasmid_2025.04/analysis analysis
EPI2ME workflows used to generate analysis outputs:
Other software used for analysis:
Tool | Version |
---|---|
dorado | 0.9.1 |
Related Links