We are pleased to announce a fresh release of the CliveOME using the latest Q20 pre-release chemistry.
As with previous releases, the new dataset is available for anonymous download from an Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data.
The data is located in the bucket at:
s3://ont-open-data/Q20_ULK_Cliveome/
See the tutorials page for information on downloading the dataset.
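As one possible way to browse the bucket without AWS credentials, the sketch below uses the boto3 Python library with unsigned requests. The helper names `split_s3_uri` and `list_prefix` are hypothetical, boto3 is assumed to be installed, and depending on local configuration a region may need to be set; the tutorials page remains the reference for downloading.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix


def list_prefix(uri, max_keys=20):
    """Return up to max_keys object keys under the URI, using anonymous access."""
    # Imported here so split_s3_uri works even without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.client import Config

    bucket, prefix = split_s3_uri(uri)
    # UNSIGNED disables request signing, so no AWS account is needed.
    client = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    return [item["Key"] for item in response.get("Contents", [])]
```

Calling `list_prefix("s3://ont-open-data/Q20_ULK_Cliveome/")` would then return the first few object keys under the dataset prefix.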
The dataset comprises the direct output of the sequencing device software MinKNOW, along with basecalls computed post-run using the research-grade bonito basecaller with the “Q20 early access model” as follows:
pip install ont-bonito==0.4.0
bonito download --models
bonito basecaller dna_r10.3_q20ea <read directory> | bgzip -c > basecalls.fa.gz
Only reads passing the default quality filter (average Q-score > 10) were processed by bonito, i.e. only those .fast5 files located within the fast5_pass MinKNOW output folder.
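To illustrate the pass/fail criterion, the sketch below computes a per-read average Q-score from a FASTQ-style quality string and applies the threshold of 10. It assumes Phred+33 encoding and that the average is taken over error probabilities rather than over the Q-scores themselves, which is the usual convention for nanopore reads. The function names are illustrative and this is not MinKNOW's actual code.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Average Q-score of a read, averaging in error-probability space."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each Phred character to an error probability: p = 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_p = sum(probs) / len(probs)
    # Convert the mean error probability back to the Phred scale.
    return -10 * math.log10(mean_p)


def passes_filter(quality_string, threshold=10.0):
    """True if the read's average Q-score exceeds the threshold (assumed 10)."""
    return mean_qscore(quality_string) > threshold
```

Averaging probabilities means a few low-quality bases pull the read's score down more than a plain mean of Q-scores would.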
The sequencing runs here represent data from pre-release versions of the sequencing and analysis components. Data throughput and quality do not reflect that of a released product.
The dataset comprises eight PromethION sequencing runs from our R&D lab using pre-release chemistry components and R10.3 flowcells. A separately prepared sample was run on each flowcell. The flowcells yielded between 10 and 18 Gbases, with N50 read lengths between 60 and 95 kb.
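For readers unfamiliar with the N50 statistic quoted above, it is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal sketch, with an illustrative function name:

```python
def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not read_lengths:
        raise ValueError("no read lengths given")
    half_total = sum(read_lengths) / 2
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
```

Because the statistic is weighted by bases rather than by read count, it is typically much larger than the median read length.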
Basecalling accuracy was assessed by aligning the reads to the GRCh38 human reference using minimap2, and alignment statistics calculated using the stats_from_bam program from the pomoxis software package.
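As a rough illustration of what per-read alignment statistics of this kind express, the sketch below derives accuracy and identity from counts of matches, substitutions, insertions and deletions, and converts an accuracy to the Phred scale (99% corresponds to Q20). The definitions are an assumption about how such summaries are commonly computed, not a transcription of the pomoxis code, and the function names are hypothetical.

```python
import math


def alignment_accuracy(matches, substitutions, insertions, deletions):
    """Percent accuracy over the full alignment length (matches plus all errors)."""
    alignment_length = matches + substitutions + insertions + deletions
    if alignment_length == 0:
        raise ValueError("empty alignment")
    errors = substitutions + insertions + deletions
    return 100.0 * (1 - errors / alignment_length)


def alignment_identity(matches, substitutions):
    """Percent identity over aligned (non-gap) columns only."""
    aligned = matches + substitutions
    if aligned == 0:
        raise ValueError("no aligned bases")
    return 100.0 * matches / aligned


def accuracy_to_qscore(accuracy_percent):
    """Phred-scaled quality for an accuracy below 100%."""
    return -10 * math.log10(1 - accuracy_percent / 100)
```

For example, a read with 990 matches, 4 substitutions, 3 insertions and 3 deletions has 10 errors in 1000 alignment columns, giving 99% accuracy.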
Related Links