Chromatin accessibility data and tool release

Published in Data Releases
May 21, 2025

Functional epigenomics uses high-throughput sequencing data to study how epigenetic modifications may affect gene expression and function across the entire genome. In this post we describe an assay that enables researchers to simultaneously interrogate endogenous DNA methylation state and chromatin accessibility in addition to the primary sequence variants normally available from whole genome Nanopore sequencing data.

Nanopore sequencing can detect chemical base modifications, as discussed in previous blog posts for DNA and RNA. Typically, these base modifications are endogenous to the sample and the result of a biological process. In this assay, however, we use Nanopore sequencing to detect exogenous base modifications, which would not occur naturally in the organism of interest. To mark regions of accessible chromatin, we introduce 6-methyladenine (6mA) residues by treating chromatinized DNA with a methyltransferase enzyme (MTase) that is not sequence specific, such as EcoGII. Such an enzyme recognizes any adenine residue as a potential site for deposition of a methyl group, resulting in a 6mA base. Critically, adenine bases that are occluded by a protein, such as a nucleosome, transcription factor, or other DNA-binding protein, are shielded from the methyltransferase and remain unmodified.

When we extract the DNA and sequence the resulting library, we can infer whether a region was accessible to the MTase from the enrichment of 6mA on the reads aligned to that region. Since no amplification or conversion is necessary, endogenous methylation such as CpG-context 5-methylcytosine (5mC) is retained on the DNA strands and can be detected as well. Finally, these libraries can be used for typical downstream analyses such as assembly, long and short variant calling, and differential methylation analysis, all with the added layer of chromatin accessibility information.
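
To make the inference step concrete, here is a minimal Python sketch. It is not the method Modkit uses; it only illustrates the idea of reading accessibility from 6mA enrichment. It assumes we already have per-adenine calls (reference position, methylated or not) pooled from the reads over a region, and it reports the 6mA fraction in fixed-size windows. The function name, the window size, and the toy calls are all hypothetical.

```python
from collections import defaultdict

def windowed_6ma_fraction(calls, window=100):
    """Return {window_start: fraction of adenine calls that are 6mA}.

    calls: iterable of (reference_position, is_6mA) pairs pooled over reads.
    window: window size in bases (an arbitrary choice for this sketch).
    """
    modified = defaultdict(int)
    total = defaultdict(int)
    for pos, is_6ma in calls:
        start = (pos // window) * window  # left edge of the window holding pos
        total[start] += 1
        modified[start] += int(is_6ma)
    return {start: modified[start] / total[start] for start in sorted(total)}

# Toy data (made up): adenines at 0-99 are mostly protected, 100-199 mostly exposed.
toy_calls = [(10, False), (40, False), (70, True), (90, False),
             (110, True), (130, True), (160, False), (190, True)]
fractions = windowed_6ma_fraction(toy_calls)
print(fractions)
```

In this toy example the second window has the higher 6mA fraction, which is the pattern we would read as more accessible chromatin.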

Data Access

Raw nanopore data in POD5 format as well as basecalled modBAMs with 5-hydroxymethylcytosine (5hmC), 5mC, and 6mA base modification calls are available for download from a public Amazon Web Services S3 bucket.

The data is located in the bucket at:

s3://ont-open-data/chrom_acc_2025.06

These datasets can serve as a reference against which to compare your own experiments, and are also suitable for general exploration and testing.

See the tutorials page for information on downloading the dataset.

Sample preparation

We’ve developed a protocol that uses only commercially available reagents; it will be made available on the ONT community website. Libraries were prepared using the Ligation Sequencing DNA V14 (SQK-LSK114) Kit and sequenced on a PromethION instrument to give the outputs below:

| Name | Flowcell | Condition | Throughput (Gb) | N50 (kbp) |
|---|---|---|---|---|
| Chromatin Accessibility Replicate 1 | PBA15156 | Chromatin accessibility | 105.72 | 19.4 |
| Chromatin Accessibility Replicate 2 | PAY22766 | Chromatin accessibility | 108.49 | 18.2 |
| Native Sample | PBA15131 | Native | 150.53 | 29.2 |
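
For readers comparing their own runs against the N50 column, the sketch below shows how a read-length N50 is commonly computed, assuming the usual definition: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the toy read lengths are illustrative and are not taken from the released data.

```python
def n50(lengths):
    """Return the read-length N50 of a collection of read lengths (in bases)."""
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no read lengths given")

# Toy read lengths in bases (made up for illustration).
toy_lengths = [2_000, 5_000, 8_000, 20_000, 30_000]
print(n50(toy_lengths))
```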

Basecalling and analysis

These reads were basecalled using the v5.2 high accuracy (HAC) model and associated modified base detection models. Specifically the software used was:

  • Dorado v1.0.0
  • Modkit v0.5.0

Basecalling these reads doesn’t require any special settings: running Dorado with the v5.2 basecalling model and the 6mA model is sufficient. Detecting 5hmC/5mC concurrently is optional but recommended. Below is an example command that detects 6mA in all sequence contexts and 5hmC/5mC at CpG dinucleotides.

# dorado version 1.0
$ dorado basecaller hac,5mCG_5hmCG,6mA raw/pod5 --reference ${genome_fasta} > basecalls.bam

Predict accessible regions with Modkit

As of version 0.5.0, Modkit contains a command to predict regions of open chromatin and produce a BedGraph file that can be visualized on commonly used browsers. For additional details see the online Modkit documentation. We plan to continue to add functionality to the Modkit open-chromatin suite of tools.

# predict regions of open chromatin with Modkit
# (the --region option is optional)
$ modkit open-chromatin predict \
basecalls/PAY82297/calls.sorted.bam \
--region "chr19" \
-o chr19_open_chromatin.bedgraph \
--device 0 \
--model dist_modkit_v0.5.0_5120ef7_tch/models/r1041_e82_400bps_hac_v5.2.0@v0.1.0

As you may notice from the --device 0 option, this is the first machine learning model in Modkit, and it will take advantage of a GPU if you have one available. The GPU resources required for this command are modest, and the model can also be run on a CPU (--device cpu). In the two IGV screenshots below, one of the HLA-C locus and one of the ZNF locus, we can see how similar the 6mA enrichment is between the two runs. In both images, the top track is the output of modkit open-chromatin predict, which is the predicted probability of the chromatin being accessible.

Browser screenshot showing enrichment of 6mA
Enrichment of 6mA around HLA-C; the two runs are broadly very similar.
Browser screenshot showing enrichment of 6mA
Enrichment of 6mA is evident around TSS at the ZNF locus.
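
If you would rather work with the prediction track programmatically than in a browser, the following Python sketch may be a useful starting point. bedGraph is a four-column text format (chromosome, start, end, value); we assume here that the value column holds the predicted accessibility probability and that the file is sorted by position. The 0.5 threshold, the function name, and the toy records are arbitrary choices for illustration, not Modkit defaults.

```python
def accessible_intervals(bedgraph_text, threshold=0.5):
    """Merge touching bedGraph intervals whose value is >= threshold.

    Assumes records are sorted by chromosome and start position.
    """
    merged = []
    for line in bedgraph_text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and optional header lines
        chrom, start, end, value = line.split()[:4]
        start, end, value = int(start), int(end), float(value)
        if value < threshold:
            continue
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            merged[-1] = (chrom, merged[-1][1], end)  # extend the previous interval
        else:
            merged.append((chrom, start, end))
    return merged

# Toy bedGraph records (made up, not real output).
toy = """chr19\t1000\t1100\t0.10
chr19\t1100\t1200\t0.85
chr19\t1200\t1300\t0.92
chr19\t1300\t1400\t0.20
chr19\t1400\t1500\t0.70
"""
intervals = accessible_intervals(toy)
print(intervals)
```

To run this on a real file, read its contents with open(path).read() and pass the text in; the appropriate threshold will depend on your data.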

Quantifying 6mA enrichment

One of the project aims was to develop a protocol that achieves high “signal over background”, meaning that 6mA is enriched in regions of open chromatin while regions of heterochromatin have low 6mA levels. We can estimate this measure by using modkit localize in regions where we expect chromatin to be accessible to the methyltransferase, such as housekeeping gene promoters, shown below.

6mA enrichment at HKG promoters
Elevated levels of 6mA are observed in promoter regions.
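
As a rough illustration of the “signal over background” idea, the sketch below divides the mean 6mA fraction in regions expected to be open by the mean fraction in regions expected to be closed. This is not what modkit localize computes, and the post does not define the measure this precisely; the function name and all numbers are made up for illustration.

```python
from statistics import mean

def signal_over_background(signal_fractions, background_fractions):
    """Ratio of mean 6mA fraction in expected-open vs expected-closed regions."""
    background = mean(background_fractions)
    if background == 0:
        raise ZeroDivisionError("background 6mA fraction is zero")
    return mean(signal_fractions) / background

# Hypothetical per-region 6mA fractions (not taken from the released data).
promoters = [0.30, 0.40, 0.50]
heterochromatin = [0.04, 0.05, 0.06]
ratio = signal_over_background(promoters, heterochromatin)
print(round(ratio, 2))
```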

Relative 6mA enrichment levels are reproducible across preparations

Methods that use enzymatic treatment will almost always show some variability due to changes in reagents, measurement error, and so on. Above we see that both runs have similar but not identical peak 6mA levels. At a finer resolution, below we show that 6mA levels at ATAC-seq peaks correspond strongly between the two samples.

6mA levels at ATAC peaks from released data
6mA levels at ATAC peaks show good correlation between replicates.
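
One way to put a number on this kind of agreement in your own data is a Pearson correlation over per-peak 6mA levels. The post does not say which statistic was used for the figure, so treat the sketch below as one reasonable option rather than the method behind it; the function name and per-peak values are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical mean 6mA fraction at the same four peaks in two replicates.
rep1 = [0.10, 0.20, 0.30, 0.40]
rep2 = [0.12, 0.19, 0.33, 0.41]
r = pearson_r(rep1, rep2)
print(round(r, 3))
```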

Accurate endogenous methylation calls maintained with chromatin accessibility protocol

One important utility of this method is that endogenous 5mC/5hmC methylation detection performance is preserved, making it possible to observe concurrent changes in regulation due to DNA methylation as well as chromatin accessibility. Below we show the CpG methylation frequency for two matched samples, one with the chromatin accessibility methyltransferase treatment and one prepared using the typical human variation workflow.

5mC CpG correlation between MTase-treated reads and control native reads
Chromatin accessibility protocol does not change reporting of CpG methylation.

Read-level accuracy is maintained through MTase treatment

Finally, with the v5.2 High-Accuracy basecalling models, there is little to no difference in basecalling accuracy when a sample is subjected to the chromatin accessibility protocol.

Read accuracy is similar in +CA and -CA samples
Identity Q for the released samples, filtered to reads >= 5 kb
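
For readers less used to Q values, the sketch below converts between a Phred-scaled Q and a fractional identity, assuming “Identity Q” means Q = -10 * log10(1 - identity). Under that assumption, Q20 corresponds to 99% identity. The function names are our own.

```python
import math

def q_to_identity(q):
    """Convert a Phred-scaled Q value to a fractional identity."""
    return 1.0 - 10 ** (-q / 10)

def identity_to_q(identity):
    """Convert a fractional identity in [0, 1) to a Phred-scaled Q value."""
    if not 0 <= identity < 1:
        raise ValueError("identity must be in [0, 1)")
    return -10 * math.log10(1 - identity)

for q in (10, 20, 30):
    print(q, round(q_to_identity(q), 4))
```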

Similarly, we have not observed a change in variant calling performance with Clair3 on these samples. The results are summarized below:

| Name | Flowcell | Median Accuracy (Q) | Mean Accuracy (Q) | CA-treatment | Mean Coverage | SNP F1 Score, 30X (%) | Indel F1 Score, no_HPs, 30X (%) |
|---|---|---|---|---|---|---|---|
| Chromatin Accessibility Replicate 1 | PBA15156 | 19.99 | 19.82 | + | 34.11 | 99.14 | 97.69 |
| Chromatin Accessibility Replicate 2 | PAY22766 | 20.25 | 20.39 | + | 35.0 | 99.15 | 97.7 |
| Native Sample | PBA15131 | 20.40 | 19.83 | - | 48.56 | 99.15 | 97.82 |
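
As a quick sanity check when comparing your own runs, the mean coverage values above are close to what you get by dividing each run's throughput (from the sample preparation table) by a human genome size of roughly 3.1 Gb. The post does not state how coverage was computed, so the genome size and the formula in this sketch are assumptions.

```python
GENOME_SIZE_GB = 3.1  # assumed approximate human genome size, not stated in the post

def approx_coverage(throughput_gb, genome_size_gb=GENOME_SIZE_GB):
    """Estimate mean coverage as sequenced bases divided by genome size."""
    return throughput_gb / genome_size_gb

# flowcell: (throughput in Gb, reported mean coverage), both from the tables in this post
runs = {
    "PBA15156": (105.72, 34.11),
    "PAY22766": (108.49, 35.0),
    "PBA15131": (150.53, 48.56),
}
for flowcell, (throughput, reported) in runs.items():
    print(flowcell, round(approx_coverage(throughput), 2), reported)
```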

Discussion

Here we’ve released to the community a reference dataset showing the capability of Nanopore sequencing to detect regions of accessible chromatin as well as native methylation and primary sequence variants. We hope that this release will inspire researchers to download and try the protocol and compare their results to the ones presented here.

