Metagenomic Assembly Sheds Light on Microbial Diversity in Compost

Published in Data Releases
April 17, 2026

Overview

We are pleased to release a metagenomic dataset from deep sequencing of a mature compost pile, highlighting the capabilities of Oxford Nanopore Technologies in characterising highly diverse microbial communities. From 1.45 Tb of sequencing, we performed de novo assembly with metaMDBG and binned 5,598 metagenome-assembled genomes (MAGs) of medium or higher MIMAG quality, including 1,353 circularized, single-contig MAGs. This release includes basecalls, assembled contigs, binned genomes, and a poster from ASM Microbe 2025 highlighting a high degree of species-level novelty, strain-dependent antimicrobial resistance (AMR) profiles, and intragenic invertons revealed through long-read sequencing.
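For readers unfamiliar with the MIMAG tiers mentioned above, the following is a minimal sketch of how a MAG might be assigned a tier from its estimated completeness and contamination. It is an illustration, not the code used for this release. The function name `mimag_tier` and the `below_medium` label are hypothetical. The thresholds follow the published MIMAG standard, but the sketch ignores the additional rRNA and tRNA gene requirements for high-quality drafts, so a "high" result here is only a candidate.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Return a simplified MIMAG tier from completeness and contamination (percent).

    Simplification: the full MIMAG high-quality definition also requires
    5S, 16S and 23S rRNA genes and at least 18 tRNAs, which are not checked here.
    """
    if completeness > 90.0 and contamination < 5.0:
        return "high"  # candidate only; rRNA/tRNA criteria not evaluated
    if completeness >= 50.0 and contamination < 10.0:
        return "medium"
    return "below_medium"  # hypothetical label for everything else


if __name__ == "__main__":
    # Made-up example values, not taken from the dataset
    for comp, cont in [(98.5, 0.7), (72.0, 3.1), (41.0, 1.0)]:
        print(comp, cont, mimag_tier(comp, cont))
```

Under this sketch, "medium or higher" corresponds to any genome for which the function returns `medium` or `high`.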

Sample

Detail                 Description
Sample Name            Compost
Organism               Microbial community
Molecule Type          gDNA
Sample Type            Compost
Biological replicates  1
Flow Cell replicates   8

Preparation

Sample preparation was performed according to protocols published on the main Oxford Nanopore Technologies website.

Detail        Description
Extraction    MP Biomedicals FastDNA SPIN Kit for Soil (SKU 116560200-CF)
Library Prep  Ligation Sequencing Kit V14
Kit           SQK-LSK114

Further preparation information, such as sample storage suggestions, can be found on the Oxford Nanopore website.

Sequencing

Sequence data was generated using the following configuration. Each flow cell was run for 100 hours. Both HAC and SUP basecalls are available; only the SUP basecalls were used for assembly and downstream analysis.

Detail          Description
Flow Cell       FLO-PRO114M
Device          PromethION
Chemistry       R10.4.1
Basecall Model  v5.0.0 SUP; v5.0.0 HAC
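As a rough sanity check on scale, the figures given in this post can be combined into an average yield per flow cell. The sketch below assumes that the 1.45 Tb quoted in the Overview is the total across all 8 flow cells and that each ran for the full 100 hours. Both are assumptions, and the function names are hypothetical.

```python
TOTAL_YIELD_TB = 1.45  # terabases, from the Overview (assumed to be the total over all flow cells)
FLOW_CELLS = 8         # flow cell replicates, from the Sample table
RUN_HOURS = 100        # run time per flow cell, from the Sequencing section


def mean_yield_per_flow_cell_gb(total_tb: float, flow_cells: int) -> float:
    """Average yield per flow cell in gigabases (1 Tb = 1000 Gb)."""
    return total_tb * 1000.0 / flow_cells


def mean_rate_gb_per_hour(total_tb: float, flow_cells: int, hours: float) -> float:
    """Average output per flow cell per hour in gigabases."""
    return mean_yield_per_flow_cell_gb(total_tb, flow_cells) / hours


if __name__ == "__main__":
    per_cell = mean_yield_per_flow_cell_gb(TOTAL_YIELD_TB, FLOW_CELLS)
    rate = mean_rate_gb_per_hour(TOTAL_YIELD_TB, FLOW_CELLS, RUN_HOURS)
    print(f"{per_cell:.1f} Gb per flow cell, {rate:.2f} Gb per hour")
```

Under those assumptions the average works out to roughly 181 Gb per flow cell. Real per-flow-cell yields will vary, and the figure says nothing about how output was distributed over the run.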

Data Download

The dataset is available for anonymous download, without login, from a public Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data. The data can be downloaded with the AWS CLI command:

aws s3 sync --no-sign-request s3://ont-open-data/compost_mgx_2026.04 compost_mgx_2026.04

See the tutorials page for information on downloading the dataset.

You can also browse and download the files in your web browser courtesy of 42basepairs.

Folder name  Size    Description
raw          17 TB   POD5 files
basecalls    1.1 TB  BAM files
analysis     15 GB   Workflow outputs
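Because the raw folder is far larger than the others, it may be preferable to sync a single folder instead of the whole dataset. The sketch below builds the corresponding `aws s3 sync` command for one folder. It assumes the folder names in the table map directly to prefixes under the dataset root (this is confirmed in this post only for `analysis`). The helper name `build_sync_command` is hypothetical. It adds the AWS CLI `--dryrun` flag by default so that nothing is transferred until the command has been reviewed.

```python
import shlex

# Dataset root taken from the download command in this post
BUCKET_PREFIX = "s3://ont-open-data/compost_mgx_2026.04"
# Folder names from the table; their use as S3 prefixes is assumed for raw and basecalls
KNOWN_FOLDERS = ("raw", "basecalls", "analysis")


def build_sync_command(folder: str, dest_root: str = "compost_mgx_2026.04",
                       dry_run: bool = True) -> list:
    """Return the argv list for syncing one dataset folder anonymously."""
    if folder not in KNOWN_FOLDERS:
        raise ValueError(f"unknown folder {folder!r}; expected one of {KNOWN_FOLDERS}")
    cmd = [
        "aws", "s3", "sync", "--no-sign-request",
        f"{BUCKET_PREFIX}/{folder}",
        f"{dest_root}/{folder}",
    ]
    if dry_run:
        cmd.append("--dryrun")  # list the operations without downloading anything
    return cmd


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it
    print(shlex.join(build_sync_command("analysis")))
```

Calling `build_sync_command("analysis", dry_run=False)` yields the command that performs the actual transfer.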

Analysis

Analysis outputs are available in the S3 bucket under the prefix:

s3://ont-open-data/compost_mgx_2026.04/analysis

Other software used for analysis:

Tool      Version
dorado    0.8.3
metaMDBG  1.1
CheckM2   1.0.2
minimap2  2.27
SemiBin2  2.1.0
MetaBAT2  2.17
DAS Tool  1.1.7

Below is an outline of the analysis workflow:

  1. POD5 files were basecalled using Dorado with the v5.0.0 SUP DNA basecalling model (dna_r10.4.1_e8.2_400bps_sup@v5.0.0).
  2. Reads were filtered for quality >10 and length >1 kb, then de novo assembled with metaMDBG with the --in-ont flag and default parameters.
  3. CheckM2 was run on contigs >1 Mb in length to identify single contigs that represent complete genomes, defined as >90% complete if circular and >=95% complete if linear. To avoid overbinning, complete single-contig genomes were not included in ensemble binning steps. Ensemble binning was performed on remaining contigs >10 kb in length.
  4. A BAM file for steps 5-6 was generated by mapping reads back to the ensemble binning contigs using minimap2 with the -x lr:hq and --secondary=no flags.
  5. SemiBin2 was run on the ensemble binning contigs with --self-supervised, --sequencing-type=long_reads, and --environment=human_gut flags.
  6. MetaBAT2 was run on the ensemble binning contigs with --minContig 30000.
  7. DAS Tool was run on the total bin output from SemiBin2 and MetaBAT2. The final set of genomes was generated by merging the DAS Tool final bins with the complete single-contig bins identified in step 3.
  8. CheckM2 was run on the final set of genomes to estimate completeness and contamination.
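The thresholds in steps 2 and 3 can be sketched as simple predicates. This is an illustrative reading of the workflow description, not the code used to produce the release. The `Contig` class, the function names, and the `unbinned` label are hypothetical. The sketch assumes that "quality" in step 2 means a per-read mean Q-score and that completeness is the CheckM2 estimate in percent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    length: int                           # contig length in bp
    circular: bool                        # whether the assembler reported it as circular
    completeness: Optional[float] = None  # CheckM2 completeness (%), None if not assessed


def keep_read(mean_qscore: float, length: int) -> bool:
    """Step 2 filter: quality >10 and length >1 kb (mean Q-score is an assumption)."""
    return mean_qscore > 10 and length > 1_000


def is_complete_single_contig(contig: Contig) -> bool:
    """Step 3: contigs >1 Mb that are >90% complete (circular) or >=95% complete (linear)."""
    if contig.length <= 1_000_000 or contig.completeness is None:
        return False
    if contig.circular:
        return contig.completeness > 90.0
    return contig.completeness >= 95.0


def route_contig(contig: Contig) -> str:
    """Decide whether a contig is kept as a genome, sent to ensemble binning, or left out."""
    if is_complete_single_contig(contig):
        return "single_contig_genome"  # excluded from ensemble binning to avoid overbinning
    if contig.length > 10_000:
        return "ensemble_binning"      # input to SemiBin2 and MetaBAT2
    return "unbinned"                  # hypothetical label for contigs of 10 kb or shorter


if __name__ == "__main__":
    # Made-up contigs for illustration only
    examples = [
        Contig("ctg1", 3_200_000, True, 97.4),
        Contig("ctg2", 2_100_000, False, 93.0),
        Contig("ctg3", 45_000, False),
    ]
    for c in examples:
        print(c.name, route_contig(c))
```

Note that contigs routed to ensemble binning are still subject to each binner's own limits, such as the --minContig 30000 setting used for MetaBAT2 in step 6.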

Related Materials

See the results of our MAG binning approach applied to another complex community, the ZymoBIOMICS Fecal Reference, in our Metagenomics Application Note. The corresponding Zymo Fecal dataset is also openly available.


