Structural variant calling in wf-human-variation 2.7.0

By Philipp Rescheneder
Published in Articles
May 14, 2025

Structural variants (SVs) represent some of the most impactful — and challenging — classes of genetic variation to detect. Defined as genomic rearrangements larger than 50 base pairs, SVs include deletions, insertions, tandem duplications, inversions, mobile element insertions, and translocations. Despite being fewer in number than single nucleotide variants (SNVs), they affect more total bases per generation and have a disproportionate influence on gene function, regulatory structure, and phenotype.

In clinical contexts, SVs are often behind complex or undiagnosed genetic conditions. In cancer, they are drivers of genome instability and tumour evolution. Accurate and comprehensive SV detection is therefore critical across applications from rare disease diagnostics to population-scale genomics.

Oxford Nanopore’s long reads are a natural fit for SV analysis, offering the span and resolution needed to capture complex rearrangements. In our wf-human-variation workflow, we rely on the Sniffles SV caller for its speed, accuracy, and phasing support. With version 2.7.0 of the workflow, we’re updating to Sniffles v2.6.2 — bringing improved recall for large variants and greater compatibility with downstream tools.

Update to Sniffles v2.6.2

We use Sniffles in wf-human-variation for structural variant calling due to its combination of high performance, fast runtime, and comprehensive feature set. The workflow can also phase SVs based on SNV phasing information.

Updates to our analysis components are rigorously benchmarked. In updating to version 2.6.2 of Sniffles (from a modified version of v2.0.7), the main improvements are:

  • increased precision and recall for deletions and duplications larger than 25 kb,
  • improved sampling of read depth across large SVs, enabling more effective filtering of large false positives, and
  • enhanced VCF standard compliance.

Benchmarking

Structural variation comprises a large set of different types of variants with variant sizes ranging from fifty to hundreds of thousands of base pairs. While smaller SVs are found in high numbers (~20,000) even in healthy individuals, larger SVs (with increased pathogenicity likelihood) are rarer. This makes benchmarking difficult without access to a broad cohort of clinical samples. Our benchmarking therefore uses a combination of real and synthetically constructed data. We extensively sequence and scrutinize the Genome in a Bottle HG002 benchmarking set. Our synthetic benchmarks are constructed after consideration and manual investigation of potentially problematic regions, observed in our own analysis and that of others.

GIAB HG002 (hg38) benchmark

Our primary benchmarking is performed using the 2024-11-13 GIAB HG002 truthset. This truthset is based on the HG002 T2T Q100 assembly (v1.1) and consists of approximately 22,000 high-confidence structural variants. We use Truvari version 4.3.1 for the benchmarking. The main truvari bench command is run with the options --passonly --pick ac --dup-to-ins. Additionally, we run truvari refine --recount --use-region-coords --use-original-vcfs --align mafft so that variant representations are normalized consistently during the comparison. For benchmarking, Sniffles is run with the --phase option, as this allows truvari refine to normalize variants more accurately.
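As a rough illustration of how the options above fit together, the following sketch assembles the three command lines without running them. All file and directory names are hypothetical placeholders, and the input/output arguments (--input, --vcf, --reference, -b, -c, --includebed, -o) are assumptions based on the tools' standard interfaces rather than the workflow's exact invocation; only the benchmark options quoted above come from this post.

```python
import shlex

# Hypothetical file names, for illustration only.
BAM = "hg002.bam"
CALLS_VCF = "sniffles_hg002.vcf.gz"
TRUTH_VCF = "giab_hg002_truth.vcf.gz"
REGIONS_BED = "giab_hg002_regions.bed"
REFERENCE = "hg38.fa"
BENCH_DIR = "truvari_out"


def sniffles_cmd(bam, vcf, reference):
    """Sniffles call with --phase, as used for the benchmark runs."""
    return ["sniffles", "--input", bam, "--vcf", vcf,
            "--reference", reference, "--phase"]


def truvari_bench_cmd(base, comp, bed, out_dir):
    """truvari bench with the comparison options quoted in the text."""
    return ["truvari", "bench", "-b", base, "-c", comp,
            "--includebed", bed, "-o", out_dir,
            "--passonly", "--pick", "ac", "--dup-to-ins"]


def truvari_refine_cmd(bench_dir, reference):
    """truvari refine on an existing bench output directory."""
    return ["truvari", "refine", "--reference", reference,
            "--recount", "--use-region-coords", "--use-original-vcfs",
            "--align", "mafft", bench_dir]


commands = [
    sniffles_cmd(BAM, CALLS_VCF, REFERENCE),
    truvari_bench_cmd(TRUTH_VCF, CALLS_VCF, REGIONS_BED, BENCH_DIR),
    truvari_refine_cmd(BENCH_DIR, REFERENCE),
]

# Print the commands for inspection; nothing is executed here.
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to inspect and to pass to subprocess.run once the placeholder paths are replaced with real ones.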

The results of the GIAB HG002 benchmark are shown in the table below.


Sample  Depth (X)  v2.0.7 F1  v2.0.7 Precision  v2.0.7 Recall  v2.6.2 F1  v2.6.2 Precision  v2.6.2 Recall
Mean    40         0.9751     0.9865            0.9640         0.9749     0.9859            0.9640
Rep1    40         0.9750     0.9866            0.9638         0.9741     0.9854            0.9631
Rep2    40         0.9762     0.9872            0.9655         0.9755     0.9865            0.9648
Rep3    40         0.9742     0.9858            0.9628         0.9749     0.9858            0.9643
Mean    30         0.9751     0.9864            0.9641         0.9748     0.9858            0.9640
Rep1    30         0.9751     0.9858            0.9646         0.9740     0.9844            0.9637
Rep2    30         0.9763     0.9869            0.9660         0.9759     0.9870            0.9651
Rep3    30         0.9739     0.9864            0.9618         0.9745     0.9861            0.9633
Mean    20         0.9721     0.9865            0.9581         0.9714     0.9856            0.9576
Rep1    20         0.9713     0.9856            0.9574         0.9697     0.9842            0.9557
Rep2    20         0.9718     0.9868            0.9574         0.9717     0.9861            0.9576
Rep3    20         0.9731     0.9873            0.9594         0.9728     0.9863            0.9596

Simulated data

Our simulated data comprises copy number variation (CNV) events, formed using real trio-phased HG002 reads. The initial read dataset includes reads with an N50 of approximately 15 kb and 30-fold coverage of the genome. The reads are edited to introduce both homozygous and heterozygous deletion and duplication events, with sizes from 25 kb to 250 kb. Structural variants smaller than 25 kb are sufficiently covered by the GIAB benchmark, while copy number changes greater than 250 kb are handled by a dedicated CNV caller (Spectre) and are therefore benchmarked separately.

The results of the simulated data benchmark are shown in the table below. When calculating the recall we require that the variant be genotyped correctly.

Simulated CNV Size [kb]  N    Recall v2.0.7  Recall v2.6.2
25                       176  0.557          0.903
50                       174  0.454          0.966
75                       176  0.398          0.955
100                      174  0.437          0.948
250                      170  0.435          0.982
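The recall criterion used here, where a simulated event only counts as recovered if its genotype is also correct, can be sketched as follows. This is a minimal illustration on made-up toy data, not the evaluation code behind the table: the TruthEvent class, the event identifiers, and the assumption that calls have already been matched to truth events by ID are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TruthEvent:
    event_id: str   # hypothetical identifier of a simulated CNV
    genotype: str   # e.g. "0/1" (heterozygous) or "1/1" (homozygous)


def normalise_gt(gt):
    """Treat 0/1, 1/0, 0|1 and 1|0 as the same genotype."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def genotype_aware_recall(truth, called):
    """Fraction of truth events that were called with the correct genotype.

    `called` maps event_id -> called genotype for the truth events that a
    caller's output was matched to; unmatched events are simply absent.
    """
    if not truth:
        raise ValueError("truth set is empty")
    hits = sum(
        1
        for ev in truth
        if ev.event_id in called
        and normalise_gt(called[ev.event_id]) == normalise_gt(ev.genotype)
    )
    return hits / len(truth)


# Toy example (made-up events, not benchmark data).
truth = [
    TruthEvent("del_1", "0/1"),
    TruthEvent("del_2", "1/1"),
    TruthEvent("dup_1", "0/1"),
    TruthEvent("dup_2", "1/1"),
]
called = {
    "del_1": "0/1",  # found, genotype correct
    "del_2": "0/1",  # found, but genotype wrong, so not counted
    "dup_1": "1|0",  # found, phased heterozygous call counts as correct
}                    # dup_2 was missed entirely

print(genotype_aware_recall(truth, called))  # 0.5
```

Under this stricter definition, a call with the right position and size but the wrong zygosity lowers recall just as a missed event does.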

Analysis

These results show that Sniffles v2.6.2 significantly improves recall for large (25 kb to 250 kb) deletions and duplications in the synthetic dataset. For smaller SVs, those covered by the standard GIAB HG002 benchmark, we observed a slight reduction in F1 score (0.9748 vs. 0.9751 at 30X) compared to v2.0.7. This reduction is driven mainly by a small decrease in precision (0.9858 vs. 0.9864 at 30X). Despite the small regression shown in the GIAB benchmarks, we believe the improvements for larger SVs warrant the update to v2.6.2.
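For readers who want to check the F1 comparison, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the 30X mean precision and recall in the GIAB table. Note that the table's mean F1 is averaged across replicates, so recomputing it from mean precision and recall is an approximation that happens to agree to four decimal places here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 30X mean (precision, recall) values taken from the GIAB HG002 table.
MEAN_30X = {
    "v2.0.7": (0.9864, 0.9641),
    "v2.6.2": (0.9858, 0.9640),
}

for version, (precision, recall) in MEAN_30X.items():
    print(version, round(f1_score(precision, recall), 4))
# v2.0.7 0.9751
# v2.6.2 0.9748

# Per-metric change from v2.0.7 to v2.6.2 at 30X.
old_p, old_r = MEAN_30X["v2.0.7"]
new_p, new_r = MEAN_30X["v2.6.2"]
print("precision change:", round(new_p - old_p, 4))  # -0.0006
print("recall change:", round(new_r - old_r, 4))     # -0.0001
```

The precision change is several times larger than the recall change, which is consistent with the statement that the F1 difference is driven by precision.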

Conclusion

The update to Sniffles v2.6.2 in wf-human-variation brings meaningful improvements where they matter most: in the reliable detection of large SVs — a category often underrepresented in benchmarks yet crucial for clinical and cancer applications. Our evaluation shows a marked improvement in recall for variants 25–250 kb in size, with no increase in false positives and no evidence of systematic regression for smaller variants.

Though F1 scores for smaller SVs dip marginally, these changes reflect a rebalancing that favours more accurate recovery of biologically important events. Combined with improved VCF compliance, this version sets the stage for cleaner interoperability with downstream tools and more confident variant interpretation.

We’re excited to roll out Sniffles v2.6.2 in the latest workflow release and look forward to seeing how it supports your research and discovery.



Philipp Rescheneder

Bioinformatician

