Phasing of small insertions and deletions

By Sean McKenzie
Published in Articles
May 14, 2025

One of the major advantages of long-read sequencing is its ability to resolve phasing — identifying whether two variants occur on the same physical copy of a chromosome (i.e. in cis) or on opposite copies (in trans). This information can be critical, especially when investigating compound heterozygous mutations. For example, two loss-of-function variants in cis would affect only one copy of a gene, potentially leaving the other intact. In trans, both copies may be compromised, leading to more significant biological consequences.
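As a rough illustration of the cis/trans distinction, the sketch below compares two phased genotypes as they would appear in a VCF, where `|` separates phased alleles and the `PS` field names the phase set. The function name `classify_pair` is hypothetical, and the sketch assumes simple diploid, biallelic heterozygous calls.

```python
def classify_pair(gt_a, ps_a, gt_b, ps_b):
    """Classify two heterozygous calls as 'cis', 'trans', or 'unknown'.

    gt_a / gt_b are VCF GT strings such as "0|1" (phased) or "0/1" (unphased).
    ps_a / ps_b are the phase set (PS) identifiers, or None when absent.
    """
    # Unphased genotypes use "/" and carry no haplotype information.
    if "|" not in gt_a or "|" not in gt_b:
        return "unknown"
    # Phase can only be compared between variants in the same phase set.
    if ps_a is None or ps_a != ps_b:
        return "unknown"

    alleles_a = gt_a.split("|")
    alleles_b = gt_b.split("|")
    # Assumption: only biallelic heterozygous calls (one "0", one "1") are handled.
    if set(alleles_a) != {"0", "1"} or set(alleles_b) != {"0", "1"}:
        return "unknown"

    # Index (0 or 1) of the haplotype that carries the alternate allele.
    hap_a = alleles_a.index("1")
    hap_b = alleles_b.index("1")
    return "cis" if hap_a == hap_b else "trans"


print(classify_pair("0|1", 1000, "0|1", 1000))  # same haplotype
print(classify_pair("0|1", 1000, "1|0", 1000))  # opposite haplotypes
```

Two variants in different phase sets are reported as unknown here, because the orientation of one phase block relative to another is arbitrary.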

Our wf-human-variation workflow uses WhatsHap to phase single nucleotide variants (SNVs). WhatsHap follows reads that span multiple heterozygous positions, chaining variants together until it reaches a region that no read spans. The result is a phaseblock: a contiguous section of the genome where the haplotype structure is known. This phasing information is then written back to the reads in a process called haplotagging.
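To make the idea of read-bounded blocks concrete, here is a deliberately simplified sketch. It only shows where blocks begin and end given read coverage. It does not resolve which allele sits on which haplotype, and it is not how WhatsHap itself is implemented. The function name `phase_blocks` and the input layout are assumptions made for illustration.

```python
import bisect


def phase_blocks(het_positions, reads):
    """Group heterozygous positions into blocks linked by spanning reads.

    het_positions: sorted list of heterozygous variant positions.
    reads: list of (start, end) half-open alignment intervals.
    Returns a list of blocks, each a list of positions.
    """
    # linked[i] is True when at least one read covers positions i and i + 1.
    linked = [False] * max(len(het_positions) - 1, 0)
    for start, end in reads:
        lo = bisect.bisect_left(het_positions, start)  # first position >= start
        hi = bisect.bisect_left(het_positions, end)    # first position >= end
        # The read covers positions lo .. hi - 1, so it links each adjacent pair.
        for i in range(lo, hi - 1):
            linked[i] = True

    blocks = []
    for i, pos in enumerate(het_positions):
        if i > 0 and linked[i - 1]:
            blocks[-1].append(pos)   # extend the current block
        else:
            blocks.append([pos])     # no spanning read: start a new block
    return blocks


positions = [100, 200, 300, 900, 1000]
reads = [(50, 250), (150, 350), (850, 1050)]
print(phase_blocks(positions, reads))
```

In this toy input no read bridges positions 300 and 900, so the chain breaks there and two blocks are produced. A position that ends up alone in a block has nothing to be phased against.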

From here, tools like Sniffles2 (for structural variant calling) and modkit (for modified base aggregation) can read the haplotagged data and produce phased outputs. This opens up the possibility of identifying compound heterozygous events across different variant types: not just SNVs, but structural variants (SVs) and epigenetic alleles too.

However, until now, the workflow ignored small insertions and deletions (indels) when phasing. This was a deliberate trade-off. Indels are slightly noisier than SNVs, and including them directly in phasing could lead to switch errors — incorrect phasing assignments within a block — which in turn could produce misleading biological interpretations.
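For readers unfamiliar with switch errors, the sketch below counts them for a single block by comparing a predicted phasing against a known truth. Each list records, per heterozygous site, which haplotype (0 or 1) carries the alternate allele. The function name `count_switch_errors` and the toy data are illustrative assumptions, not part of the workflow.

```python
def count_switch_errors(truth, predicted):
    """Count switch errors between two phasings of the same sites.

    truth / predicted: equal-length lists of 0 or 1, giving the haplotype
    that carries the alternate allele at each heterozygous site.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must cover the same sites")
    # agree[i] is True when the prediction matches the truth at site i.
    agree = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs wherever agreement flips between adjacent sites.
    # Haplotype labels are arbitrary, so a fully inverted block has no switches.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


truth = [0, 1, 0, 1, 1]
print(count_switch_errors(truth, [0, 1, 1, 0, 0]))  # one switch after site 2
print(count_switch_errors(truth, [0, 1, 1, 1, 1]))  # one mis-phased site
```

A single mis-phased site in the middle of a block counts as two switches under this definition, which is one way a noisy indel could disrupt an otherwise correct block.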

With the latest update to wf-human-variation, we’re tackling this limitation head-on. Indels are now phased in the same way as SVs and modified bases: by propagating SNV-based phasing through haplotagged reads. We’ve worked closely with the WhatsHap team to implement this logic in a new command, haplotagphase, which is now integrated into the workflow.
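The general idea of propagating phase through haplotagged reads can be sketched as a simple vote. This is not the haplotagphase implementation: the function name `phase_indel`, the thresholds, and the input layout are all hypothetical. It assumes each read carries a haplotype tag of 1 or 2 (or none), and that haplotype 1 corresponds to the allele written before the `|` in the genotype.

```python
from collections import Counter


def phase_indel(observations, min_reads=3, min_fraction=0.8):
    """Assign a phased genotype to a heterozygous indel from tagged reads.

    observations: list of (haplotype, supports_alt) pairs, one per read,
    where haplotype is 1, 2, or None for an untagged read.
    Returns "1|0", "0|1", or None when the evidence is insufficient.
    """
    alt_counts = Counter()
    for haplotype, supports_alt in observations:
        # Only tagged reads that carry the indel are informative here.
        if haplotype in (1, 2) and supports_alt:
            alt_counts[haplotype] += 1

    total = alt_counts[1] + alt_counts[2]
    if total < min_reads:
        return None  # too few informative reads: leave the indel unphased
    if alt_counts[1] / total >= min_fraction:
        return "1|0"  # indel sits on haplotype 1
    if alt_counts[2] / total >= min_fraction:
        return "0|1"  # indel sits on haplotype 2
    return None  # reads disagree: leave the indel unphased


reads = [(1, True)] * 5 + [(2, False)] * 4 + [(2, True), (None, True)]
print(phase_indel(reads))
```

Because the SNVs alone decide which haplotype each read belongs to, a noisy indel call can at worst remain unphased. It cannot introduce a switch into the block.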

The phaseblocks themselves remain largely unchanged (though some may be slightly extended). Indels within those blocks will now carry a phasing tag, just like SNVs and SVs. This improves the resolution of compound heterozygous calls, especially in cases involving frameshift indels that were previously unphased.

This update is available in version 2.7.0 of the wf-human-variation workflow. We’re excited to see how you put it to use — and what new biology it helps uncover.


Tags

#workflows #phasing

Sean McKenzie

Bioinformatician
