Percula
Percula is a Python package that provides a shim between spatial single-cell data output from Oxford Nanopore Technologies' sequencing devices and 10X Genomics' Space Ranger.
At the time of writing, Space Ranger does not natively support long-read sequencing data from Nanopore devices. Percula provides a way to convert the output of the MinKNOW device software into a format that can be ingested by Space Ranger, primarily in order to obtain cell and UMI barcodes for long-read sequencing data. This information can then be fed into wf-single-cell for long-read single-cell analysis.
Installation
Percula can be obtained as either a conda or pip package. For conda, it can be installed with:
conda create -n percula -c conda-forge -c bioconda -c nanoporetech percula
conda activate percula
Usage
The primary function of Percula is to convert the output of MinKNOW into a format that can be handled by Space Ranger. Its secondary function is to dechimerise and trim reads, because it takes over these steps from other parts of wf-single-cell.
Percula can be run with the following command:
percula preprocess <output> <inputs> ...
where <output> is the path where the output files will be written, and <inputs> are the input files to be processed. The inputs may be either single BAM files or directories. If directories are provided, they will be searched recursively for BAM files.
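As an illustration of the input handling described above, the following is a minimal Python sketch of how a mix of BAM files and directories could be expanded into a flat list of BAM paths. The function name collect_bams is hypothetical and this is not Percula's own implementation; it only mirrors the documented behaviour that directories are searched recursively.

```python
from pathlib import Path


def collect_bams(inputs):
    """Expand a mix of BAM files and directories into a list of BAM paths.

    Hypothetical helper for illustration; not part of Percula's API.
    """
    found = []
    for item in map(Path, inputs):
        if item.is_dir():
            # Directories are searched recursively for *.bam files.
            found.extend(sorted(item.rglob("*.bam")))
        elif item.is_file():
            # Single files are passed through unchanged.
            found.append(item)
        else:
            raise FileNotFoundError(f"No such file or directory: {item}")
    return found
```

A helper like this can be useful for checking which files a given set of arguments would pick up before starting a long run.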
See the Onward Processing section below for information on how to use the output files with Space Ranger and wf-single-cell.
For additional support running Percula, please contact Oxford Nanopore Support. Noting that the request is for the attention of the Customer Workflows team may speed up your support request.
FASTQ Inputs
Although Percula primarily works with BAM files, it can also be used with FASTQ files through the use of fastcat. Fastcat aggregates files whilst preserving metadata from either the MinKNOW device software or the dorado basecaller (which write metadata in slightly different ways).
Note: do not use samtools import to aggregate FASTQ files, as metadata may not be preserved correctly when converting to BAM.
To use Percula with FASTQ files, you can run the following command:
fastcat --bam_out --threads 4 --recurse <inputs> ... \
| percula preprocess <path_to_output_directory> -
where <inputs> are the input FASTQ files to be processed. Note the - at the end: it indicates that Percula should read from the standard input stream. As with percula preprocess, the <inputs> argument to fastcat can be a single FASTQ file or a directory containing FASTQ files.
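When the FASTQ route is driven from a script rather than an interactive shell, the same pipe can be set up with Python's subprocess module. The sketch below is illustrative only: build_commands and run_pipeline are hypothetical names, and the command-line flags are the ones shown in the command above, with nothing added.

```python
import subprocess


def build_commands(inputs, out_dir, threads=4):
    """Return the (fastcat, percula) argument lists for the piped command."""
    fastcat_cmd = [
        "fastcat", "--bam_out", "--threads", str(threads), "--recurse",
        *map(str, inputs),
    ]
    # The trailing "-" tells Percula to read from standard input.
    percula_cmd = ["percula", "preprocess", str(out_dir), "-"]
    return fastcat_cmd, percula_cmd


def run_pipeline(inputs, out_dir, threads=4):
    """Run fastcat and pipe its BAM output into percula preprocess."""
    fastcat_cmd, percula_cmd = build_commands(inputs, out_dir, threads)
    producer = subprocess.Popen(fastcat_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(percula_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so fastcat sees a broken pipe
    # if Percula exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"pipeline failed: fastcat={producer_rc}, percula={consumer_rc}"
        )
```

Checking both exit codes matters here, because a plain shell pipe reports only the status of the last command unless pipefail is set.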
Outputs
Three outputs are generated by percula preprocess:
- configs.json: A JSON file containing adapter configurations found within reads.
- SAMPLE_S1_L001.bam: A BAM file containing the reads that have been processed.
- SAMPLE_S1_L001_R[1,2]_001.fastq.gz: A pair of pseudo paired-end FASTQ files containing the reads that have been processed. The first file contains the forward reads, and the second file contains the reverse reads.
The first two files are required for downstream processing with wf-single-cell, while the paired-end read files should be provided to Space Ranger for demultiplexing.
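Before moving on to downstream tools, it can be worth confirming that all of the files listed above were written. The following Python sketch does this; missing_outputs is a hypothetical helper, and it assumes that R[1,2] in the list above expands to the two file names containing R1 and R2.

```python
from pathlib import Path

# File names taken from the output list above; R[1,2] is assumed to
# expand to one R1 file and one R2 file.
EXPECTED_OUTPUTS = (
    "configs.json",
    "SAMPLE_S1_L001.bam",
    "SAMPLE_S1_L001_R1_001.fastq.gz",
    "SAMPLE_S1_L001_R2_001.fastq.gz",
)


def missing_outputs(out_dir):
    """Return the expected Percula output files that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]
```

An empty list means every expected file is present; any names returned indicate outputs that still need to be produced before running Space Ranger or wf-single-cell.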
Onward Processing
Once the data have been processed with Percula, they can be processed with Space Ranger and subsequently with wf-single-cell.
Space Ranger processing
The short-read FASTQ output files from Percula can be used with Space Ranger in the same way as any other FASTQ files. For example:
spaceranger count \
--id <SAMPLE_ID> --slide=<SLIDE_ID> --area=<AREA> \
--create-bam=true \
--transcriptome=<TRANSCRIPTOME_REFERENCE> \
--cytaimage=<VISIUM IMAGE> \
--fastqs=<PERCULA OUTPUT DIRECTORY>
Please note that the --create-bam=true option is required here: it will produce a BAM file containing the sequencing reads, annotated with spatial barcodes and UMI information. This information is required for downstream processing with wf-single-cell.
The required BAM file will be under the Space Ranger output directory as:
<SPACE_RANGER_OUTPUT>/outs/possorted_genome_bam.bam
For further help running Space Ranger, please refer to 10X Genomics' documentation.
wf-single-cell processing
The output from Space Ranger can be combined with the output of Percula to run wf-single-cell.
nextflow run wf-single-cell \
--bam <PERCULA_OUT>/SAMPLE_S1_L001.bam \
--spaceranger_bam <SPACE_RANGER_OUTPUT>/outs/possorted_genome_bam.bam \
--adapter_configs <PERCULA_OUT>/configs.json \
--kit visium_hd:v1
The --bam argument should point to the BAM file produced by Percula, while the --spaceranger_bam argument should point to the BAM file produced by Space Ranger. The former is the same option that would be used with the workflow in its standard use with other 10X Genomics data. The latter option is particular to the processing of Visium HD data: it provides the spatial barcodes and UMI information to the workflow, causing it to skip its usual read preprocessing and demultiplexing steps. The workflow will still perform full-length, isoform-specific processing such as long-read alignment and isoform quantification. The --adapter_configs argument should refer to the JSON file produced by Percula; this contains counts of adapter configurations that wf-single-cell uses in report generation.
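For scripted runs, the arguments described above can be assembled from the two output directories. The Python sketch below is one way to do this; wf_single_cell_command is a hypothetical helper, and the file names and options are those shown in the command above, with no additional workflow options assumed.

```python
from pathlib import Path


def wf_single_cell_command(percula_out, spaceranger_out, kit="visium_hd:v1"):
    """Build the nextflow argument list from the Percula and Space Ranger outputs."""
    percula_out = Path(percula_out)
    spaceranger_out = Path(spaceranger_out)
    return [
        "nextflow", "run", "wf-single-cell",
        # BAM file written by percula preprocess.
        "--bam", str(percula_out / "SAMPLE_S1_L001.bam"),
        # BAM file written by spaceranger count with --create-bam=true.
        "--spaceranger_bam",
        str(spaceranger_out / "outs" / "possorted_genome_bam.bam"),
        # Adapter configuration counts written by percula preprocess.
        "--adapter_configs", str(percula_out / "configs.json"),
        "--kit", kit,
    ]
```

The resulting list can be passed to subprocess.run, or joined with spaces and printed for inspection before launching the workflow.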
See the wf-single-cell documentation for further information on how to run the workflow, or contact Oxford Nanopore Support.