Workflow Single Cell report

Summary

Experiment summary

Input reads	16,004
Estimated cells	1,962
Reads per cell (mean)	8
UMIs per cell (median)	4
Genes per cell (median)	4

Barcode rank plot

Alignment / feature summary

Pass reads	15,397
Mapped	15,041
Unmapped	356
Supplementary	269
Unique genes	221
Unique isoforms	200

Input reads: The total number of reads in the input data.
Estimated cells: The estimated number of real cells identified by the workflow.
Mean reads per cell: The average number of reads per real cell.
Median UMI counts per cell: The median number of unique molecular identifiers (UMIs) per cell.
Median genes per cell: The median number of unique genes identified per real cell.

Barcodes are ranked by read count in descending order on the x-axis, and the read count for each barcode is displayed on the y-axis. Only high-quality barcodes (minimum qscore of 15 and a 100% match to the 10x whitelist) are used to generate the rank plot.

The dashed line indicates the read count threshold that was determined by the workflow. Barcodes to the left of this point are considered "real cells"; those to the right are considered non-cell barcodes and are excluded from downstream analysis.
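The ranking and thresholding described above can be sketched as follows. This is an illustration only, not the workflow's implementation: the function names, the toy barcodes and the threshold value are hypothetical, and treating a count equal to the threshold as a cell is an assumption. How the workflow determines the threshold is not covered here.

```python
from collections import Counter


def rank_barcodes(barcodes):
    """Return (barcode, read_count) pairs sorted by read count, highest first."""
    counts = Counter(barcodes)
    # Sort by descending count; break ties by barcode so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_cells(ranked, threshold):
    """Split ranked barcodes into cells and non-cells.

    Assumption: a barcode with a count equal to the threshold is kept as a cell.
    """
    cells = [bc for bc, n in ranked if n >= threshold]
    non_cells = [bc for bc, n in ranked if n < threshold]
    return cells, non_cells


# Toy input: one barcode per high-quality read (made-up values).
reads = ["AAAC"] * 5 + ["CCGT"] * 3 + ["GGTA"] * 1 + ["TTAG"] * 1
ranked = rank_barcodes(reads)
cells, non_cells = split_cells(ranked, threshold=3)
print(ranked)
print("cells:", cells, "non-cells:", non_cells)
```

Plotting the rank (position in `ranked`) against the read count, usually on log-log axes, gives the shape of a barcode rank plot.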

Pass reads: The total number of reads that passed the input filtering stages of the analysis. This number excludes reads where the expected adapters were not found.
Mapped: The number of primary alignments.
Unmapped: The number of reads that were not mapped to the reference genome.
Supplementary: The number of supplementary alignments. These can be indicative of fusion genes or chimeric reads.
Unique genes/isoforms: The total number of features identified across all cells.

Experiment summary

Input reads	4,870
Estimated cells	1,263
Reads per cell (mean)	4
UMIs per cell (median)	1
Genes per cell (median)	1

Barcode rank plot

Alignment / feature summary

Pass reads	4,520
Mapped	1,314
Unmapped	3,206
Supplementary	25
Unique genes	67
Unique isoforms	58

Experiment summary

Input reads	4,825
Estimated cells	690
Reads per cell (mean)	7
UMIs per cell (median)	1
Genes per cell (median)	1

Barcode rank plot

Alignment / feature summary

Pass reads	3,457
Mapped	927
Unmapped	2,530
Supplementary	12
Unique genes	55
Unique isoforms	44

Experiment summary

Input reads	4,970
Estimated cells	1,112
Reads per cell (mean)	4
UMIs per cell (median)	1
Genes per cell (median)	1

Barcode rank plot

Alignment / feature summary

Pass reads	4,329
Mapped	1,049
Unmapped	3,280
Supplementary	71
Unique genes	22
Unique isoforms	17
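The alignment summaries above can be compared across samples with some simple arithmetic. The sketch below is not part of the workflow; it copies the Pass reads, Mapped and Unmapped values for the four samples in the order they appear in this report, checks that Mapped and Unmapped sum to Pass reads (which holds for these reported values), and computes the percentage of pass reads with a primary alignment. The function name `mapped_percent` is hypothetical.

```python
# (pass_reads, mapped, unmapped) for the four samples, in report order.
alignment = [
    (15397, 15041, 356),
    (4520, 1314, 3206),
    (3457, 927, 2530),
    (4329, 1049, 3280),
]


def mapped_percent(pass_reads, mapped):
    """Percentage of pass reads with a primary alignment, to two decimals."""
    return round(100 * mapped / pass_reads, 2)


for index, (pass_reads, mapped, unmapped) in enumerate(alignment, start=1):
    # In the reported values, every pass read is either mapped or unmapped.
    consistent = mapped + unmapped == pass_reads
    print(f"sample {index}: {mapped_percent(pass_reads, mapped)}% mapped, "
          f"counts consistent: {consistent}")
```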

Read summary

Read assignment summary

	Full length	Valid barcode	Gene assigned	Transcript assigned
count	15,552	13,774	5,984	4,398
% full length reads	100.00%	88.57%	38.48%	28.28%

Full length: Proportion of reads containing adapters in the expected configuration. Full-length reads are carried forward in the workflow to attempt to assign barcode/UMI.
Valid barcodes: Proportion of reads that have been assigned corrected cell barcodes and UMIs. All reads with valid barcodes are used in the subsequent stages of the workflow.
Gene assigned: Proportion of reads unambiguously assigned to a gene.
Transcript assigned: Proportion of reads unambiguously assigned to a transcript.
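The "% full length reads" row is each count divided by the full-length count. As a minimal check, the sketch below recomputes that row from the counts in the first read assignment table of this report. The function name `percent_of_full_length` is hypothetical and not part of the workflow.

```python
def percent_of_full_length(counts):
    """Express each stage count as a percentage of the full-length read count."""
    full_length = counts["Full length"]
    return {stage: f"{100 * n / full_length:.2f}%" for stage, n in counts.items()}


# Counts copied from the first read assignment table in this report.
sample = {
    "Full length": 15552,
    "Valid barcode": 13774,
    "Gene assigned": 5984,
    "Transcript assigned": 4398,
}

for stage, pct in percent_of_full_length(sample).items():
    print(f"{stage}\t{pct}")
```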

	Full length	Valid barcode	Gene assigned	Transcript assigned
count	4,565	879	298	227
% full length reads	100.00%	19.26%	6.53%	4.97%

	Full length	Valid barcode	Gene assigned	Transcript assigned
count	3,687	244	108	81
% full length reads	100.00%	6.62%	2.93%	2.20%

	Full length	Valid barcode	Gene assigned	Transcript assigned
count	4,440	417	37	28
% full length reads	100.00%	9.39%	0.83%	0.63%

Adapter configuration

Full-length reads are defined as those flanked by primers/adapters in the expected orientations: adapter1---cDNA---adapter2.

These full-length reads can then be oriented in the same way and are used in the next stages of the workflow. If `full_length_only` is set to `false`, reads with all primer configurations are analysed.

Every library prep contains some non-standard adapter configuration artifacts, which are not used in subsequent stages of the workflow. The plots show the proportions of the different adapter configurations within each sample, which can help in diagnosing library preparation issues. For most applications, the majority of reads should be full_length.

The adapters used to identify read segments vary slightly between the supported kits. They are:

3prime, multiome and visium kits:

Adapter1: Read1
Adapter2: TSO

5prime kit:

Adapter1: Read1
Adapter2: Non-Poly(dT) RT primer
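The adapter pairs listed above can be written as a small lookup table, with a check for the adapter1---cDNA---adapter2 configuration. This is a simplified sketch under stated assumptions, not the workflow's classifier: the dictionary keys are the kit family names used in the text (not necessarily the exact values the workflow accepts), the function name is hypothetical, and the adapters are assumed to be listed in order along a read that has already been oriented.

```python
# Expected (adapter1, adapter2) per kit family, as listed in the text.
EXPECTED_ADAPTERS = {
    "3prime": ("Read1", "TSO"),
    "multiome": ("Read1", "TSO"),
    "visium": ("Read1", "TSO"),
    "5prime": ("Read1", "Non-Poly(dT) RT primer"),
}


def is_full_length(kit, found_adapters):
    """Return True if the adapters found on an oriented read match
    adapter1---cDNA---adapter2 for the given kit family."""
    return tuple(found_adapters) == EXPECTED_ADAPTERS[kit]


print(is_full_length("3prime", ["Read1", "TSO"]))  # expected pair
print(is_full_length("3prime", ["Read1"]))         # adapter2 missing
print(is_full_length("5prime", ["Read1", "TSO"]))  # wrong adapter2 for this kit
```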

Saturation

Sequencing saturation is an indication of how well the diversity of a library has been captured in an experiment. As sequencing depth increases, the number of detected genes and unique molecular identifiers (UMIs) will also increase at a rate that depends on the complexity of the input library. The curve gradient indicates the rate at which new genes or UMIs are being recovered; as saturation increases, the curve flattens. All metrics are calculated through random sampling of the complete dataset.

Sequencing saturation: A measure of how many of the sampled sequencing reads come from cDNA molecules that have already been observed. Calculated as 1 - (number of unique UMIs / number of reads).
Gene saturation: Unique genes observed per cell. Calculated as a median across cells, after sampling the expression matrix.
UMI saturation: Unique UMIs observed per cell. Calculated as a median across cells, after sampling the expression matrix.
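The sequencing saturation formula above, combined with random sampling, can be sketched as follows. This is a minimal illustration rather than the workflow's code: the function names, the sampling fractions and the toy data are hypothetical, and each read is represented by a single string standing in for whatever combination of fields identifies a unique molecule.

```python
import random


def sequencing_saturation(read_umis):
    """Return 1 - (number of unique UMIs / number of reads)."""
    if not read_umis:
        return 0.0
    return 1 - len(set(read_umis)) / len(read_umis)


def saturation_curve(read_umis, fractions, seed=0):
    """Sample reads without replacement at each fraction and report
    (number of sampled reads, sequencing saturation)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same curve
    curve = []
    for fraction in fractions:
        n = round(fraction * len(read_umis))
        sampled = rng.sample(read_umis, n)
        curve.append((n, sequencing_saturation(sampled)))
    return curve


# Toy data: one molecule identifier per read (made-up values).
reads = ["u1", "u1", "u1", "u2", "u2", "u3", "u4", "u4"]
for n, saturation in saturation_curve(reads, [0.25, 0.5, 1.0]):
    print(n, round(saturation, 3))
```

With all reads sampled, the toy data has 4 unique identifiers across 8 reads, giving a saturation of 0.5.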

UMAP projections

This section presents various UMAP projections of the data. UMAP is an unsupervised algorithm that projects the multidimensional single cell expression data into two dimensions. This can reveal structure in the data, for example groups of cells representing different cell types or cells that share common regulatory pathways. The UMAP algorithm is stochastic; analysing the same data multiple times with identical parameters can lead to visually different projections. To give some confidence in the observed results, the projection is run multiple times and the resulting series of UMAP projections can be viewed below.

Software versions

Name	Version
pysam	0.22.1
parasail	1.2.4
pandas	2.0.3
rapidfuzz	2.13.7
scikit-learn	1.7.2
minimap2	2.24-r1122
samtools	1.21
bedtools	v2.30.0
gffread	0.12.7
seqkit	v2.10.1
stringtie	2.2.3

Workflow parameters

Key	Value
fastq	wf-single-cell/data/test_data/fastq/
bam	None
spaceranger_bam	None
adapter_stats	None
out_dir	wf-single-cell
sample_sheet	None
sample	None
single_cell_sample_sheet	wf-single-cell/data/test_data/samples.test.csv
kit_config	None
kit	None
threads	4
full_length_only	True
min_read_qual	None
fastq_chunk	2500
barcode_adapter1_suff_length	10
barcode_min_quality	15
barcode_max_ed	2
barcode_min_ed_diff	2
gene_assigns_minqv	30
matrix_min_genes	1
matrix_min_cells	1
matrix_max_mito	100
matrix_norm_count	10000
genes_of_interest	None
umap_n_repeats	3
expected_cells	None
estimate_cell_count	True
mito_prefix	MT-
stringtie_opts	-c 2
call_variants	False
report_variants	None
call_fusions	False
ref_genome_dir	wf-single-cell/data/test_data/refdata-gex-GRCh38-2020-A
ctat_resources	None
epi2me_resource_bundle	None
store_dir	wf-single-cell/store_dir
resource_bundles	{'gex-GRCh38-2024-A': {'10x': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/refdata-gex-GRCh38-2024-A.tar.gz', 'ctat-lr-fusion': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/ctat_genome_lib_10x_2024.tar.gz'}, 'gex-GRCh38-2024-A_chr_20-21': {'10x': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/refdata-gex-GRCh38-2024-A_chr20_21.tar.gz', 'ctat-lr-fusion': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/ctat_genome_lib_chr20_21_UyHq1cFI.tar.gz'}}