Summary

Experiment summary

Input reads 16,004
Estimated cells 1,962
Reads per cell (mean) 8
UMIs per cell (median) 4
Genes per cell (median) 4

Barcode rank plot

Alignment / feature summary

Pass reads 15,397
Mapped 15,041
Unmapped 356
Supplementary 269
Unique genes 221
Unique isoforms 200
  • Input reads: The total number of reads in the input data.
  • Estimated cells: The estimated number of real cells identified by the workflow.
  • Mean reads per cell: The average number of reads per real cell.
  • Median UMI counts per cell: The median number of unique molecular identifiers (UMIs) per cell.
  • Median genes per cell: The median number of unique genes identified per real cell.
Barcodes are ranked by read count in descending order along the x-axis, and the read count for each barcode is shown on the y-axis. Only high-quality barcodes are used to generate the rank plot (minimum qscore of 15 and a 100% match to the 10x whitelist).

The dashed line indicates the read count threshold determined by the workflow. Barcodes to the left of this point are considered "real cells"; those to the right are treated as non-cell barcodes and are excluded from the downstream analysis.
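The ranking and thresholding described above can be sketched as follows. This is a minimal illustration, not the workflow's implementation: the function names, the toy barcodes and the threshold value of 10 are all hypothetical, and the workflow determines its own threshold by a method not described in this report.

```python
from collections import Counter


def rank_barcodes(read_barcodes):
    """Return (barcode, read_count) pairs sorted by descending read count."""
    counts = Counter(read_barcodes)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_cells(ranked, threshold):
    """Split ranked barcodes into cells and non-cells.

    Assumption: a barcode counts as a cell when its read count is at
    least `threshold`; the workflow's exact rule may differ.
    """
    cells = [bc for bc, n in ranked if n >= threshold]
    non_cells = [bc for bc, n in ranked if n < threshold]
    return cells, non_cells


# Toy data: one barcode per read (hypothetical sequences).
reads = ["AAAC"] * 50 + ["CCGT"] * 40 + ["GGTA"] * 3 + ["TTAG"] * 1
ranked = rank_barcodes(reads)
cells, non_cells = split_cells(ranked, threshold=10)
print(ranked)
print(cells, non_cells)
```

Plotting rank against read count for `ranked` on log-scaled axes gives the usual shape of a barcode rank plot, with the threshold separating the two groups.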
  • Pass reads: The total number of reads that passed the input filtering stages of the analysis. This number excludes reads where the expected adapters were not found.
  • Mapped: The number of primary alignments.
  • Unmapped: The number of reads that were not mapped to the reference genome.
  • Supplementary: The number of supplementary alignments. These can be indicative of fusion genes or chimeric reads.
  • Unique genes/isoforms: The total number of features identified across all cells.
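As a worked example of how the alignment counts relate to each other, the sketch below derives mapped and unmapped fractions from the first alignment table in this report. The function name is hypothetical, and the check that Mapped plus Unmapped equals Pass reads is an assumption based on the tables shown here, where it holds for every sample.

```python
def alignment_fractions(pass_reads, mapped, unmapped):
    """Return (mapped fraction, unmapped fraction) of the pass reads."""
    # Assumption: every pass read is either mapped or unmapped.
    if mapped + unmapped != pass_reads:
        raise ValueError("mapped + unmapped does not equal pass reads")
    return mapped / pass_reads, unmapped / pass_reads


# Counts copied from the first alignment / feature summary table.
first_sample = {"pass_reads": 15397, "mapped": 15041, "unmapped": 356}
mapped_frac, unmapped_frac = alignment_fractions(**first_sample)
print(f"mapped: {mapped_frac:.2%}, unmapped: {unmapped_frac:.2%}")
```

A low mapped fraction, as in the later samples in this report, is usually worth investigating before interpreting the gene and isoform counts.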

Experiment summary

Input reads 4,870
Estimated cells 1,263
Reads per cell (mean) 4
UMIs per cell (median) 1
Genes per cell (median) 1

Barcode rank plot

Alignment / feature summary

Pass reads 4,520
Mapped 1,314
Unmapped 3,206
Supplementary 25
Unique genes 67
Unique isoforms 58

Experiment summary

Input reads 4,825
Estimated cells 690
Reads per cell (mean) 7
UMIs per cell (median) 1
Genes per cell (median) 1

Barcode rank plot

Alignment / feature summary

Pass reads 3,457
Mapped 927
Unmapped 2,530
Supplementary 12
Unique genes 55
Unique isoforms 44

Experiment summary

Input reads 4,970
Estimated cells 1,112
Reads per cell (mean) 4
UMIs per cell (median) 1
Genes per cell (median) 1

Barcode rank plot

Alignment / feature summary

Pass reads 4,329
Mapped 1,049
Unmapped 3,280
Supplementary 71
Unique genes 22
Unique isoforms 17

Read summary

Read assignment summary

Full length Valid barcode Gene assigned Transcript assigned
count 15,552 13,774 5,984 4,398
% full length reads 100.00% 88.57% 38.48% 28.28%
  • Full length: Proportion of reads containing adapters in the expected configuration. Full-length reads are carried forward in the workflow to attempt to assign barcode/UMI.
  • Valid barcodes: Proportion of reads that have been assigned corrected cell barcodes and UMIs. All reads with valid barcodes are used in the subsequent stages of the workflow.
  • Gene assigned: Proportion of reads unambiguously assigned to a gene.
  • Transcript assigned: Proportion of reads unambiguously assigned to a transcript.
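The percentage row of the read assignment table is each count divided by the full-length count. The sketch below reproduces it for the first table in this section; the function name is hypothetical and the counts are copied from the report.

```python
def percent_of_full_length(counts):
    """Express each category as a percentage of the full-length reads."""
    full_length = counts["Full length"]
    return {name: round(100 * n / full_length, 2) for name, n in counts.items()}


# Counts copied from the first read assignment table.
counts = {
    "Full length": 15552,
    "Valid barcode": 13774,
    "Gene assigned": 5984,
    "Transcript assigned": 4398,
}
percentages = percent_of_full_length(counts)
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

Because each category is a subset of the previous one, the percentages can only decrease from left to right.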
Full length Valid barcode Gene assigned Transcript assigned
count 4,565 879 298 227
% full length reads 100.00% 19.26% 6.53% 4.97%
Full length Valid barcode Gene assigned Transcript assigned
count 3,687 244 108 81
% full length reads 100.00% 6.62% 2.93% 2.20%
Full length Valid barcode Gene assigned Transcript assigned
count 4,440 417 37 28
% full length reads 100.00% 9.39% 0.83% 0.63%

Adapter configuration



Full-length reads are defined as those flanked by primers/adapters in the expected orientations: adapter1---cDNA---adapter2.

These full-length reads can then be oriented in the same way and are used in the next stages of the workflow. If `full_length_only` is set to `false`, reads with all primer configurations are analysed.

Every library prep will contain some level of non-standard adapter configuration artifacts. Unless `full_length_only` is set to `false`, these are not used in subsequent stages of the workflow. These plots show the proportions of different adapter configurations within each sample, which can help in diagnosing library preparation issues. For most applications, the majority of reads should be full_length.

The adapters used to identify read segments vary slightly between the supported kits. They are:

3prime, multiome and visium kits:

  • Adapter1: Read1
  • Adapter2: TSO

5prime kit:

  • Adapter1: Read1
  • Adapter2: Non-Poly(dT) RT primer
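The adapter1---cDNA---adapter2 rule can be illustrated with a small lookup keyed by kit. This is a simplified sketch under stated assumptions: the dictionary, the function name and the input format (a list of adapter names in the order they occur along an already oriented read) are hypothetical, and strand handling is ignored. It is not how the workflow itself detects adapters.

```python
# Expected (adapter1, adapter2) pair per kit, as listed above.
KIT_ADAPTERS = {
    "3prime": ("Read1", "TSO"),
    "multiome": ("Read1", "TSO"),
    "visium": ("Read1", "TSO"),
    "5prime": ("Read1", "Non-Poly(dT) RT primer"),
}


def is_full_length(found_adapters, kit):
    """Return True if the read has exactly adapter1 followed by adapter2.

    `found_adapters` is a hypothetical input: the adapter names found
    along the read, in order, after orienting the read.
    """
    return tuple(found_adapters) == KIT_ADAPTERS[kit]


print(is_full_length(["Read1", "TSO"], "3prime"))
print(is_full_length(["Read1"], "3prime"))
print(is_full_length(["Read1", "TSO"], "5prime"))
```

Reads that fail this kind of check correspond to the non-standard configurations shown in the adapter configuration plots.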

Saturation

Sequencing saturation is an indication of how well the diversity of a library has been captured in an experiment. As sequencing depth increases, the number of detected genes and unique molecular identifiers (UMIs) will also increase at a rate that depends on the complexity of the input library. The curve gradient indicates the rate at which new genes or UMIs are being recovered; as saturation increases, the curve flattens. All metrics are calculated through random sampling of the complete dataset.

  • Sequencing saturation: The fraction of sampled reads that come from cDNA molecules already observed. Calculated as 1 - (number of unique UMIs / number of reads).
  • Gene saturation: Unique genes observed per cell. Calculated as a median across cells, after sampling the expression matrix.
  • UMI saturation: Unique UMIs observed per cell. Calculated as a median across cells, after sampling the expression matrix.
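The sequencing saturation formula above, combined with random sampling, can be sketched as follows. This is an illustration under assumptions rather than the workflow's code: the function names are hypothetical, each read is represented by a hypothetical (cell barcode, UMI) tuple, and the sampling fractions are arbitrary.

```python
import random


def sequencing_saturation(read_umis):
    """Compute 1 - (number of unique UMIs / number of reads)."""
    return 1 - len(set(read_umis)) / len(read_umis)


def saturation_curve(read_umis, fractions, seed=0):
    """Compute saturation on random subsamples of the reads."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    curve = []
    for frac in fractions:
        n = max(1, int(len(read_umis) * frac))
        sample = rng.sample(read_umis, n)  # sampling without replacement
        curve.append((frac, sequencing_saturation(sample)))
    return curve


# Toy data: six reads from three distinct molecules.
reads = [("c1", "u1")] * 3 + [("c1", "u2")] + [("c2", "u1")] * 2
print(sequencing_saturation(reads))
curve = saturation_curve(reads, fractions=[0.5, 1.0])
print(curve)
```

A value near 0 means almost every read is a new molecule, so deeper sequencing would recover more; a value near 1 means most reads are repeats of molecules already seen.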

UMAP projections

This section presents various UMAP projections of the data. UMAP is an unsupervised algorithm that projects the multidimensional single-cell expression data into two dimensions. This can reveal structure in the data, for example groups of cells of the same type or cells that share common regulatory pathways. The UMAP algorithm is stochastic: analysing the same data multiple times with identical parameters can lead to visually different projections. To give some confidence in the observed structure, the projection is run multiple times (`umap_n_repeats`, set to 3 in this run), and the resulting series of UMAP projections can be viewed below.

Software versions

Name Version
pysam 0.22.1
parasail 1.2.4
pandas 2.0.3
rapidfuzz 2.13.7
scikit-learn 1.7.2
minimap2 2.24-r1122
samtools 1.21
bedtools v2.30.0
gffread 0.12.7
seqkit v2.10.1
stringtie 2.2.3

Workflow parameters

Key Value
fastq wf-single-cell/data/test_data/fastq/
bam None
spaceranger_bam None
adapter_stats None
out_dir wf-single-cell
sample_sheet None
sample None
single_cell_sample_sheet wf-single-cell/data/test_data/samples.test.csv
kit_config None
kit None
threads 4
full_length_only True
min_read_qual None
fastq_chunk 2500
barcode_adapter1_suff_length 10
barcode_min_quality 15
barcode_max_ed 2
barcode_min_ed_diff 2
gene_assigns_minqv 30
matrix_min_genes 1
matrix_min_cells 1
matrix_max_mito 100
matrix_norm_count 10000
genes_of_interest None
umap_n_repeats 3
expected_cells None
estimate_cell_count True
mito_prefix MT-
stringtie_opts -c 2
call_variants False
report_variants None
call_fusions False
ref_genome_dir wf-single-cell/data/test_data/refdata-gex-GRCh38-2020-A
ctat_resources None
epi2me_resource_bundle None
store_dir wf-single-cell/store_dir
resource_bundles {'gex-GRCh38-2024-A': {'10x': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/refdata-gex-GRCh38-2024-A.tar.gz', 'ctat-lr-fusion': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/ctat_genome_lib_10x_2024.tar.gz'}, 'gex-GRCh38-2024-A_chr_20-21': {'10x': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/refdata-gex-GRCh38-2024-A_chr20_21.tar.gz', 'ctat-lr-fusion': 'https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-single-cell/ctat_genome_lib_chr20_21_UyHq1cFI.tar.gz'}}