Summary

Experiment summary
Input reads 16,004
Estimated cells 1,962
Mean reads per cell 8
Median UMI counts per cell 4
Median genes per cell 4
Barcode rank plot
Alignment / feature summary
Reads aligned 15,397
Reads mapping to genome 15,041
Supplementary 269
Unmapped 356
Unique genes detected 221
Unique isoforms detected 197
  • Input reads: The total number of reads in the input data
  • Estimated cells: The estimated number of cells (real cells) identified by the workflow
  • Mean reads per cell: Total reads divided by the number of real cells
  • Median UMI counts per cell: The median number of UMIs in real cells
  • Median genes per cell: The median number of unique genes identified per real cell
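The definitions above reduce to simple arithmetic over per-cell counts. The sketch below is illustrative only: the function name `summarise_cells` and its inputs are assumptions, not part of the workflow. The final lines check the first table, where 16,004 input reads over 1,962 estimated cells gives the reported mean of 8 after rounding.

```python
from statistics import median


def summarise_cells(input_reads, umis_per_cell, genes_per_cell):
    """Headline metrics from per-cell counts (hypothetical helper).

    input_reads: total number of reads in the input data
    umis_per_cell, genes_per_cell: one value per real cell
    """
    n_cells = len(umis_per_cell)
    if n_cells == 0:
        raise ValueError("no real cells were supplied")
    return {
        "estimated_cells": n_cells,
        # Total reads divided by the number of real cells
        "mean_reads_per_cell": input_reads / n_cells,
        "median_umis_per_cell": median(umis_per_cell),
        "median_genes_per_cell": median(genes_per_cell),
    }


# Values taken from the first experiment summary table in this report.
mean_reads = 16004 / 1962
print(round(mean_reads))  # 8
```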
Cells are ranked by read count in descending order on the x-axis, and the read count for each barcode is displayed on the y-axis. Only high-quality barcodes are used to generate the rank plot (minimum Q-score of 15 and a 100% match to the 10x whitelist).

The dashed line indicates the read count threshold that was determined by the workflow. Barcodes to the left of this point are considered "real cells", and those to the right are considered as non-cell barcodes and are not included in the downstream analysis.
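The ranking and cut-off described above can be sketched as follows. This is a minimal illustration, not the workflow's code: the report does not say how the threshold is derived, so here it is passed in as a given value. The function name `rank_barcodes`, the toy barcodes, and the choice to treat the threshold as inclusive are all assumptions.

```python
def rank_barcodes(read_counts, threshold):
    """Rank barcodes by read count and split them at a threshold.

    read_counts: dict mapping barcode -> read count
    threshold: read count cut-off (assumed to be inclusive here)
    """
    # Descending read count gives the x-axis order of the rank plot.
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    # Barcodes at or above the threshold sit to the left of the dashed line.
    cells = [bc for bc, n in ranked if n >= threshold]
    non_cells = [bc for bc, n in ranked if n < threshold]
    return ranked, cells, non_cells


# Toy data for illustration only.
toy_counts = {"AAAC": 5, "CCGT": 50, "GGTA": 1, "TTAG": 20}
ranked, cells, non_cells = rank_barcodes(toy_counts, threshold=10)
print(cells)      # ['CCGT', 'TTAG']
print(non_cells)  # ['AAAC', 'GGTA']
```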
  • Reads aligned: The total number of reads that were aligned to the reference genome sequence. This number excludes reads where the expected adapters were not found.
  • Reads mapping to genome: The number of primary alignments.
  • Supplementary: The number of supplementary alignments, which can indicate fusion genes or chimeric reads.
  • Unmapped: The number of reads that were not mapped to the reference genome.
  • Unique genes/isoforms detected: The total number of features identified across all cells.
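The primary, supplementary and unmapped categories correspond to bits in the SAM FLAG field (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The sketch below counts records by category from raw flag values. It is an assumption about how such a tally could be made, not the workflow's implementation, and the function names are hypothetical. In the tables of this report, "Reads mapping to genome" plus "Unmapped" equals "Reads aligned" (for example 15,041 + 356 = 15,397).

```python
from collections import Counter

# Bit values defined by the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_flag(flag):
    """Return the alignment category for one SAM FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def count_alignment_classes(flags):
    """Tally alignment categories over an iterable of FLAG values."""
    return Counter(classify_flag(f) for f in flags)


# Toy flags: two primary (0, 16), one unmapped (4),
# two supplementary (2048, 2064) and one secondary (256).
counts = count_alignment_classes([0, 16, 4, 2048, 2064, 256])
print(counts["primary"], counts["supplementary"], counts["unmapped"])  # 2 2 1
```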
  • Experiment summary
    Input reads 4,870
    Estimated cells 1,263
    Mean reads per cell 4
    Median UMI counts per cell 1
    Median genes per cell 1
    Barcode rank plot
    Alignment / feature summary
    Reads aligned 4,520
    Reads mapping to genome 1,314
    Supplementary 25
    Unmapped 3,206
    Unique genes detected 67
    Unique isoforms detected 58
  • Experiment summary
    Input reads 4,825
    Estimated cells 690
    Mean reads per cell 7
    Median UMI counts per cell 1
    Median genes per cell 1
    Barcode rank plot
    Alignment / feature summary
    Reads aligned 3,457
    Reads mapping to genome 927
    Supplementary 12
    Unmapped 2,530
    Unique genes detected 55
    Unique isoforms detected 44
  • Experiment summary
    Input reads 4,970
    Estimated cells 1,112
    Mean reads per cell 4
    Median UMI counts per cell 1
    Median genes per cell 1
    Barcode rank plot
    Alignment / feature summary
    Reads aligned 4,329
    Reads mapping to genome 1,049
    Supplementary 71
    Unmapped 3,280
    Unique genes detected 22
    Unique isoforms detected 17
  • Read summary

    Read assignment summary

               Full length    Valid barcode    Gene assigned    Transcript assigned
    Reads      15,552         13,774           6,015            4,416
    % of FL    100.00         88.57            38.68            28.40
    • Full length: Proportion of reads containing adapters in the expected configuration. Full-length reads are carried forward in the workflow to attempt barcode/UMI assignment.
    • Valid barcodes: Proportion of reads that have been assigned corrected cell barcodes and UMIs. All reads with valid barcodes are used in the subsequent stages of the workflow.
    • Gene assigned: Proportion of reads unambiguously assigned to a gene.
    • Transcript assigned: Proportion of reads unambiguously assigned to a transcript.
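The percentage row in the read assignment table is each count expressed relative to the full-length read count. The sketch below reproduces the first table's percentages from its read counts. The function name `percent_of_full_length` is an assumption used for illustration.

```python
def percent_of_full_length(counts):
    """Express each count as a percentage of the full-length read count."""
    full_length = counts["Full length"]
    return {
        name: round(100 * value / full_length, 2)
        for name, value in counts.items()
    }


# Read counts from the first read assignment table in this report.
counts = {
    "Full length": 15552,
    "Valid barcode": 13774,
    "Gene assigned": 6015,
    "Transcript assigned": 4416,
}
for name, pct in percent_of_full_length(counts).items():
    print(f"{name}: {pct:.2f}")
# Full length: 100.00
# Valid barcode: 88.57
# Gene assigned: 38.68
# Transcript assigned: 28.40
```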
               Full length    Valid barcode    Gene assigned    Transcript assigned
    Reads      4,565          879              298              227
    % of FL    100.00         19.26            6.53             4.97
               Full length    Valid barcode    Gene assigned    Transcript assigned
    Reads      3,687          244              108              81
    % of FL    100.00         6.62             2.93             2.20
               Full length    Valid barcode    Gene assigned    Transcript assigned
    Reads      4,440          417              37               28
    % of FL    100.00         9.39             0.83             0.63

    Adapter configuration



    Full-length reads are defined as those flanked by primers/adapters in the expected orientations: adapter1---cDNA---adapter2.

    These full-length reads can then be oriented in the same way and are used in the next stages of the workflow. If `full_length_only` is set to `false`, reads with all primer configurations are analysed.

    Every library prep will contain some level of non-standard adapter configuration artifacts. These are not used for subsequent stages of the workflow. These plots show the proportions of different adapter configurations within each sample, which can help in diagnosing library preparation issues. For most applications, the majority of reads should be full_length.

    The adapters used to identify read segments vary slightly between the supported kits. They are:

    3prime, multiome and visium kits:

    • Adapter1: Read1
    • Adapter2: TSO

    5prime kit:

    • Adapter1: Read1
    • Adapter2: Non-Poly(dT) RT primer
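As a rough illustration of the full-length definition, the sketch below labels a read from the ordered adapters found on it and then computes the proportion of each label in a sample. Only the `full_length` label comes from this report. The `other` label, the function names, and the decision to ignore strand orientation are simplifying assumptions, not the workflow's behaviour.

```python
from collections import Counter


def classify_configuration(adapter_hits):
    """Label a read by the ordered adapters found on it.

    adapter_hits: tuple of adapter names in read order (hypothetical input).
    Strand orientation is ignored in this simplified sketch.
    """
    if tuple(adapter_hits) == ("adapter1", "adapter2"):
        return "full_length"  # adapter1---cDNA---adapter2
    return "other"  # hypothetical catch-all label


def configuration_proportions(labels):
    """Proportion of reads carrying each configuration label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy reads for illustration only.
reads = [
    ("adapter1", "adapter2"),
    ("adapter1",),
    ("adapter1", "adapter2"),
    ("adapter1", "adapter2"),
]
labels = [classify_configuration(r) for r in reads]
print(configuration_proportions(labels))  # {'full_length': 0.75, 'other': 0.25}
```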

    Saturation

    Sequencing saturation is an indication of how well the diversity of a library has been captured in an experiment. As sequencing depth increases, the number of detected genes and unique molecular identifiers (UMIs) will also increase at a rate that depends on the complexity of the input library. The curve gradient indicates the rate at which new genes or UMIs are being recovered; as saturation increases, the curve flattens.

    • Gene saturation: Genes per cell as a function of read depth.
    • UMI saturation: UMIs per cell as a function of read depth.
    • Sequencing saturation: This metric is a measure of the proportion of reads that come from a previously observed UMI, and is calculated with the following formula: 1 - (number of unique UMIs / number of reads).
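The sequencing saturation formula above can be written directly in code. In this sketch a "unique UMI" is counted as a distinct (barcode, UMI) pair, which is an assumption the report does not spell out. The function names and the toy reads are hypothetical.

```python
def sequencing_saturation(n_unique_umis, n_reads):
    """1 - (number of unique UMIs / number of reads)."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return 1 - n_unique_umis / n_reads


def saturation_from_reads(reads):
    """Saturation for an iterable of (barcode, umi) tuples.

    Assumes a unique UMI is a distinct (barcode, umi) pair.
    """
    reads = list(reads)
    return sequencing_saturation(len(set(reads)), len(reads))


# Toy data: six reads covering four distinct (barcode, UMI) pairs.
toy_reads = [
    ("BC1", "U1"), ("BC1", "U1"), ("BC1", "U2"),
    ("BC2", "U1"), ("BC2", "U1"), ("BC2", "U3"),
]
print(round(saturation_from_reads(toy_reads), 3))  # 0.333
```

A value near 0 means almost every read comes from a new UMI, while a value near 1 means most reads are repeat observations of UMIs already seen.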

    UMAP projections

    This section presents various UMAP projections of the data. UMAP is an unsupervised algorithm that projects the multidimensional single cell expression data into 2 dimensions. This could reveal structure in the data representing, for example, different cell types or cells that share common regulatory pathways. The UMAP algorithm is stochastic; analysing the same data multiple times with UMAP, using identical parameters, can lead to visually different projections. To give some confidence in the observed results, it can be useful to run the projection multiple times, so a series of UMAP projections can be viewed below.

    "No data" messages shown in place of gene-level plots: COX16, AAGAB and NOGENE (12 times each) and CD70 (6 times).

    Software versions

    Name Version
    pysam 0.22.0
    parasail 1.2.3
    pandas 2.0.3
    rapidfuzz 2.13.7
    scikit-learn 1.3.2
    minimap2 2.24-r1122
    samtools 1.20
    bedtools v2.30.0
    gffread 0.12.7
    seqkit v2.9.0
    stringtie 2.2.2

    Workflow parameters

    Key Value
    fastq wf-single-cell/data/test_data/fastq/
    bam None
    out_dir wf-single-cell
    sample_sheet None
    sample None
    single_cell_sample_sheet wf-single-cell/data/test_data/samples.test.csv
    kit_config None
    kit None
    threads 4
    full_length_only True
    min_read_qual None
    fastq_chunk 2500
    ref_genome_dir wf-single-cell/data/test_data/refdata-gex-GRCh38-2020-A
    barcode_adapter1_suff_length 10
    barcode_min_quality 15
    barcode_max_ed 2
    barcode_min_ed_diff 2
    gene_assigns_minqv 30
    matrix_min_genes 1
    matrix_min_cells 1
    matrix_max_mito 100
    matrix_norm_count 10000
    genes_of_interest None
    umap_n_repeats 3
    expected_cells None
    estimate_cell_count True
    mito_prefix MT-
    stringtie_opts -c 2
    call_variants False
    report_variants None
    store_dir wf-single-cell/store_dir