BIDMC TMC / STU - Illumina Spatial Transcriptomics Technical Framework

Shuoshuo Wang; Antonella Arruda de Amaral; Sheethal Umesh Nagalakshmi; Ioannis Vlachos

May 07, 2026

Version 1

DOI

https://dx.doi.org/10.17504/protocols.io.e6nvwwr29vmk/v1

¹Beth Israel Deaconess Medical Center;
²Broad Institute of MIT and Harvard;
³Harvard Medical School

Human BioMolecular Atlas Program (HuBMAP) Method Development Community
Tech. support email: [email protected]

Protocol Citation: Shuoshuo Wang, Antonella Arruda de Amaral, Sheethal Umesh Nagalakshmi, Ioannis Vlachos 2026. BIDMC TMC / STU - Illumina Spatial Transcriptomics Technical Framework . protocols.io https://dx.doi.org/10.17504/protocols.io.e6nvwwr29vmk/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: April 29, 2026

Last Modified: May 07, 2026

Protocol Integer ID: 315986

Keywords: Spatial transcriptomics, Illumina spatial transcriptomics, Tissue atlas, tissue mapping, Fresh frozen cryosections, OCT embedding, Cryosectioning, H&E brightfield imaging, Fiducial markers, image registration, Poly(A) capture, poly(dT) probes, Spatial barcodes, Unique Molecular Identifier (UMI), On-slide reverse transcription, Template switching oligonucleotide (TSO), Dual-indexed sequencing libraries, Custom read structure, NovaSeq X, DRAGEN spatial pipeline, Cell segmentation, Cell-by-gene expression matrix, Connected multiomics, tertiary analysis, spatial gene expression profiling, RNA capture

Funders Acknowledgements:

National Heart Lung and Blood Institute

Grant ID: U54HL165440

National Institutes of Health National Institute of Allergy and Infectious Diseases

Grant ID: P01AI179405

National Cancer Institute

Grant ID: P30CA006516

Disclaimer

DISCLAIMER — INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK
This protocol describes a pre-release and non-commercial technological framework based on publicly available information and general domain knowledge. It does not represent a finalized commercial product, official specification, or endorsed workflow, and all described features, capabilities, and timelines are subject to change without notice.

This document is not affiliated with, sponsored by, endorsed by, or reviewed by Illumina or its affiliates. Product and service names are used solely for identification purposes and may be trademarks of their respective owners.

The authors do not disclose or intend to disclose any confidential, proprietary, or non-public information. Users are solely responsible for ensuring compliance with all applicable laws, agreements, institutional policies, licenses, and confidentiality obligations prior to any use, reproduction, redistribution, implementation, or publication of this material.

This protocol is provided “AS IS” and without warranties or representations of any kind, express or implied, including any warranties of accuracy, completeness, merchantability, fitness for a particular purpose, non-infringement, or suitability for clinical, diagnostic, regulatory, commercial, or safety-related use. 

Content posted on protocols.io may not be peer reviewed, independently validated, or formally approved and does not constitute legal, medical, clinical, safety, regulatory, or other professional advice. No person should rely upon this material as the basis for technical, commercial, regulatory, clinical, or investment decisions without independent verification and professional review.

To the fullest extent permitted by applicable law, any use of or reliance upon this protocol is solely at the user’s own risk, and the authors, contributors, affiliated institutions, hosting platforms, and their respective personnel disclaim liability for any direct, indirect, incidental, consequential, special, exemplary, or other damages arising from the use of, inability to use, or reliance upon this material.

This document may contain forward-looking statements regarding anticipated technologies, product capabilities, or release timelines. Actual products, features, performance characteristics, or timelines may differ materially. The authors undertake no obligation to update or revise such statements.

Nothing in this disclaimer excludes or limits liability to the extent such exclusion or limitation is prohibited under applicable law.

Abstract

Illumina’s sequencing-based spatial transcriptomics workflow leverages a high-density, planar capture substrate and polyadenylated RNA capture to enable large-area spatial gene expression profiling at quasi-cellular resolution. Tissue cryosections are mounted onto a barcoded capture region, cold methanol fixed and stained for brightfield histology, and imaged with fiducial markers to support downstream spatial registration. Following optimized permeabilization, released poly(A)+ transcripts hybridize to surface-bound oligonucleotides that encode spatial barcodes and unique molecular identifiers, enabling quantitative molecule counting after sequencing. Spatially indexed cDNA is synthesized on-slide, released, amplified, and converted into dual-indexed libraries compatible with high-throughput Illumina sequencers using asymmetric read structures that separate barcode/UMI decoding from transcript sequencing. Computational processing integrates image registration, read alignment, UMI deduplication, and cell segmentation to generate a cell-by-gene matrix mapped back to tissue morphology. This framework summarizes practical pre-analytical handling, library construction logic, sequencing considerations, and analysis concepts to support consistent execution and interpretation of large-format spatial transcriptomics experiments in tissue mapping contexts.

Guidelines

This protocol is intended solely for research, exploratory, and methodological evaluation purposes by appropriately trained personnel operating within qualified research or laboratory environments. Users are responsible for independently validating all procedures, reagents, computational methods, safety measures, and analytical outputs prior to any experimental, translational, clinical, diagnostic, regulatory, or commercial application.

Because this protocol may reference pre-release, developmental, or evolving technologies, workflows, software features, chemistry configurations, and analytical assumptions may change without notice. Users should confirm compatibility with their own instrumentation, software environment, institutional requirements, and applicable manufacturer documentation before implementation.

All work involving human-derived materials, genomic information, or associated metadata must be conducted in compliance with applicable laws, regulations, institutional policies, informed consent restrictions, material transfer agreements, data use agreements, privacy requirements, and ethical approvals. Users are solely responsible for ensuring appropriate authorization for sample acquisition, processing, storage, analysis, transfer, sharing, and publication.

This protocol does not establish or imply clinical validity, diagnostic utility, regulatory clearance, or fitness for patient management, treatment decisions, or reporting of clinical results.

Safety warnings

This protocol may involve the handling of human biological specimens, potentially identifiable genomic information, hazardous laboratory reagents, bioinformatic pipelines, and high-throughput instrumentation. Improper handling, processing, interpretation, storage, or disclosure may result in biosafety, privacy, ethical, regulatory, or data security risks.

Human specimens should be treated as potentially biohazardous and handled using appropriate biosafety procedures, institutional safeguards, and applicable occupational safety standards. Users must follow all local requirements regarding personal protective equipment (PPE), specimen containment, waste disposal, decontamination, and incident reporting.

Genomic and transcriptomic data may contain sensitive or potentially identifiable information. Users are responsible for implementing appropriate safeguards for privacy, confidentiality, cybersecurity, controlled access, de-identification, and data governance consistent with applicable regulations and institutional policies, including HIPAA, GDPR, or equivalent frameworks where applicable.

This protocol is not validated for clinical, diagnostic, prognostic, therapeutic, or patient management use. Any research findings generated using this protocol must undergo independent analytical validation, clinical validation, regulatory review, and institutional approval before consideration in any clinical context.

Ethics statement

All human-derived specimens, associated clinical information, and genomic materials used in connection with this protocol must be collected, transferred, stored, analyzed, and shared in accordance with applicable ethical standards, institutional policies, informed consent requirements, and regulatory approvals.

Users are responsible for obtaining all necessary approvals from their Institutional Review Board (IRB), Research Ethics Committee (REC), or equivalent oversight body prior to initiating work involving human participants or human-derived materials. All activities should comply with applicable national and international ethical principles governing human subjects research, including respect for participant autonomy, privacy, confidentiality, and data protection.

Only appropriately consented specimens and datasets should be used. Any secondary use, cross-institutional transfer, public dissemination, or downstream computational analysis of human genomic or transcriptomic data must remain consistent with the scope of the original participant consent and all applicable data governance requirements.

This protocol is intended exclusively for research use and does not constitute a clinically validated assay, diagnostic test, or medical device.

Overview

The Illumina spatial transcriptomics assay is designed to address two limitations of earlier sequencing-based spatial platforms: capture areas too small for intact tissue sections, and capture spots larger than a single cell. By utilizing proprietary flow cell manufacturing and surface chemistry, the platform introduces a continuous capture substrate featuring 1-micrometer structural resolution, paired with unbiased polyadenylated (polyA) transcript capture. The framework combines high-throughput sequencing, specifically on platforms such as the NovaSeq X Series, with machine learning-based cell segmentation and cloud-based tertiary analysis pipelines. The resulting ecosystem allows individual cells to be mapped across large tissue sections while maintaining transcriptomic depth and spatial fidelity.

The Illumina platform’s 50 × 15 mm footprint is designed to accommodate intact tissue sections, such as whole murine organs or multi-tissue microarrays. This area reduces the need to computationally stitch multiple separate sequencing runs together, a process that can introduce batch effects, edge artifacts, and alignment errors. Furthermore, the 1-micrometer feature size allows many distinct spatial barcodes to fall within the physical boundary of a single mammalian cell (which typically ranges from 5 to 10 micrometers in diameter). This dense functionalization minimizes the artificial mixing of transcripts from adjacent cells, allowing downstream bioinformatics algorithms to assign captured mRNA to discrete, individual cells.

Substrate Architecture and Surface Biochemistry

The Illumina sequencing-based spatial assay is built upon a solid-phase capture substrate formatted as a standard 75 × 25 mm glass microscope slide. Within the approximately 7.5 cm² active capture area, the surface is patterned with ~750 million spatially encoded capture features.
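
The stated area and feature count can be cross-checked with simple arithmetic. The sketch below assumes square features on a regular 1 µm pitch, which is an illustrative simplification rather than a published specification of the slide layout.

```python
# Capture-area dimensions as stated in this framework.
CAPTURE_LENGTH_MM = 50
CAPTURE_WIDTH_MM = 15
FEATURE_PITCH_UM = 1  # assumed center-to-center spacing (hypothetical)


def capture_area_cm2(length_mm, width_mm):
    """Area in cm^2 (100 mm^2 per cm^2)."""
    return (length_mm * width_mm) / 100


def feature_count(length_mm, width_mm, pitch_um):
    """Number of features on a regular square grid covering the area."""
    columns = (length_mm * 1000) // pitch_um
    rows = (width_mm * 1000) // pitch_um
    return columns * rows


area_cm2 = capture_area_cm2(CAPTURE_LENGTH_MM, CAPTURE_WIDTH_MM)
n_features = feature_count(CAPTURE_LENGTH_MM, CAPTURE_WIDTH_MM, FEATURE_PITCH_UM)
print(f"{area_cm2} cm^2, {n_features:,} features")
```

Under these assumptions, a 50 × 15 mm region corresponds to 7.5 cm² and 750 million 1 µm features, matching the figures quoted above.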

The surface chemistry is derived from Illumina flow cell technology used in platforms such as the NovaSeq 6000 and NovaSeq X. The functionalized surface contains a dense lawn of immobilized capture oligonucleotides engineered to link physical tissue coordinates with downstream sequencing reads.

Each capture oligonucleotide contains three principal functional elements:

1. a predefined spatial barcode corresponding to a specific ~1 µm surface location,

2. a Unique Molecular Identifier (UMI) enabling computational deduplication of PCR-derived amplification artifacts and quantitative transcript counting, and

3. a 3′ poly-deoxythymidine (poly-dT) capture sequence that hybridizes to the polyadenylated tails of mature eukaryotic mRNA molecules.

Because transcript capture is mediated through polyA hybridization rather than gene-specific probes, the assay operates as an untargeted whole-transcriptome workflow compatible with polyadenylated transcripts across diverse eukaryotic species.
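
The three functional elements listed above can be modeled as a small data structure. This is a minimal sketch: the element order, the sequence lengths, and the names `CaptureOligo` and `can_capture` are hypothetical and chosen only for illustration, since the actual oligonucleotide design is not described in this framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureOligo:
    """Illustrative model of one surface-bound capture oligonucleotide."""
    spatial_barcode: str      # encodes the ~1 µm surface location
    umi: str                  # unique molecular identifier
    poly_dt_length: int = 30  # hypothetical length of the poly(dT) tract

    @property
    def sequence(self):
        # Assumed 5'->3' order: barcode, UMI, then the 3' poly(dT) capture tract.
        return self.spatial_barcode + self.umi + "T" * self.poly_dt_length


def can_capture(transcript, min_poly_a=15):
    """True if the transcript ends in at least `min_poly_a` adenines.

    The threshold is an arbitrary illustrative value, not an assay parameter.
    """
    tail_length = len(transcript) - len(transcript.rstrip("A"))
    return tail_length >= min_poly_a


oligo = CaptureOligo(spatial_barcode="ACGTACGT", umi="GGCCTTAA", poly_dt_length=10)
print(oligo.sequence)
print(can_capture("GATTACC" + "A" * 20))  # polyadenylated transcript
print(can_capture("GATTACCGGC"))          # no poly(A) tail
```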

Pre-Analytical Tissue Processing and Histology

The success of a spatial transcriptomics experiment depends on pre-analytical tissue handling. Because the assay measures RNA, which is readily degraded by endogenous RNases, and also requires preserved morphology for histological imaging, tissue preparation must be standardized.

Tissue Preservation and Embedding Protocols
The initial commercial iteration of the Illumina spatial technology is optimized for fresh-frozen (FF) tissue. Fresh-frozen tissues preserve higher quality RNA, which improves the sensitivity of the polyA capture mechanism. Tissues are rapidly harvested and immediately embedded in Optimal Cutting Temperature (OCT) compound, followed by snap-freezing in an isopentane bath chilled with liquid nitrogen. This rapid freezing prevents the formation of large ice crystals that can lyse cell membranes and compromise the structural integrity of the tissue architecture.

Cryosectioning and Spatial Placement
To initiate the physical workflow, the OCT-embedded tissue block is mounted onto a cryostat microtome. The internal temperature of the cryostat is maintained between -15°C and -25°C, depending on the lipid content of the tissue type. The tissue is sectioned at a recommended thickness of 10 micrometers. This thickness allows for the capture of a monolayer of cells while remaining thin enough to facilitate the vertical diffusion of RNA during the permeabilization step.

The tissue section is then transferred onto the 50 × 15 mm active capture area of the Illumina spatial slide. Proper placement is critical; operators must ensure that the tissue lies flat against the functionalized surface without folds, tears, or trapped air bubbles. A physical gap between the tissue and the capture probes can result in lateral RNA diffusion (loss of spatial resolution) or failure of RNA hybridization in that localized zone.

Histological Staining, Imaging, and Fiducial Registration
Once the tissue is adhered to the slide, it undergoes fixation (typically utilizing cold methanol) to stabilize the morphological architecture and reduce RNA degradation. Following fixation, the tissue is subjected to standard Hematoxylin and Eosin (H&E) staining. Hematoxylin acts as a basic dye, staining acidic nucleic acids within the nucleus blue or purple. Eosin acts as an acidic counterstain, targeting basic cytoplasmic and extracellular matrix proteins, rendering them pink.

After staining, the entire 50 × 15 mm area is imaged using a brightfield microscope. This imaging step acts as the spatial reference map for the downstream computational experiment. The Illumina spatial slide incorporates specific fiducial markers—alignment targets etched onto the glass substrate. The microscopy must capture both the H&E tissue morphology and the precise locations of these fiducial markers in a multi-tile image file.

During the computational phase, the Illumina imaging and registration software uses these fiducial markers to anchor the digital transcriptomic grid onto the corresponding H&E image. This image registration enables the correlation of multi-dimensional gene expression signatures with distinct histological structures.

In Situ Transcript Capture and Library Preparation

Following tissue placement and morphological imaging, the slide transitions into the molecular biology wet-lab workflow. Because transcripts are captured on the slide but sequenced off it, this approach is sometimes referred to as ex situ spatial transcriptomics in the public literature. The phase involves a series of biochemical reactions that capture the transcriptome and convert it into a spatially indexed, sequenceable library.

Tissue Permeabilization Kinetics
The permeabilization step links tissue biology and molecular capture. A permeabilization enzyme mix (typically containing pepsin or proteinase K alongside detergents) is applied to the tissue section. The proteases digest membrane-associated and extracellular matrix proteins, while the detergents disrupt the lipid bilayers of the cellular membranes, opening microscopic pores in the cells. Upon membrane breach, the cytosolic polyadenylated mRNA molecules diffuse downward along the z-axis, coming into contact with the 1-micrometer capture surface immediately below.

The kinetics of this permeabilization step are tissue-specific and influence the assay's performance. If the tissue is under-permeabilized, insufficient mRNA will exit the cellular boundaries to reach the capture probes, resulting in lower transcriptomic sensitivity and depressed UMI counts. Conversely, over-permeabilization causes the mRNA to diffuse laterally across the slide (along the x and y axes) before hybridizing to the capture probes. This lateral diffusion degrades the spatial resolution of the data, a phenomenon known as transcript "bleeding" or "blurring." Upon contact with the functionalized surface, the polyA tails of the mRNA hybridize to the poly-dT sequences of the spatially barcoded capture probes, locking the transcript to its Cartesian coordinate.

cDNA Synthesis
Once the mRNA is captured onto the solid substrate, the slide is subjected to a reverse transcription (RT) reaction. A reverse transcriptase enzyme is introduced, which uses the captured mRNA as a template and the spatial capture probe as a primer. As the reverse transcriptase synthesizes the complementary DNA (cDNA) strand, it covalently links a copy of the tissue's mRNA sequence to the spatial barcode and the UMI originally synthesized on the Illumina slide. When the enzyme reaches the 5′ end of the mRNA, a template switching oligonucleotide (TSO) allows it to append a universal sequence to the 3′ end of the cDNA, providing a priming site for later amplification.

Transcript Release and Amplification
Following the synthesis of the spatially indexed cDNA on the surface of the slide, the tissue remnants are biochemically digested and washed away, leaving the covalently bound, spatially barcoded cDNA library attached to the glass.

To transition this surface-bound library into a liquid format, the cDNA must be released. This is achieved either through targeted enzymatic cleavage of a specific restriction recognition site engineered into the base of the capture probe, or through alkaline denaturation that separates the newly formed cDNA strand from the surface probe.

Once eluted, the spatially barcoded cDNA undergoes PCR amplification utilizing primers targeting the universal sequences added during the template switching oligonucleotide (TSO) step of cDNA synthesis. The final step involves standard next-generation sequencing (NGS) library preparation.

The full-length cDNA is enzymatically fragmented to yield molecules of a size appropriate for Illumina platforms, and standard Illumina sequencing adapters (P5 and P7) alongside sample multiplexing indices (i5 and i7) are ligated to the fragments. This process yields a dual-indexed sequencing library in which every DNA molecule contains a transcript-derived sequence, a UMI for quantitative counting, and a spatial barcode identifying its original 1-micrometer location on the slide.

Sequencing Topologies and Hardware Dynamics

Resolving a large number of cells across a 7.5 cm² area with 1-micrometer features necessitates a high volume of sequencing reads to achieve transcriptomic depth. The sequencing parameters and platform selection are configured to support the requirements of the spatial libraries.

Sequencing Platform Suitability and Data Output
Because of the sequencing depth required to sample the 750 million capture features, the platform is optimally paired with Illumina's high-throughput sequencers.
Platform	Chemistry Base	Maximum Output (per run)	Single Reads (per run)	Primary Spatial Utility
NovaSeq X Plus	XLEAP-SBS	Up to 21 Tb	52 to 70 Billion	Large-scale tissue atlasing; Multiomic multi-slide runs
NovaSeq 6000	Standard SBS	Up to 6 Tb	Up to 20 Billion	Standard spatial transcriptomic experiments
NextSeq 2000	XLEAP-SBS	Up to 540 Gb (with P4 flow cell)	Up to 1.8 Billion	Pilot studies; Library quality control; Small targeted biopsies
Table: Sequencing hardware specifications for Illumina spatial transcriptomics.
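
To put the table in context, the sketch below estimates the average number of reads available per capture feature if an entire run were devoted to fully covered slides. The read counts are the per-run figures quoted in the table (using the lower bound of the NovaSeq X Plus range). The assumption that tissue covers the whole capture area is a deliberate simplification, so real per-feature depth under tissue would usually be higher.

```python
FEATURES_PER_SLIDE = 750_000_000  # figure stated in this framework

# Single reads per run, as quoted in the table above.
READS_PER_RUN = {
    "NovaSeq X Plus": 52_000_000_000,  # lower bound of the 52 to 70 billion range
    "NovaSeq 6000": 20_000_000_000,
    "NextSeq 2000": 1_800_000_000,
}


def reads_per_feature(total_reads, slides=1, features=FEATURES_PER_SLIDE):
    """Average reads per capture feature, assuming full tissue coverage."""
    return total_reads / (slides * features)


for platform, reads in READS_PER_RUN.items():
    print(f"{platform}: {reads_per_feature(reads):.1f} reads per feature (1 slide)")
```

This kind of estimate is only a starting point for run planning; the depth actually needed depends on tissue coverage, RNA content, and the target number of transcripts per cell.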

Read Structure
The spatial cDNA libraries utilize a specific custom read structure to decode the different functional domains of the DNA molecule. Unlike standard whole-genome sequencing where both Read 1 and Read 2 interrogate biological insert DNA symmetrically (e.g., 2 × 150 bp), spatial libraries divide the sequencing cycles asymmetrically between the engineered spatial barcode/UMI and the biological transcript.

While specific cycle allocations for the pre-release Illumina spatial assay are subject to minor kit-specific modifications, the fundamental architecture for 1-micrometer spatial transcriptomic libraries is a dual-indexed, paired-end configuration. The exact cycle allocations for the assay are pending the publication of the final commercial release kit inserts.

Secondary Analysis via the DRAGEN Spatial Pipeline

The output of a NovaSeq spatial run comprises BCL (base call) files, which are converted into FASTQ files. The 750 million 1-micrometer features generate a data footprint large enough to require the processing power of the Dynamic Read Analysis for GENomics (DRAGEN) architecture. DRAGEN is a hardware-accelerated bioinformatics platform utilizing Field Programmable Gate Arrays (FPGAs) to execute genomic alignment operations at speeds exceeding standard CPU processing.

The Spatial Image Aligner and Fiducial Mapping
Before genomic alignment occurs, the secondary analysis begins with the Spatial Image Aligner. This module imports the brightfield H&E images captured during the pre-analytical phase. The algorithm executes an automated image quality control (QC) check, scanning for focus aberrations, illumination gradients, and tissue folding.

The Spatial Image Aligner algorithms identify the fiducial markers on the image and superimpose them over the digital coordinate grid of the capture features. This creates an affine registration map. The aligner also performs automated tissue border detection, creating a digital mask that differentiates regions containing tissue from empty background. This masking step instructs the algorithms to ignore sequencing reads or spatial barcodes that originate outside the tissue boundaries, reducing processing times.
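
The idea of an affine registration map can be illustrated with a least-squares fit between fiducial positions on the capture grid and the same fiducials located in the image. This is a generic sketch, not the Spatial Image Aligner's actual algorithm; the fiducial coordinates, the pixel scale, and the function names are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("need at least three non-collinear fiducials")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_affine(grid_points, image_points):
    """Least-squares affine map (grid -> image) via the normal equations."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(grid_points, image_points):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    return solve3(ata, atu), solve3(ata, atv)


def apply_affine(params, point):
    """Map one grid coordinate into image pixel coordinates."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical fiducials at the corners of the 50 x 15 mm capture area (in mm)
# and their detected pixel positions (2000 px/mm with a 100/200 px offset).
grid_fiducials = [(0, 0), (50, 0), (0, 15), (50, 15)]
image_fiducials = [(100, 200), (100100, 200), (100, 30200), (100100, 30200)]

params = fit_affine(grid_fiducials, image_fiducials)
center_px = apply_affine(params, (25.0, 7.5))
print(center_px)
```

With more than three fiducials the fit averages out small detection errors, which is one reason a slide would carry several markers rather than the minimum needed to define the transform.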

DRAGEN RNA Alignment and UMI Deduplication
Once the spatial coordinates are mapped to the image, the DRAGEN spatial pipeline processes the reads in the FASTQ files. The Read 2 sequences (containing the biological transcript) are aligned to a designated reference genome using splice-aware RNA mapping algorithms. Concurrently, the Read 1 sequences are decoded to extract the spatial barcode and the UMI.
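
Decoding Read 1 amounts to slicing the read into its barcode and UMI segments and looking the barcode up in a coordinate table. The sketch below assumes the barcode precedes the UMI and uses placeholder lengths; the real cycle allocations are not yet published, so `BARCODE_LEN`, `UMI_LEN`, and the lookup table are hypothetical.

```python
BARCODE_LEN = 8  # placeholder length, not the assay's real value
UMI_LEN = 4      # placeholder length, not the assay's real value


def split_read1(sequence, barcode_len=BARCODE_LEN, umi_len=UMI_LEN):
    """Split a Read 1 sequence into (spatial_barcode, umi)."""
    if len(sequence) < barcode_len + umi_len:
        raise ValueError("Read 1 is shorter than barcode + UMI")
    barcode = sequence[:barcode_len]
    umi = sequence[barcode_len:barcode_len + umi_len]
    return barcode, umi


def lookup_coordinate(barcode, barcode_to_xy):
    """Return the (x, y) slide coordinate for a barcode, or None if unknown."""
    return barcode_to_xy.get(barcode)


# Hypothetical barcode-to-coordinate table (micrometers).
barcode_to_xy = {"AAAACCCC": (120, 45), "GGGGTTTT": (121, 45)}

barcode, umi = split_read1("AAAACCCCGGTTACGT")
print(barcode, umi, lookup_coordinate(barcode, barcode_to_xy))
```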

The pipeline then merges these data streams. Every aligned transcript is assigned its physical coordinate on the slide based on its spatial barcode. Following spatial assignment, the pipeline performs UMI deduplication. Because the library preparation involved PCR amplification, multiple identical sequencing reads may arise from a single original mRNA molecule. The DRAGEN pipeline collapses all identical reads that share the same genomic alignment locus, spatial barcode, and UMI into a single count, generating expression data representative of true biological expression rather than PCR amplification artifacts.
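
The deduplication logic described above can be sketched in a few lines. This example is not the DRAGEN implementation: it uses the gene name as a stand-in for the genomic alignment locus and collapses only exact UMI matches, both of which are simplifying assumptions.

```python
from collections import Counter


def deduplicate(reads):
    """Collapse PCR duplicates and count unique molecules.

    `reads` is an iterable of (spatial_barcode, umi, gene) tuples. Reads that
    share all three values are treated as copies of one original molecule.
    Returns a Counter keyed by (spatial_barcode, gene).
    """
    unique_molecules = set(reads)
    return Counter((barcode, gene) for barcode, _umi, gene in unique_molecules)


reads = [
    ("BC1", "UMI_A", "Kap"),
    ("BC1", "UMI_A", "Kap"),  # PCR duplicate of the read above
    ("BC1", "UMI_B", "Kap"),  # same location and gene, different molecule
    ("BC2", "UMI_A", "Kap"),  # same UMI but a different location
]

molecule_counts = deduplicate(reads)
print(molecule_counts)
```

Four reads collapse to three molecules here: two for `Kap` at `BC1` and one at `BC2`.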

Machine Learning-Based Cell Segmentation
A key feature in this workflow is the machine learning-based cell segmentation model. In spatial technologies with lower resolution (e.g., 55-micrometer spots), spatial barcodes are aggregated by their physical location, grouping signals from multiple neighboring cells. Because the Illumina substrate utilizes 1-micrometer features, the spatial pipeline achieves single-cell resolution by applying segmentation models directly to the H&E image.

The model scans the H&E image to identify individual cellular nuclei based on the hematoxylin stain. Once a nucleus is identified, the algorithm executes a cell border expansion protocol, estimating the boundaries of the cytoplasm extending outward from the nucleus. This creates a digital polygon representing the physical perimeter of cells in the tissue section. The DRAGEN pipeline then aggregates the 1-micrometer spatial barcodes that fall within the boundaries of each cellular polygon. Transcripts are binned into distinct cells rather than arbitrary geometric grids.
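
Once cell polygons exist, binning features into cells is a point-in-polygon problem. The sketch below uses a standard ray-casting test and a brute-force loop over cells; the polygon coordinates, the gene name `GeneB`, and the data layout are hypothetical, and a production pipeline would use a spatial index rather than testing every cell.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_by_cell(feature_counts, cell_polygons):
    """Sum per-feature gene counts into the cell whose polygon contains them.

    feature_counts: {(x, y): {gene: count}}; cell_polygons: {cell_id: [(x, y), ...]}.
    Features outside every polygon are dropped.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for (x, y), genes in feature_counts.items():
        for cell_id, polygon in cell_polygons.items():
            if point_in_polygon(x, y, polygon):
                for gene, count in genes.items():
                    matrix[cell_id][gene] += count
                break
    return {cell_id: dict(genes) for cell_id, genes in matrix.items()}


# Two hypothetical 10 x 10 µm cells side by side.
cell_polygons = {
    "cell_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "cell_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
feature_counts = {
    (2.5, 3.5): {"Kap": 2},
    (5.5, 5.5): {"Kap": 1, "GeneB": 1},
    (15.5, 4.5): {"GeneB": 3},
    (30.5, 30.5): {"Kap": 5},  # outside both cells, ignored
}

cell_by_gene = aggregate_by_cell(feature_counts, cell_polygons)
print(cell_by_gene)
```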

This approach minimizes the contamination of transcriptomic profiles between closely packed cells. The output of the DRAGEN spatial pipeline is a cell-by-gene expression matrix, documenting the molecule counts for genes across the segmented cells. The DRAGEN architecture can complete the secondary processing of a 3-million-cell dataset in 13.5 to 22 hours.

Tertiary Analysis and Multiomic Integration via Illumina Connected Multiomics (ICM)

Interpreting the cell-by-gene expression matrix generated by DRAGEN is facilitated through Illumina Connected Multiomics (ICM), a cloud-based software suite designed for high-dimensional data analysis. The platform provides tools to transition from raw matrices to analyzed data.

Interactive Visualization and Dimensionality Reduction
ICM serves as a graphical interface for spatial exploration. The software ingests the matrix outputs from the DRAGEN pipeline, allowing researchers to visualize spatial transcriptomic data overlaid onto the original H&E tissue images. The interface supports navigation across the 50 × 15 mm tissue area, enabling zooming from the macroscopic view of the organ architecture to the cellular level.

To process the high-dimensionality of whole-transcriptome data, ICM executes statistical modeling and dimensionality reduction algorithms. These mathematical transformations organize the transcriptomic data into a 2D or 3D visual plot where cells with similar biological states cluster together. Researchers can filter the data to isolate cells expressing specific biomarkers or adjust thresholds for total transcript counts. This generates spatial heatmaps that map the localized expression of signaling pathways over specific morphological regions.
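
The filtering operations mentioned above (a minimum total transcript count and the presence of a marker gene) are straightforward to express on a cell-by-gene matrix. This is a generic sketch, not ICM's interface; the function name, thresholds, and example data are hypothetical.

```python
def filter_cells(cell_by_gene, min_total=0, marker=None, min_marker=1):
    """Keep cells that meet a total-count threshold and, optionally,
    express a marker gene at or above `min_marker` counts."""
    kept = {}
    for cell_id, genes in cell_by_gene.items():
        if sum(genes.values()) < min_total:
            continue
        if marker is not None and genes.get(marker, 0) < min_marker:
            continue
        kept[cell_id] = genes
    return kept


example_matrix = {
    "cell_1": {"Kap": 12, "GeneB": 40},
    "cell_2": {"GeneB": 3},            # too few transcripts
    "cell_3": {"GeneB": 60},           # enough transcripts, no Kap
}

high_quality = filter_cells(example_matrix, min_total=10)
kap_positive = filter_cells(example_matrix, min_total=10, marker="Kap")
print(sorted(high_quality), sorted(kap_positive))
```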

Advanced Clustering and Spatial Differential Expression
ICM also executes community detection algorithms, primarily utilizing Leiden clustering. The Leiden algorithm groups the segmented cells based on the similarity of their whole-transcriptome expression profiles, classifying the tissue into biological cell types and states independent of physical location.

Once the cells are clustered and assigned biological identities, ICM enables differential expression analysis. Researchers can select distinct geometric Regions of Interest (ROIs) on the tissue image and computationally compare their transcriptomic profiles. The software executes statistical tests to identify marker genes associated with this spatial heterogeneity. In analytical validation studies performed on murine kidneys, researchers utilized ICM to observe the localized, structure-specific expression of the Kap gene within defined renal substructures, demonstrating the platform's ability to link molecular data with physiological anatomy.
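
As a simple illustration of comparing two ROIs, the sketch below computes a log2 fold change of the mean per-cell expression fraction for one gene. It is an effect-size calculation only, not a statistical test and not ICM's method; the pseudocount, function names, and example counts are hypothetical.

```python
import math


def mean_expression_fraction(cells, gene):
    """Mean fraction of each cell's counts contributed by `gene`."""
    fractions = []
    for genes in cells:
        total = sum(genes.values())
        if total > 0:
            fractions.append(genes.get(gene, 0) / total)
    return sum(fractions) / len(fractions) if fractions else 0.0


def log2_fold_change(roi_a, roi_b, gene, pseudocount=1e-4):
    """log2 ratio of mean expression fraction in ROI A versus ROI B."""
    a = mean_expression_fraction(roi_a, gene) + pseudocount
    b = mean_expression_fraction(roi_b, gene) + pseudocount
    return math.log2(a / b)


# Hypothetical per-cell counts for two regions of interest.
roi_a = [{"Kap": 8, "GeneB": 2}, {"Kap": 6, "GeneB": 4}]
roi_b = [{"Kap": 1, "GeneB": 9}, {"Kap": 1, "GeneB": 9}]

lfc = log2_fold_change(roi_a, roi_b, "Kap")
print(f"log2 fold change for Kap (ROI A vs ROI B): {lfc:.2f}")
```

A positive value indicates higher relative expression in the first ROI. A real analysis would pair this with a significance test and multiple-testing correction across genes.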

Multiomic Integration
ICM is also designed as an integrative multimodal platform. As research models increasingly incorporate multiple data streams, ICM overlays and correlates distinct omic modalities. The platform supports the ingestion of bulk RNA sequencing, multiplexed single-cell transcriptomics, proteomics (via the Illumina Protein Prep workflow), and epigenetics (such as the Illumina 5-base solution for simultaneous genetic variant detection and DNA methylation analysis).

This multiomic integration allows researchers to model complex biological mechanisms, such as mapping epigenetic regulation to transcriptomic expression and anchoring the resulting phenotypic state to a spatial organization within the tissue microenvironment.
