Computational workflows and data infrastructures for spatial omics analysis

Margaret Alexander 1,2,3; Yutian Liu 1,2; Felipe Segato Dezem 1,2; Hannah Chasteen 1,2; Jasmine Plummer 1,2,4,*
*Correspondence to: Jasmine Plummer, Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA. E-mail: jasmine.plummer@stjude.org
EXO. 2026;1:202607. 10.70401/EXO.2026.0010
Received: February 11, 2026; Accepted: May 13, 2026; Published: May 15, 2026

Abstract

Spatial omics is a broad term referring to technologies that allow for biomolecules to be observed within their native tissue context. These technologies have been used by biomedical researchers to gain a better understanding of cellular interactions, tumor microenvironment dynamics, and immune cell infiltration. While the basic outputs, such as spatial coordinates, segmentation masks, and transcript/protein matrices, are provided by the instrument software, the true biological insights come from several downstream, specialized analysis steps. Since spatial omics remains a relatively new field, no unified analysis pipeline has yet been established to encompass all platforms. Most workflows are adapted from single-cell RNA sequencing analysis frameworks, while incorporating additional steps that are specific to spatial data, especially for imaging-based technologies. At the same time, the diversity of platforms, data modalities, and output formats has introduced substantial challenges for data representation, interoperability, and cross-platform integration, highlighting the need for flexible, spatially aware, and user-friendly data structures made specifically for imaging-based data, not merely adapted from other methods. This review summarizes the general analytical steps following spatial omics data acquisition, commonly used data infrastructures and tools, existing gaps, and future directions in the field.

Keywords

Spatial omics, data analysis, computation

1. Introduction

Advances in transcriptomic technologies have dramatically expanded the ability of biomedical researchers to characterize cellular function, diversity, and organization. Bulk RNA-sequencing first enabled high-throughput measurement of gene expression at the tissue level, offering a global view of transcriptional activity but inherently averaging signals across heterogeneous cell populations[1]. This limitation motivated the development of single-cell RNA sequencing (scRNA-seq), which transformed the field by resolving gene expression at the level of individual cells[2]. With single-cell data came a new generation of computational methods, ranging from normalization frameworks suited to sparse count data, to algorithms for clustering, lineage inference, batch correction, and large scale data integration, each designed to extract structure from increasingly complex cellular landscapes[3].

Despite its success, single-cell profiling requires dissociating tissues, which disrupts the native spatial organization of cells and removes the spatial context of gene expression. This loss of spatial information limits the ability to study cell-cell interactions (CCIs), tissue architecture, and the tumor environment. Spatial omics technologies emerged to restore this critical dimension[4]. Platforms such as sequencing-based spatial transcriptomics (sST), in situ hybridization (ISH), and multiplexed imaging now provide high dimensional molecular measurements directly within intact tissue architectures. As these methods emerged, analytical workflows were first adapted from previous technologies (e.g., scRNA-Seq), then slowly evolved to incorporate image processing, spatial statistics, probabilistic modeling, multimodal integration, and new frameworks for mapping cell types and interactions in situ.

Together, this progression from bulk to single cell to spatial profiling reflects a broader transition from measuring what genes are expressed, to understanding which cells express them, and finally, to uncovering where those cells reside and interact within their native microenvironments. The remainder of this review focuses on the analytical landscape that unfolds once spatial omics data is generated, beginning with an overview of the major spatial technologies and the forms of data they produce. This review will focus on sST and proteomics platforms since these are currently the most popular in the field, and most computational analysis tools have been designed to take these modalities as their primary input (Figure 1).

Figure 1. Schematic overview of computational analysis in spatial omics (transcriptomics and proteomics). Input depicts the output files from various spatial omics platforms, usually containing a count matrix with spatial coordinates and image(s). Most analysis is followed by various secondary analysis steps to clean and pre-process data: quality control and filtering; normalization; and cell segmentation. Tertiary analysis includes analyzing the data to understand the biological context. Created in BioRender. Plummer, J. (2026) https://app.biorender.com/illustrations/69d50cc9c059ab84dd7db164. PCA: principal component analysis; UMAP: uniform manifold approximation and projection; t-SNE: t-distributed stochastic neighbor embedding.

2. Types of Spatial Omics Technologies

Spatial omics technologies can be broadly grouped into three categories based on their measurement approach: sequencing-based spatial transcriptomics (sST), imaging-based spatial transcriptomics (iST), and imaging-based spatial proteomics (iSP)[5]. Sequencing-based platforms such as Visium, Visium HD, Slide-seq V2, and Stereo-seq provide transcriptome-wide coverage but with varying spatial resolution, from near-single-cell to spot-based measurements averaging multiple cells. Imaging-based transcriptomics platforms such as Xenium, MERSCOPE, and CosMx offer subcellular resolution, but mainly profile targeted gene panels. Spatial proteomics, which has gained significant momentum in recent years[6], enables highly multiplexed protein measurement through cyclic immunofluorescence (IF) platforms (e.g., Cellscape, COMET, CODEX, MACSima) and imaging mass cytometry (IMC). Beyond these commercial platforms, academia has also contributed significantly to the advancement of spatial omics with approaches such as Spatial-Mux-seq[7], which enables simultaneous multimodal spatial profiling, and Deep-STARmap and Deep-RIBOmap[8], which allow single-cell transcriptomic profiling in 3D tissue blocks.

These platforms have fundamental differences that directly impact the data structure of their outputs, computational requirements, and appropriate analytical strategies. This review will cover general workflows for processing each of these categories of spatial omics data. However, even within these categories, platforms exhibit substantial heterogeneity in their technical specifications, necessitating flexible analytical approaches tailored to each.

2.1 Sequencing-based sST

sST allows for transcriptome-wide profiling across tissue sections. These methods rely on spatially indexed capture surfaces where transcripts are tagged with location-specific barcodes[9]. While alternative sST approaches exist, including microdissection-based methods (e.g., LCM-seq) and in situ sequencing (ISS) (e.g., FISSEQ), spatially barcoded capture platforms dominate current research and public repositories. Therefore, this review focuses primarily on these barcoding-based methods (Figure 2).

Figure 2. Overview of the computational workflow for analyzing sST and iST data. Input designates the output files from various spatial platforms. Filtering is conducted as a quality control step. Preprocessing allows for multiple samples to be incorporated into downstream analysis. Secondary analysis includes clustering and cell type annotation using marker genes. Spatial differential expression and neighborhood analysis are used to assess region-specific patterns and CCIs. Tertiary analysis includes visualization and integration across other modalities. Created in BioRender. Plummer, J. (2026) https://app.biorender.com/illustrations/69602bcf9f79a39835492d13?slideId=4056948a-e057-4b5a-a9b8-3a8c09b04843. sST: sequencing spatial transcriptomics; iST: imaging based spatial transcriptomics; CCIs: cell-cell interactions.

Most sST platforms use capture areas that are larger than individual cells, with diameters ranging from 10 μm to 100 μm in earlier technologies. Visium HD stands out with 2 μm × 2 μm squares, reaching near-single-cell resolution. However, because capture areas are predefined and spatially uniform rather than aligned to cell boundaries, even high-resolution methods do not achieve true single-cell resolution, where each measurement corresponds to an intact, segmented cell.

sST platforms face an inherent resolution-coverage trade-off. Lower resolution spots aggregate more cellular material, yielding higher transcript counts per measurement but obscuring cellular heterogeneity. Higher resolution capture areas approach single-cell dimensions but suffer from extreme sparsity, often capturing only tens to hundreds of unique molecular identifiers (UMIs) per bin. To address this, high-resolution data typically require computational binning or imputation to achieve adequate coverage for downstream analysis. Conversely, lower resolution data necessitates deconvolution methods to infer cell-type composition within each multi-cell spot, often leveraging reference single-cell datasets or spatial patterns to guide decomposition.
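The binning step mentioned above can be sketched as follows. This is a minimal illustration, assuming the high-resolution data have already been arranged as a dense (rows, columns, genes) grid of counts; the function name `bin_counts`, the toy grid, and the choice of a 4x aggregation factor are assumptions made for the example, not part of any vendor pipeline.

```python
import numpy as np

def bin_counts(counts: np.ndarray, factor: int) -> np.ndarray:
    """Sum a (rows, cols, genes) grid of bin counts into coarser bins.

    `factor` is the number of fine bins merged along each axis, e.g. 4 turns
    2 um bins into 8 um bins. Edge rows/columns that do not fill a complete
    coarse bin are dropped for simplicity.
    """
    rows, cols, genes = counts.shape
    r, c = rows // factor, cols // factor
    trimmed = counts[: r * factor, : c * factor, :]
    # Split each spatial axis into (coarse index, position inside coarse bin)
    # and sum over the two "inside" axes.
    return trimmed.reshape(r, factor, c, factor, genes).sum(axis=(1, 3))

rng = np.random.default_rng(0)
fine = rng.poisson(0.05, size=(8, 8, 3))  # toy sparse grid: 8 x 8 bins, 3 genes
coarse = bin_counts(fine, factor=4)       # 2 x 2 grid of aggregated bins
```

Aggregation preserves the total number of molecules while trading spatial resolution for counts per measurement, which is the trade-off described above.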

The raw output for sequencing-based platforms consists of BCL files, which are converted to FASTQ files containing spatial barcodes, UMIs, and cDNA sequences (Table 1). Standard genomic alignment of these FASTQs produces BAM files, which are then processed into spot-by-gene count matrices for downstream analysis. Several platform manufacturers provide dedicated preprocessing workflows to perform read alignment, barcode assignment, and count matrix generation, such as Space Ranger and the Stereo-seq Analysis Workflow (SAW) from STOmics. For technologies that lack such a workflow, open-source, community-created pipelines perform the same functions, such as the WARP (warp analysis research pipeline) Slide-seq pipeline[11]. Some platforms, such as Visium, also generate tissue images at varying resolutions, such as high-resolution histology images that enable morphological analysis or lower-resolution images for quality control (QC) and spot-to-tissue registration.
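The final step of this process, turning aligned reads into a spot-by-gene count matrix, can be illustrated with a small sketch. It assumes each aligned read has already been reduced to a (spatial barcode, UMI, gene) tuple; the function name `count_matrix` and the toy reads are hypothetical, and real pipelines additionally correct barcode and UMI sequencing errors, which is omitted here.

```python
from collections import defaultdict

def count_matrix(reads):
    """Collapse aligned reads into UMI counts per (spatial barcode, gene).

    `reads` is an iterable of (spatial_barcode, umi, gene) tuples. Reads that
    share all three values are treated as PCR duplicates of one molecule.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        umis[(barcode, gene)].add(umi)
    # The count for a spot/gene pair is the number of distinct UMIs observed.
    return {key: len(seen) for key, seen in umis.items()}

toy_reads = [
    ("AAAC", "u1", "Gapdh"),
    ("AAAC", "u1", "Gapdh"),  # duplicate read of the same molecule
    ("AAAC", "u2", "Gapdh"),
    ("TTTG", "u1", "Actb"),
]
toy_counts = count_matrix(toy_reads)
```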

Table 1. Summary of spatial omics technologies, their main outputs after initial processing of raw instrument data, and the pipelines used for initial processing and visualization.

| Platform | Primary Output Files | Pipeline for initial processing/visualization |
| --- | --- | --- |
| Sequencing-based Spatial Transcriptomics | | |
| Visium | Barcode mappings (.parquet), spot x gene matrices (.h5), images (.tiff, .png) | Space Ranger |
| Visium HD | Barcode mappings (.parquet), binned spot x gene matrices (.h5), feature slices (.h5), images (.tiff, .png), cell segmentation (.geojson) | Space Ranger |
| GeoMx | Counts data (.dcc), configuration file (.pkc), sequencing data (.fastq), images (.ome.tif, .ome.xml) | GeomxTools |
| Stereo-seq | Gene expression matrices (.gef, .gem), images (.tiff, .tar.gz) | SAW |
| Slide-seq | Bead x gene matrix (.h5ad), aligned BAM, FASTQ metrics (.txt), UMI metrics (.csv.gz), gene metrics (.csv.gz), cell metrics (.csv.gz) | WARP (Slide-seq pipeline) |
| DBiT-seq | Images (.tif, .png), tissue positions (.csv), fragment files (.tsv.gz) | ATX_epigenomics (GitHub), AtlasXBrowser |
| Imaging-based Spatial Transcriptomics | | |
| CosMx | Expression matrix (.csv), polygons (.csv), FOV positions (.csv), transcript data (.csv) | AtoMx SIP |
| Xenium | Images (.tif, .ome.tif), cell summary file (.csv.gz), cell and nucleus segmentation (.zarr.zip, .csv.gz, .parquet), transcript data (.parquet, .zarr.zip), cell x gene matrix (.tsv.gz, .h5, .zarr.zip) | Xenium Onboard Analysis, Xenium Ranger |
| MERSCOPE | Transcripts (.csv), cell boundaries (.parquet), cell x gene matrix (.csv), images (.tif) | MERSCOPE Visualizer |
| Imaging-based Spatial Proteomics | | |
| CODEX (Phenocycler) | Raw images (.qptiff), multi-channel images (.tif), cell locations (.csv), cell x protein matrix (.csv) | SOPA[10] |
| MIBI | Multi-channel images (.tif), segmentation (.tif, .fcs, .csv, .txt), cell data table (.csv) | MIBIscope System |
| CyTOF | Multi-channel images (.tif), segmentation (.csv), cell x protein matrix (.csv) | CyTOF instrument |
| CellScape | Raw images (.ome.tif), multi-channel images (.ome.tif), segmentation (.csv), cell data table (.csv) | QuPath |
| MACSima | Multi-channel images (.ome.tif), cell data table (.csv, .fcs) | MACS iQ View Analysis Software |
| COMET | Raw images (.ome.tif), multi-channel images (.ome.tif), segmentation and dot overlays (.csv), cell data table (.csv) | HORIZON |

BAM: binary alignment map; SAW: stereo-seq analysis workflow; WARP: warp analysis research pipeline; FOV: field of view; SIP: spatial informatics platform; SOPA: spatial omics pipeline and analysis; MIBI: multiplexed ion beam imaging; COMET: comprehensive multiplexed epitope tracking.

2.2 Imaging-based spatial transcriptomics (iST)

iST encompasses two main approaches: multiplexed ISH, which uses repeated rounds of probe hybridization and imaging, and ISS, which sequences barcoded transcripts directly in tissue[12]. Both use highly multiplexed fluorescence imaging. ISH-based platforms include MERFISH, seqFISH+, Xenium, and CosMx, while ISS-based platforms include STARmap and BaristaSeq. Unlike sequencing-based approaches, which infer transcript location through barcoded capture spots, imaging-based platforms localize each RNA molecule with single-molecule precision. This enables true single-cell and even subcellular resolution (Figure 2).

Because these platforms require predefined probe sets, they are inherently panel-based. Panel size varies by technology, ranging from tens or hundreds of genes to several thousand and, in some cases, the whole transcriptome. Although this usually limits transcriptome breadth relative to sequencing-based methods, panel-based approaches typically achieve higher sensitivity and lower technical noise.

Most iST platforms include proprietary software for image processing and transcript decoding, such as Xenium Ranger and the AtoMx spatial informatics platform, though the extent of on-instrument processing varies. Each detected transcript receives (x, y) coordinates relative to the tissue section or field of view along with an assigned gene identity. Cell segmentation is required to associate transcripts with individual cells, and this is often performed using nuclear and/or membrane fluorescent markers (e.g., DAPI, DiO). Final processed datasets typically include a transcript table, segmentation masks, a cell-by-gene feature matrix, cell morphology measurements, and high-resolution images.
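The relationship between the transcript table, the segmentation mask, and the cell-by-gene matrix can be made concrete with a short sketch. It assumes a 2D integer label mask (0 for background, 1..N for cells), transcript coordinates expressed in pixel units of that mask, and genes encoded as integer indices; the function name `cell_by_gene` and the toy data are hypothetical and do not correspond to any vendor's implementation.

```python
import numpy as np

def cell_by_gene(mask, xs, ys, gene_ids, n_genes):
    """Build a cell-by-gene count matrix from a label mask and transcripts.

    mask     : (H, W) integer array, 0 = background, 1..N = cell labels
    xs, ys   : transcript coordinates in mask pixel units
    gene_ids : integer gene index of each transcript
    Returns the (N, n_genes) matrix and the number of unassigned transcripts.
    """
    # Look up the label under each transcript (rows index y, columns index x).
    labels = mask[ys.astype(int), xs.astype(int)]
    matrix = np.zeros((int(mask.max()), n_genes), dtype=int)
    assigned = labels > 0
    # np.add.at accumulates correctly when the same (cell, gene) pair repeats.
    np.add.at(matrix, (labels[assigned] - 1, gene_ids[assigned]), 1)
    return matrix, int((~assigned).sum())

toy_mask = np.zeros((4, 4), dtype=int)
toy_mask[0:2, 0:2] = 1
toy_mask[2:4, 2:4] = 2
toy_xs = np.array([0.5, 1.2, 3.0, 2.1, 0.2])
toy_ys = np.array([0.5, 1.9, 3.5, 0.3, 3.7])
toy_genes = np.array([0, 1, 1, 0, 0])
toy_matrix, n_unassigned = cell_by_gene(toy_mask, toy_xs, toy_ys, toy_genes, n_genes=2)
```

Transcripts that fall on background pixels are counted separately rather than discarded silently, since the unassigned fraction is itself a useful quality indicator.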

2.3 iSP

iSP platforms extend the principles of iST to the proteome. Instead of nucleic acid probes, these methods use fluorescently labeled or metal-tagged antibodies to detect dozens to over one hundred protein markers within intact tissue sections. Because proteins represent the functional effectors of cellular processes, such as signaling molecules, receptors, and transcription factors, spatial proteomics reveals layers of biological information that are not directly captured from RNA measurements alone. Critically, these platforms can measure post-translational modifications, activation states (e.g., phosphorylation), and protein abundance, which often correlate poorly with transcript levels due to translational regulation and protein stability.

These assays typically operate through iterative imaging cycles or mass-spectrometry-based detection, enabling high multiplexing without spectral overlap. They provide single-cell or subcellular spatial resolution, often capturing protein localization to membranes, cytoplasmic domains, or specific intracellular compartments. In contrast with transcriptomic imaging methods that detect discrete transcript puncta, proteomic data consist of high-dimensional intensity-based measurements across multiple channels, requiring robust image registration, normalization, artifact correction, and segmentation strategies. Akoya provides an IO60 panel which can analyze 60 protein markers, and other commercialized platforms are able to detect 20-60 plex while some can extend to 100 plex[13].

The primary outputs of iSP platforms consist of multi-channel fluorescence images or ion images from mass spectrometry, with each channel corresponding to a specific protein marker. Cell segmentation, similar to iST, is typically performed using nuclear and/or membrane markers; it defines cellular boundaries and assigns protein expression measurements to individual cells. The resulting cell-by-protein expression matrix contains intensity-based measurements rather than discrete counts, representing protein abundance through mean fluorescence intensity, integrated intensity, or other summary statistics per cellular compartment (Figure 3). Most platforms also provide spatial coordinate information and morphological feature measurements that enable spatially aware downstream analysis.
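As an illustration of how a cell-by-protein matrix of mean intensities can be derived, the sketch below averages each channel over the pixels of each segmented cell. It assumes a (channels, height, width) image stack and an integer label mask with 0 as background; the function name `mean_intensity` and the toy arrays are assumptions for the example only.

```python
import numpy as np

def mean_intensity(image, mask):
    """Mean intensity per cell and channel.

    image : (channels, H, W) float array, one channel per protein marker
    mask  : (H, W) integer array, 0 = background, 1..N = cell labels
    Returns an (N, channels) matrix and the pixel area of each cell.
    """
    n_cells = int(mask.max())
    flat = mask.ravel()
    # Pixel count per label; index 0 (background) is dropped.
    areas = np.bincount(flat, minlength=n_cells + 1)[1:]
    out = np.zeros((n_cells, image.shape[0]))
    for ch in range(image.shape[0]):
        sums = np.bincount(flat, weights=image[ch].ravel(), minlength=n_cells + 1)[1:]
        out[:, ch] = sums / np.maximum(areas, 1)  # avoid division by zero
    return out, areas

toy_image = np.array([[[1.0, 3.0], [5.0, 7.0]],
                      [[2.0, 2.0], [4.0, 4.0]]])
toy_labels = np.array([[1, 1], [2, 0]])
toy_means, toy_areas = mean_intensity(toy_image, toy_labels)
```

The per-channel sums before division correspond to integrated intensity, the other summary statistic mentioned above.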

Figure 3. Overview of computational workflow for analyzing iSP data. Created in BioRender. Plummer, J. (2026) https://app.biorender.com/illustrations/69680e6ab268012d91f5251c?slideId=4056948a-e057-4b5a-a9b8-3a8c09b04843. iSP: imaging-based spatial proteomics; PCA: principal component analysis.

2.4 Diverse output formats from spatial omics platforms

Different spatial omics platforms generate data with distinct formats, structures, and analytical requirements. To accommodate spatial coordinates, multimodal measurements, and associated imaging data, several data representations and software ecosystems have been proposed across analysis communities (Figure 4). In the Bioconductor R repository ecosystem, SpatialExperiment[14] extends the widely used scRNA-seq data structure SingleCellExperiment[15] by introducing native support for spatial coordinates and associated imaging data. Specifically, spatial metadata is stored alongside cell-level annotations, where spatial coordinates are represented as a dedicated component, and histological images can also be linked to the object. Seurat[16] is another very popular R framework for scRNA-seq analysis that already supports various sST and iST platforms.

Figure 4. Overview of data representations for spatial omics. This schematic illustrates how these frameworks organize spatial omics data into common conceptual components. Across both R and Python ecosystems, core elements include: (i) feature-by-cell gene expression matrices (e.g., assays or X), (ii) cell- or spot-level metadata (e.g., colData or obs), (iii) feature-level annotations (e.g., rowData or var), and (iv) reduced dimensional representations and graphs for downstream analysis. In spatially resolved datasets, these structures are extended to incorporate spatial coordinates and, in some cases, linked imaging data. Created in BioRender. Plummer, J. (2026) https://app.biorender.com/illustrations/69e1151c16f29279e0b09ccf.

In the Python ecosystem, AnnData[17] serves as the core data structure adopted by the single-cell gene expression analysis toolkit Scanpy[18]. Voyager[19] is implemented in both the R/Bioconductor and Python/PyPI ecosystems, and it introduces the SpatialFeatureExperiment data structure, which extends SpatialExperiment with Simple Features[20] geometries.

Because of the heterogeneity of ST data, challenges in cross-platform integration, and the large data volume of high-resolution images from imaging-based technologies, SpatialData[21] was developed. It is a flexible and extensible framework to manage large-scale spatial omics data and support out-of-core computation. It also enables consistent spatial alignment and cross-modal integration through shared coordinate systems. SpatialData provides a dedicated I/O interface (spatialdata-io) that supports data ingestion from a wide range of commonly used commercial spatial omics platforms. It requires inputs in the SpatialData Zarr file format, which is an extension of Zarr and OME-NGFF enabling storage of large images, data and metadata, all linked to each other in an efficient and interoperable way.

Within the SpatialData framework, spatial information is organized into five core spatial elements: images, labels, points, shapes, and tables. Raster images are represented as images; transcript coordinates are stored as points; segmentation masks are encoded as labels; geometric objects such as cell boundaries, nuclei outlines, or circular regions are represented as shapes; and molecular measurements like gene and protein expression, fluorescence intensity, and associated metadata are stored in tables in AnnData format.
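To make the five element types concrete, the following sketch models them with a plain dataclass. This is a hypothetical stand-in written for illustration only: the class `SpatialContainer`, its `summary` method, and the toy contents are assumptions and are not the spatialdata API, which stores these elements as array, dataframe, and AnnData objects backed by Zarr.

```python
from dataclasses import dataclass, field

@dataclass
class SpatialContainer:
    """Hypothetical container mirroring the five element types (NOT spatialdata)."""
    images: dict = field(default_factory=dict)  # raster images, e.g. DAPI or H&E
    labels: dict = field(default_factory=dict)  # segmentation masks (integer rasters)
    points: dict = field(default_factory=dict)  # transcript coordinates
    shapes: dict = field(default_factory=dict)  # polygons or circles (cells, nuclei, spots)
    tables: dict = field(default_factory=dict)  # expression/intensity values plus metadata

    def summary(self):
        """Map each element type to the sorted names of the elements it holds."""
        kinds = ("images", "labels", "points", "shapes", "tables")
        return {kind: sorted(getattr(self, kind)) for kind in kinds}

sample = SpatialContainer(
    images={"dapi": [[0, 1], [1, 0]]},
    labels={"cells": [[0, 1], [1, 0]]},
    points={"transcripts": [(0.5, 1.2, "GeneA")]},
    shapes={"cell_boundaries": {1: [(0, 0), (1, 0), (1, 1)]}},
    tables={"expression": {"cell_1": {"GeneA": 1}}},
)
```

Keeping each element under a name within its type is what allows several images, masks, or tables from one sample to coexist and be linked through shared coordinates.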

3. Advanced Data Analysis Steps and Pipelines

Once the data is generated, there are several steps that must be taken to generate biological insights. Not all of these steps are necessary in every case, but they are common in most workflows. These additional pipelines allow for better data refinement and integrative analysis methods.

3.1 Segmentation refinement

Cell segmentation represents a foundational step in imaging-based spatial omics, as accurate assignment of transcripts and proteins to individual cells is a prerequisite for all downstream single-cell analyses. Segmentation errors can propagate through the entire analytical pipeline. Misassignment of transcripts creates artificial co-expression patterns, under-segmentation merges distinct cell types and obscures heterogeneity, and over-segmentation fragments individual cells into spurious subpopulations. These errors fundamentally compromise biological interpretation in cell phenotyping, spatial niche analysis, cell-cell communication inference, and tumor microenvironment characterization[22]. While manufacturers provide default segmentation outputs, these general-purpose algorithms often require tissue-specific parameter optimization, necessitating fine-tuning.

Most platforms include proprietary segmentation refinement tools that allow parameter adjustment without requiring complete re-segmentation. For instance, 10x Genomics launched Xenium Ranger software which introduces a re-segmentation function enabling users to tune nuclear expansion distance, DAPI intensity thresholds, and expected cell size constraints. Similarly, Vizgen’s Post-processing Tool enables manual boundary correction and regeneration of single-cell expression matrices from updated segmentation masks, and Nanostring’s FastReseg[23] uses transcript-guided corrections to improve boundary accuracy in three dimensions. However, these vendor-specific tools are typically limited to their respective platforms and may not accommodate complex tissue architectures or specialized segmentation requirements.

To address these limitations, numerous third-party segmentation methods have been developed, predominantly leveraging deep learning approaches trained on diverse tissue types and imaging modalities (Table 2). Deep learning algorithms such as Cellpose and StarDist are popular choices due to their robust performance across tissue types, pre-trained models, and ability to handle challenging morphologies. Cellpose uses a flow-based representation of cell shapes, making it particularly effective for irregular cell boundaries, while StarDist employs star-convex polygon representations optimized for round or elliptical cells.

Table 2. Summary of segmentation refinement tools, how they work, their inputs and outputs, and the language in which they are implemented.

| Tool/Pipeline | Method Summary | Input | Output | Language |
| --- | --- | --- | --- | --- |
| Segmentation Only | | | | |
| FastReseg | Scores transcripts, identifies missegmented transcripts via a SVM, then reassigns transcripts via decision tree | Imaging-based transcriptomics data, reference transcript profile (scRNA-seq/spatial) | Reassigned transcripts | R |
| Cellpose[24] | Generates topological maps, neural network predicts gradients, uses gradient tracking to group pixels in cells | Fluorescence/histology images | Segmentation masks | Python |
| StarDist[25] | Object detection based on U-Net, predicts star-convex polygons for every pixel | Fluorescence/histology images | Segmented images | Python |
| Segger[26] | Encodes data as heterogeneous graph, GNN is then trained on cell-transcript links and refines them | Imaging-based transcriptomics data, scRNA-seq reference (optional) | Reassigned transcripts | Python |
| BIDCell[27] | Self-supervised deep-learning model using biologically-informed multiple loss functions to optimize learnable parameters for segmentation | Spatial transcriptomics data, histology image, scRNA-seq reference | Segmentation masks, cell x gene matrix | Python |
| Baysor[28] | Models data as mixture of cell-specific distributions, uses Bayesian mixture models to separate the mixture | Imaging-based transcriptomics data | Molecule coordinates, cell polygons | Julia |
| Proseg[29] | Based on unsupervised probabilistic model of the spatial distribution of transcripts | Imaging-based transcriptomics data | Reassigned transcripts, cell polygons | Rust |
| RAMCES[30] | Uses CNN to learn optimal markers, utilizes weighted combination of selected markers for segmentation | Imaging-based proteomics data | Marker rankings and weighted images | Python |
| Segmentation and Cell Type Classification | | | | |
| JSTA[31] | Uses DNN to assign pixel-level cell type labels | Spatial transcriptomics data, scRNA-seq reference | Reassigned pixel labels | Python |
| ClusterMap[32] | Integrates spatial and expression data, density peak clustering to identify biologically meaningful structures | Imaging-based transcriptomics data | Segmentation mask, cell type annotation, tissue region map | Python |
| CelloType[33] | Transformer-based DNN with multiple branches to perform object detection, segmentation, and classification concurrently | Imaging-based transcriptomics data, histology image | Segmentation mask, object boxes and classes | Python |
| Bering[34] | Graph deep learning model utilizes transcript colocalization for cell type annotation, transcript representations are transferred to segmentation task | Imaging-based transcriptomics data | Cell type annotations, reassigned transcripts | Python |

SVM: support vector machine; GNN: graph neural network; CNN: convolutional neural network; DNN: deep neural network; JSTA: joint cell segmentation and cell type annotation.

Some more recent segmentation methods aim to leverage spatial omics-specific information. Segger, a graph neural network-based approach, constructs spatial graphs from transcript locations and uses message passing to refine cell boundaries while requiring fewer computational resources than image-based deep learning methods. BIDCell and other transcript-aware methods similarly exploit the observation that transcripts from the same cell should cluster spatially, using this biological prior to inform segmentation decisions.

Frequently in iST data, substantial fractions of detected transcripts remain unassigned to any cell after segmentation. While this is conventionally treated as technical noise from extracellular space or segmentation errors, accumulating evidence suggests biological relevance for some unassigned transcripts[35]. Troutpy has been developed to analyze spatial patterns of unassigned transcripts, testing whether they exhibit non-random spatial organization that would indicate biological signal rather than uniform technical noise. If unassigned transcripts show spatial clustering, co-localization with specific cell types, or enrichment in particular tissue regions, this information can inform iterative segmentation refinement by expanding boundaries in regions with high unassigned transcript density or adjusting segmentation algorithms to capture cellular protrusions.
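One simple way to ask whether unassigned transcripts are spatially non-random is a quadrat test: divide the field into equal squares, count transcripts per square, and compare the variance of those counts with their mean. The sketch below is a generic illustration of that idea and is not the method implemented in Troutpy; the function name `dispersion_index`, the square field assumption, and the toy coordinates are all hypothetical.

```python
import numpy as np

def dispersion_index(xs, ys, extent, n_bins=4):
    """Variance-to-mean ratio of quadrat counts over a square field.

    Values near 1 are what a uniform random (Poisson) pattern would give,
    values well above 1 indicate clustering, values below 1 indicate a
    more regular pattern than expected by chance.
    """
    counts, _, _ = np.histogram2d(
        xs, ys, bins=n_bins, range=[[0, extent], [0, extent]]
    )
    return counts.var(ddof=1) / counts.mean()

# Regular pattern: exactly one transcript at the centre of each quadrat.
centres = np.arange(4) + 0.5
grid_x, grid_y = np.meshgrid(centres, centres)
regular = dispersion_index(grid_x.ravel(), grid_y.ravel(), extent=4)

# Clustered pattern: the same number of transcripts piled into one quadrat.
clustered = dispersion_index(np.full(16, 0.5), np.full(16, 0.5), extent=4)
```

In practice the result depends on the quadrat size, and tissue regions without cells should be excluded before interpreting a high index as biological signal.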

Unlike imaging-based technologies, sST does not require segmentation refinement due to its spot-based nature, but high-resolution variants such as Visium HD can benefit from cell-aware aggregation. Tools like Bin2cell[36] address this by applying image-based segmentation, using StarDist or similar methods on H&E images, to group bins into putative single cells, enabling cell-level rather than bin-level analysis and improving compatibility with single-cell analytical frameworks.

As an alternative to simple cell segmentation, several methods have emerged that treat segmentation and cell type assignment as coupled problems rather than sequential steps, leveraging the observation that cell type identity constrains expected morphology. One of the most popular tools for this is JSTA, which uses a scRNA-seq reference and an initial watershed segmentation as input to a deep neural network that classifies cells. This is followed by iterative reassignment of pixels based on local RNA densities until convergence is reached. These approaches can improve both segmentation and cell annotation accuracy when cell type groupings are available, though often at increased computational cost.

After initial refinement, segmentation quality should be evaluated to determine whether further refinement is necessary. In the absence of manual annotations, this is typically done through biological QC metrics: examining cell size distributions, doublet rates, unassigned transcript fractions, and whether known cell-type-specific markers localize appropriately. When ground truth cellular boundaries exist, overlap metrics like Intersection over Union provide quantitative accuracy measures, though manual segmentation remains labor-intensive and subjective for complex tissues.
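For reference, Intersection over Union for a single cell is the number of pixels shared by the predicted and ground truth masks divided by the number of pixels covered by either. A minimal sketch, assuming both masks are boolean arrays of the same shape; the function name `iou` and the convention of returning 0.0 for two empty masks are choices made for this example.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union of two boolean masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # both masks empty; convention chosen for this sketch
    return float(np.logical_and(pred, truth).sum() / union)

pred_mask = np.zeros((4, 4), dtype=bool)
true_mask = np.zeros((4, 4), dtype=bool)
pred_mask[0:2, 0:2] = True   # 4 predicted pixels
true_mask[0:2, 1:3] = True   # 4 ground truth pixels, 2 of them shared
score = iou(pred_mask, true_mask)  # 2 shared / 6 covered
```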

3.2 QC

QC involves filtering out low quality data points that may introduce technical noise, create spurious clusters, or obscure genuine biological signals. QC strategies for spatial omics largely build upon established single-cell RNA-seq workflows with some being adapted to accommodate spatial context and platform-specific characteristics.

Common QC procedures for sST include filtering cells or spots based on outlier total counts, number of unique transcripts, high percentage of mitochondrial genes, and number of cells per capture spot. These steps are routinely implemented using widely adopted analysis frameworks in both Python and R ecosystems such as Scanpy and Seurat. Importantly, Bhuva et al.[37] highlighted the critical impact of library size variation in sST data, demonstrating that differences in sequencing depth can substantially influence downstream analyses and interpretation. Their findings underscore the need for careful consideration of library size-related biases when performing QC and normalization in spatial omics studies.
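The count-based filters listed above can be sketched directly on a spot-by-gene matrix. The function name `qc_keep` and all threshold defaults are placeholders chosen for illustration; appropriate cutoffs depend on platform and tissue and are usually set after inspecting the distributions. Frameworks such as Scanpy and Seurat provide their own routines for the same purpose.

```python
import numpy as np

def qc_keep(counts, is_mito, min_counts=500, min_genes=200, max_mito_frac=0.2):
    """Boolean mask of spots/cells passing basic QC.

    counts  : (n_spots, n_genes) array of UMI counts
    is_mito : (n_genes,) boolean array marking mitochondrial genes
    Thresholds are illustrative placeholders, not recommendations.
    """
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito_frac = counts[:, is_mito].sum(axis=1) / np.maximum(total, 1)
    return (total >= min_counts) & (n_genes >= min_genes) & (mito_frac <= max_mito_frac)

toy_counts = np.array([
    [10, 5, 0, 1],   # passes all three filters
    [1, 0, 0, 0],    # too few counts and genes
    [2, 2, 2, 10],   # mitochondrial fraction too high (10 / 16)
])
toy_mito = np.array([False, False, False, True])
keep = qc_keep(toy_counts, toy_mito, min_counts=5, min_genes=2, max_mito_frac=0.5)
filtered = toy_counts[keep]
```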

For iST, QC requirements have some overlap with sST (e.g., filtering on total counts), but there are distinct differences. Primarily, image background denoising and cell segmentation refinement should be completed prior to biological QC, and mitochondrial gene QC is generally not required. Because of the absence of standardized metrics for evaluating iST data quality, Plummer et al.[38] proposed a set of quantitative metrics for assessing data quality. These include technical measurements such as transcripts per cell to evaluate sensitivity, as well as normalized transcripts per nucleus as a complementary indicator for cell segmentation. The authors released SpatialQM for metrics calculation, and SpatialTouchstone portable as a shared resource for the academic community to compare data quality across datasets.

When working with iSP, QC begins with raw image assessment and preprocessing, including evaluation of focus quality, signal-to-noise ratios across channels, and proper registration between imaging cycles. Background correction and denoising are essential as protein intensity measurements are particularly sensitive to autofluorescence, non-specific antibody binding, and cycle-to-cycle variation in staining efficiency. Channel-specific QC should be used to evaluate antibody performance by examining signal distributions and identifying channels with abnormally low signal or high background. After cell segmentation, cellular measurements undergo filtering based on total protein expression, number of detected markers, and morphological features.

While applying non-spatial QC methods is sufficient in many cases, leveraging spatial context during QC can help distinguish biological heterogeneity from technical artifacts. For example, fibrotic and necrotic areas are regions with naturally low counts. When such cells or spots form spatially coherent patterns aligned with known tissue structures, excluding them solely based on global QC thresholds (e.g., total counts or mitochondrial gene percentage) may inadvertently remove biologically meaningful signals. Visualizing suspicious data points in their tissue context can therefore help separate biologically relevant regions from technical artifacts.

3.3 Pre-processing

After filtering low-quality spots/cells, normalization aims to correct for technical variation in order to enable valid biological comparisons. Most of the original normalization methods used for sST were taken directly from scRNA-seq analysis. The most common approach is to scale the data (e.g., CPM, library size normalization) and then apply a log transformation. This is useful as a default due to its simplicity and interpretability; however, alternative methods, such as scTransform[39], which models the UMI counts for each gene using a generalized linear model, tend to perform better[40].
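
As a minimal sketch of the default approach described above (library size scaling followed by a log transformation), assuming a dense count matrix with cells or spots in rows. The function name and the `target_sum` value of 10,000 are illustrative choices rather than a prescribed setting.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell/spot to the same total count, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    # Leave all-zero rows at zero instead of dividing by zero.
    safe_lib = np.where(lib_size > 0, lib_size, 1.0)
    return np.log1p(counts / safe_lib * target_sum)

# Two cells with identical composition but a 10-fold difference in depth.
counts = np.array([[1, 3], [10, 30]])
normalized = normalize_log1p(counts)
```

After normalization the two rows are identical, which illustrates how this step removes depth differences. It also shows why the step can be problematic when library size itself carries biological or spatial information.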

While it is generally acceptable to apply the previously described scRNA-seq normalization methods to sequencing-based technologies, it is debatable whether this is the case for imaging-based technologies. This is because the total counts are determined by probe hybridization efficiency, not sequencing coverage. Additionally, the use of targeted gene panels may introduce bias towards specific gene sets or cell types, which cannot be accounted for with scRNA-seq normalization methods. In cases where cell volume data is available, Atta et al.[41] recommend using it for normalization, with cell area serving as a proxy if volume is unavailable. This assumes that the transcription rate is constant between cells, so that transcript density (counts per unit volume or area) is the quantity being normalized. However, the efficacy of this method depends heavily on segmentation accuracy.
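
A minimal sketch of size-based normalization for imaging-based data, assuming per-cell area or volume is available from segmentation. The function name and the rescaling by the median cell size (used only to keep values on a count-like scale) are assumptions for illustration, not the exact procedure recommended by Atta et al.

```python
import numpy as np

def normalize_by_cell_size(counts, cell_size):
    """Convert per-cell counts to transcript density, rescaled by the median size.

    counts: (n_cells, n_genes) count matrix
    cell_size: per-cell volume, or area as a proxy, from segmentation
    """
    counts = np.asarray(counts, dtype=float)
    cell_size = np.asarray(cell_size, dtype=float)
    if np.any(cell_size <= 0):
        raise ValueError("cell sizes must be positive; check the segmentation output")
    density = counts / cell_size[:, None]
    return density * np.median(cell_size)

# Two cells with the same transcript density but different areas.
counts = np.array([[10, 20], [20, 40]])
area = np.array([50.0, 100.0])
normalized = normalize_by_cell_size(counts, area)
```

Because the result depends directly on `cell_size`, any segmentation error propagates into the normalized values.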

For iSP, normalization is applied to fluorescence intensities rather than count data. Common single-marker transformations include Z-score normalization, log or inverse hyperbolic sine transformation, and min-max scaling, each with distinct assumptions about data distribution and interpretation. Inverse hyperbolic sine transformation has gained favor in spatial proteomics due to its ability to handle zero values naturally while approximating log behavior for high intensities, making it robust across markers with different dynamic ranges[42].
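
The behavior of the inverse hyperbolic sine transformation can be illustrated with a short sketch. The cofactor of 5 is an assumed value for the example; appropriate cofactors depend on the platform and marker. The per-marker Z-score is included for comparison.

```python
import numpy as np

def arcsinh_transform(intensity, cofactor=5.0):
    """Inverse hyperbolic sine transform: defined at zero, log-like for large values."""
    return np.arcsinh(np.asarray(intensity, dtype=float) / cofactor)

def zscore_per_marker(intensity):
    """Center and scale each marker (column) to zero mean and unit variance."""
    x = np.asarray(intensity, dtype=float)
    std = x.std(axis=0)
    return (x - x.mean(axis=0)) / np.where(std > 0, std, 1.0)

at_zero = arcsinh_transform(0.0)        # zero intensity maps to zero
at_high = arcsinh_transform(1000.0)     # approximately log(2 * 1000 / 5)
z = zscore_per_marker([[1.0, 10.0], [3.0, 30.0]])
```

For large intensities, arcsinh(x / c) approaches log(2x / c), which is why the transform behaves like a log without being undefined at zero.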

Traditional normalization methods assume technical variation is independent of spatial location. However, sST data often exhibit spatial gradients in technical quality that are confounded with biological spatial patterns. Standard normalization may inadvertently remove genuine biological gradients while failing to correct spatially structured technical artifacts. Because of this, some researchers recommend avoiding normalization before spatial domain identification if the method is not spatially aware, as normalization can blur spatial boundaries and reduce the detectability of spatially organized gene expression programs[37]. Tools such as SpaNorm[43] address this issue by modeling spatial and non-spatial technical effects separately, using spatial autocorrelation to distinguish spatially smooth technical variation from sharp biological boundaries. However, this approach is more computationally expensive than non-spatial methods.

Normalization results can be evaluated qualitatively with boxplots or histograms to verify that the distributions are not skewed. After normalization, another common pre-processing step is dimensionality reduction. This serves multiple purposes in spatial omics analysis, including denoising data through low-rank approximation (e.g., PCA), reducing computational burden for downstream analysis, and enabling two-dimensional visualization of cellular heterogeneity (e.g., UMAP, t-SNE).
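
The low-rank approximation behind PCA can be sketched with a singular value decomposition. This is a minimal NumPy illustration with an assumed function name; analysis frameworks typically add scaling, feature selection, and randomized solvers for large matrices.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix.

    Returns per-cell scores, gene loadings, and the fraction of variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, Vt[:n_components], explained[:n_components], mean

# Toy matrix of rank one: every cell is a multiple of the same expression program.
X = np.outer(np.arange(10.0), [1.0, 2.0, 3.0])
scores, components, explained, mean = pca(X, n_components=2)
reconstructed = scores @ components + mean   # low-rank approximation of X
```

Keeping only the leading components and reconstructing the matrix discards low-variance directions, which is the sense in which PCA denoises the data.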

Following these initial steps, the data is prepared for downstream biological analysis, including cell type identification and CCI inference.

3.5 Cell typing and spatial domain identification

Cell typing, also called cell type classification/annotation, is the process of converting quantitative gene/protein counts into biologically interpretable labels. The most popular methods across different technologies include clustering, cell state/phenotype scoring, reference-based mapping, and deep learning/machine learning tools. Other methods, such as deconvolution and segmentation-free analysis, are more specific to certain cases such as low-resolution sST data or imaging-based spatial data where accurate cell segmentation is difficult (Table 3).

Table 3. Summary of cell typing/subtyping computational tools, how they work, their inputs, outputs, and language they are implemented in (if applicable).
Tool/Pipeline | Method Summary | Input | Output | Language(s)
Clustering
BayesSpace[44] | Full Bayesian statistical method that uses a low-dimensional representation of expression matrix to model spatial clustering | Spatial omics data | Spot and subspot level cluster labels | R
BASS[45] | Uses Bayesian hierarchical modeling framework for joint clustering and spatial domain detection | Spatial omics data | Single-cell level cluster labels, spatial domain labels, cell type proportions | R
Pixie[46] | Extracts pixel-level features, unsupervised clustering to identify pixel-level phenotypes, maps clusters back to image | Imaging-based proteomics data | Pixel-level cluster labels | Python
SpaCell[47] | Image feature extraction with CNN, K-means clustering on latent matrix representing image and gene-count data | Spatial transcriptomics data, histology image | Single-cell/spot level cluster labels | Python
Cell State/Phenotype Scoring
FGSEA[48] | Takes ranked list of genes and calculates enrichment score based on the position and frequency at which genes from a gene set appear in that list | Spatial omics data, gene sets | Gene set enrichment scores | R
AUCell[49] | Calculates enrichment of gene sets as an AUC across all ranked genes in a cell/spot | Spatial omics data, gene sets | Gene set activity scores | Python, R (implemented in SCENIC)
WSUM[50] | Multiplies each target gene in a gene set by its associated weight from the input data which are then summed to get final enrichment score | Spatial omics data, gene sets | Gene set enrichment scores | Python, R (implemented in decoupleR)
WMEAN[50] | Similar to WSUM but divides the summed enrichment score by the sum of the absolute value of weights | Spatial omics data, gene sets | Gene set enrichment scores | Python, R (implemented in decoupleR)
M-scores[51] | Compares the expression distributions between query and reference samples for given gene sets | Spatial omics data, gene sets | Gene set dysregulation scores | MyPROSLE webtool
Reference-based Mapping
scType[52] | Analyzes detected gene signature at each spot against marker gene database (scTypeDB) or reference and scores them | Spatial transcriptomics data, scRNA-seq reference (optional) | Single-cell/spot-level cell type labels | Python, R
Spatial-ID[53] | DNN trained on reference, GCN constructs spatial neighbor graph, autoencoders to encode gene expression patterns and embed spatial information | Spatial transcriptomics data, scRNA-seq reference | Single-cell/spot-level cell type labels | Python
TopACT[54] | Independently classifies cell type of each spot, uses dynamically scaled local neighborhood | Imaging-based transcriptomics data, scRNA-seq reference | Spot-level cell type annotations | Python
SpatialScope[55] | Deep generative model to learn distributions from scRNA-seq data which are then used to identify cell type labels | Spatial transcriptomics data, scRNA-seq reference | Spatial maps of cell types at single-cell resolution | Python
RedeHist[56] | U-Net to extract histological features, DNN produces latent embeddings for nucleus mask, generates cell abundance matrix | Spatial transcriptomics data, histology image, scRNA-seq reference | Single-cell cell type labels, whole transcriptome expression profiles, and coordinates | Python
CytoSpace[57] | Estimates cell type proportions and cells per spot, samples scRNA-seq data to match estimations, assigns single cells to spatial spots via shortest augmenting path optimization | Spatial transcriptomics data, scRNA-seq reference | Single-cell/spot level cell type labels | Python
Tangram[58] | Iterative learning of spatial alignment of sc/snRNA-seq data | Spatial transcriptomics data, histology image, sc/snRNA-seq reference | Single-cell labels/deconvolution | Python
Deep Learning/Machine Learning
Novae[59] | Self-supervised graph attention network that encodes local environments into spatial representations | Spatial transcriptomics data, histology image (optional) | Single-cell/spot level cluster labels | Python
CellTune[60] | Feature extraction followed by training two gradient-boosted tree models in parallel, human updated labels to iteratively improve model | Imaging-based proteomics data | Single-cell level cell type labels, cell gating, marker positivity predictions | Desktop Application
CELESTA[61] | Identifies “anchor cells” via protein expression profile, classifies “non-anchor” cells via both protein expression and known cell types within spatial neighborhood iteratively until convergence | Imaging-based proteomics data | Single-cell level cell type labels | R
CellSighter[62] | Ensemble of CNN models performing multi-class classification, calculates probability of each cell belonging to a class | Imaging-based proteomics data | Single-cell level cell type labels | Python
STARLING[63] | Probabilistic machine learning model that accounts for segmentation errors | Imaging-based proteomics data | Single-cell level cell type labels, cluster labels, per-cell segmentation error probabilities | Python
Deconvolution
CARD[64] | Non-negative matrix factorization model with CAR modeling assumption | Sequencing-based transcriptomics data, scRNA-seq reference | Spot cell type proportions | R
GIST[65] | Bayesian probabilistic model using prior estimates of cell type proportions from paired tissue image to optimize estimates derived from spatial data | Sequencing-based transcriptomics data, histology image | Spot cell type proportions | R
Spotiphy[66] | Probabilistic generative modeling to estimate cell type proportions in capture and non-capture spots | Sequencing-based transcriptomics data, histology image, scRNA-seq reference | Spot cell type proportions, inferred scRNA profiles, and pseudo single-cell resolution image with cell type labels | Python
Cell2Location[67] | Bayesian model decomposes spatial expression matrix into reference cell type signatures to estimate abundance of cell types at each location | Sequencing-based transcriptomics data, scRNA-seq reference | Spot cell type proportions | Python
SpaDecon[68] | Combines spatial and reference gene expression matrices, autoencoder identifies relevant features for cell types, infers cell type proportions | Sequencing-based transcriptomics data, histology image (optional), scRNA-seq reference | Spot cell type proportions | Python
RCTD[69] | Reference-based probabilistic model predicts cell types on pixels, predicts maximum-likelihood cell type proportions | Sequencing-based transcriptomics data, scRNA-seq reference | Spot cell type proportions | R
Segmentation-Free Analysis
FICTURE[70] | Multilayered Dirichlet model for stochastic variational inference of pixel-level spatial factors | Spatial transcriptomics data, scRNA-seq reference (optional) | Pixel-level cell type labels | Python
SSAM[71] | Gaussian KDE to get spatial mRNA density, identifies cell type signatures from gene expression vector field, signatures mapped to vector field via Pearson’s correlation | Imaging-based transcriptomics data, scRNA-seq reference (optional) | Pixel-level cell type labels | Python
Sainsc[72] | KDE to model 2D gene expression, models cell type assignments using cosine similarity of gene expression with reference | Nanometer resolution spatial transcriptomics data, scRNA-seq reference (optional) | Pixel-level cell type labels | Python/Rust, Julia

FGSEA: fast gene set enrichment analysis; AUC: area under the curve; WSUM: weighted sum; WMEAN: weighted mean; GCN: graph convolutional network; CAR: conditional autoregressive; KDE: kernel density estimation; SSAM: spot-based spatial cell-type analysis by multidimensional mRNA density estimation; RCTD: robust cell type decomposition; GIST: guiding-image spatial transcriptomics; CARD: conditional autoregressive-based deconvolution.

Clustering involves grouping data points that share intrinsic characteristics. Similar to scRNA-seq analysis, this means clustering by gene expression, so that cells with more similar expression patterns are grouped together. Cell type labels must then be assigned manually to the identified clusters based on their top marker genes. Graph-based clustering methods, such as Leiden[73] and Louvain[74], which are implemented in packages such as Scanpy, construct k-nearest neighbor graphs and identify communities, while other approaches use hierarchical clustering, k-means, or mixture models. For spatial data, it is possible to cluster on both gene expression and spatial location, typically assuming that closer spots/cells are more similar, as in methods like BayesSpace. Technically, single-cell clustering methods can still be used on spatial data, but it has been demonstrated that methods accounting for spatial coordinates tend to perform better at the cost of higher computational complexity[75]. The optimal method primarily depends on the input data (e.g., resolution, distinct spatial patterns, tissue type) and whether computational efficiency is a primary concern, with no single tool showing universally optimal performance. For example, Bayesian methods tend to perform better when spots are organized in a regular grid or lattice structure[76].

Cell state or phenotype scoring provides an alternative to discrete cell type assignment by quantifying continuous biological states through marker gene expression. These include, but are not limited to, rank-based enrichment methods (e.g., FGSEA, AUCell), aggregation-based methods (e.g., WSUM, WMEAN), and expression distribution comparison methods (e.g., M-scores). These methods take gene sets, which are pre-defined groups of genes that participate in the same pathway or perform a common function, as part of their input and provide scores that quantify their activity for a given group of cells or capture spots. Gene sets can be derived from literature, pathway databases (e.g., MSigDB, KEGG), or computationally from reference datasets. In spatial contexts, scoring enables mapping of functional programs across tissue architecture, revealing spatial organization of processes like immune activation zones, hypoxic regions, or proliferative niches that may not correspond to distinct cell types. Among 18 different cell state scoring methods, Toro-Domínguez et al.[77] found that normalized WSUM, M-scores, and AUCell consistently performed well across multiple evaluation metrics.
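
The aggregation-based scores can be written out directly from their descriptions in Table 3. The sketch below shows only the core arithmetic of a weighted sum and a weighted mean; it is not the decoupleR implementation, which includes further options, and the function names and the convention of zero weights for genes outside the gene set are assumptions.

```python
import numpy as np

def wsum(expr, weights):
    """Weighted sum: multiply each gene by its gene set weight and add them up."""
    return np.asarray(expr, dtype=float) @ np.asarray(weights, dtype=float)

def wmean(expr, weights):
    """Weighted mean: the weighted sum divided by the sum of absolute weights."""
    weights = np.asarray(weights, dtype=float)
    return wsum(expr, weights) / np.abs(weights).sum()

# Two cells/spots by three genes; the gene set has one positive and one negative target.
expr = np.array([[1.0, 2.0, 3.0],
                 [0.0, 1.0, 0.0]])
weights = np.array([1.0, 0.0, -1.0])   # zero weight for genes outside the set

scores_wsum = wsum(expr, weights)
scores_wmean = wmean(expr, weights)
```

Plotting such per-spot scores at their tissue coordinates is what allows functional programs to be mapped across the tissue.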

Reference-based cell type annotation leverages existing single-cell or single-nucleus RNA-seq atlases to transfer cell type labels to spatial data. These methods typically compute marker gene signatures or use full expression profiles to match spatial observations to reference cell types. Reference-based annotation is particularly powerful when high-quality, tissue-matched references are available, enabling assignment of detailed cell states that would be difficult to resolve through de novo clustering alone. Performance of these methods depends critically on reference quality, biological concordance between reference and spatial datasets, and the presence of spatial cell types in the reference. Novel cell states or spatially restricted populations absent from the reference will likely be misannotated or forced into inappropriate categories.
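
A minimal sketch of reference-based label transfer, assuming the reference has already been summarized as one mean expression profile per cell type over the genes shared with the spatial data. Correlation to reference profiles is one simple matching strategy and does not represent any specific tool in Table 3; the function name and the `min_corr` cutoff are assumptions.

```python
import numpy as np

def annotate_by_correlation(query, ref_profiles, ref_labels, min_corr=0.5):
    """Assign each query cell the label of its most correlated reference profile.

    Cells whose best Pearson correlation is below min_corr are left unassigned
    instead of being forced into a reference category.
    """
    q = np.asarray(query, dtype=float)
    r = np.asarray(ref_profiles, dtype=float)
    q = q - q.mean(axis=1, keepdims=True)
    r = r - r.mean(axis=1, keepdims=True)
    q = q / np.maximum(np.linalg.norm(q, axis=1, keepdims=True), 1e-12)
    r = r / np.maximum(np.linalg.norm(r, axis=1, keepdims=True), 1e-12)
    corr = q @ r.T                       # Pearson correlation, cells x cell types
    best = corr.argmax(axis=1)
    labels = [
        ref_labels[j] if corr[i, j] >= min_corr else "unassigned"
        for i, j in enumerate(best)
    ]
    return labels, corr

ref_profiles = np.array([[10, 0, 0, 1],    # hypothetical T cell profile
                         [0, 10, 0, 1]])   # hypothetical B cell profile
ref_labels = ["T cell", "B cell"]
query = np.array([[8, 1, 0, 1],
                  [0, 9, 1, 0],
                  [0, 0, 10, 1]])          # population absent from the reference
labels, corr = annotate_by_correlation(query, ref_profiles, ref_labels)
```

The third cell matches neither reference profile and is reported as unassigned. Without such a cutoff it would be forced into one of the reference categories, which is the failure mode described above.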

In addition to traditional approaches, machine learning and deep learning techniques have become increasingly popular for cell type labeling. Graph-based models and neural networks in particular have gained traction due to their ability to represent datasets with complex, non-linear relationships. While these methods are powerful, they are often more computationally expensive, requiring access to GPUs, having runtimes that grow steeply with dataset size, or both.

In the case of low-resolution sST, it is sometimes necessary to perform spot deconvolution to determine the proportions of cell types within a spot and, in some instances, their locations within a spot. Because spot-based sST is the oldest spatial omics technology commercially available, there are many deconvolution tools available, and their performances can vary depending on a variety of factors including the mRNA capture mechanism and detection mechanisms used[78]. Li et al.[79] note that reference-based methods tend to be more accurate and robust compared to non-reference-based methods with CARD, Cell2location, Tangram, and RCTD having the best overall performance.
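
The idea behind reference-based deconvolution can be sketched as a non-negative least squares problem: a spot's expression is modeled as a non-negative mixture of reference cell type signatures. The tools in Table 3 use richer probabilistic models, so this is only an illustration; the function name and the multiplicative update solver are assumptions chosen to keep the example dependency-free.

```python
import numpy as np

def deconvolve_spot(spot_expr, signatures, n_iter=500):
    """Estimate cell type proportions for one spot by non-negative least squares.

    signatures: (n_cell_types, n_genes) non-negative reference profiles
    spot_expr: (n_genes,) non-negative spot expression
    """
    S = np.asarray(signatures, dtype=float)
    y = np.asarray(spot_expr, dtype=float)
    w = np.full(S.shape[0], 1.0 / S.shape[0])   # start from equal weights
    gram = S @ S.T
    target = S @ y
    for _ in range(n_iter):
        # Multiplicative update keeps the weights non-negative at every step.
        w *= target / (gram @ w + 1e-12)
    return w / w.sum()                           # convert abundances to proportions

signatures = np.array([[10.0, 0.0, 1.0],
                       [0.0, 10.0, 1.0]])
# Spot made of 14 cells of the first type and 6 of the second (70% / 30%).
spot = 14 * signatures[0] + 6 * signatures[1]
proportions = deconvolve_spot(spot, signatures)
```

In this noise-free example the estimated proportions recover the 70/30 mixture. With real data, the result depends on how well the reference signatures match the cell types present in the tissue.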

When accurate cell segmentation in an imaging-based spatial dataset is challenging, such as in densely packed tissues, segmentation-free methods can provide an alternative way of mapping cell type spatial patterns. These approaches bypass explicit cell boundary definition and instead analyze spatial expression patterns directly. Methods differ in their units of analysis: FICTURE and SSAM operate on hexagonal or square spatial bins, learning latent cell type representations from transcript spatial distributions without requiring cell assignment. Sainsc uses a probabilistic framework to infer cell type spatial distributions from transcript point patterns. Some methods (e.g., Baysor) offer hybrid approaches, optionally using segmentation when available or operating in segmentation-free mode when boundaries are ambiguous.

When evaluating the accuracy of cell type labels, it is best to have ground truth labels to compare them to. In cases where marker genes were not used for labeling, the marker genes of each labeled group can be compared to existing literature to verify the accuracy of the cell type labels. Calculating the specificity of marker genes to a certain cell type group by using spatial autocorrelation metrics, such as Moran’s I or Geary’s C, can also help determine if the identified groupings are truly distinct. After mapping cell types in the tissue, it is then possible to identify distinct niches where these cells tend to localize and investigate how they differ via methods such as differential expression analysis. These spatial domains can then serve as input for other downstream analysis tasks such as neighborhood analysis and CCI modeling.
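
Moran’s I can be computed directly from its definition, I = (N / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where wᵢⱼ are spatial weights and W is their sum. The sketch below assumes binary weights for pairs of cells within a fixed radius; the function name and this weighting scheme are illustrative choices.

```python
import numpy as np

def morans_i(values, coords, radius):
    """Moran's I with binary weights for pairs of points within `radius`."""
    x = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (dist <= radius).astype(float)
    np.fill_diagonal(w, 0.0)             # a point is not its own neighbor
    z = x - x.mean()
    if w.sum() == 0 or not np.any(z):
        raise ValueError("need at least one neighbor pair and non-constant values")
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

# Four cells on a line, one unit apart.
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
clustered = morans_i([0, 0, 1, 1], coords, radius=1.0)     # similar values adjacent
alternating = morans_i([0, 1, 0, 1], coords, radius=1.0)   # dissimilar values adjacent
```

Positive values indicate that similar expression values sit next to each other, while negative values indicate a checkerboard-like pattern.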

3.6 Neighborhood analysis and CCI modeling

CCIs orchestrate fundamental biological processes ranging from development and tissue homeostasis to immune responses and disease progression. While scRNA-seq data has been used to infer potential cellular communication networks based on ligand and receptor gene expression, these predictions lack spatial information, which can help inform which interactions are more likely to occur based on physical proximity. Spatial omics technologies address this limitation by enabling analysis of both molecular profiles and spatial organization, dramatically improving the ability of researchers to map functional interaction networks within native tissue architecture such as the tumor microenvironment[5].

Neighborhood analysis precedes CCI modeling by identifying which cell types are spatially organized in ways that enable interaction. These methods quantify whether cell populations exhibit spatial clustering, avoidance, or mixing patterns that deviate from what would be expected by chance. One of the most widely used approaches is neighborhood enrichment analysis, which quantifies spatial co-localization between cell type pairs by comparing observed versus expected frequencies of neighboring relationships. Implementations in packages like Squidpy[80] construct spatial graphs connecting cells within a defined radius, then calculate enrichment scores indicating whether certain cell type pairs appear as neighbors more or less frequently than random permutations would predict. Typical approaches either use biologically motivated distances or test multiple radii to identify distance-dependent relationships. Spatial co-occurrence analysis extends this by examining how cell type relationships change across spatial scales. Rather than a single distance threshold, co-occurrence functions quantify whether cell type pairs are enriched or depleted at increasing distances, revealing multi-scale spatial organization.
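
The permutation logic of neighborhood enrichment can be sketched as follows. This is a simplified NumPy illustration rather than the Squidpy implementation; the function name, the fixed-radius graph, and the number of permutations are assumptions.

```python
import numpy as np

def neighborhood_enrichment(coords, labels, radius, n_perm=200, seed=0):
    """Z-scores of neighbor counts per cell type pair against shuffled labels."""
    coords = np.asarray(coords, dtype=float)
    types, codes = np.unique(np.asarray(labels), return_inverse=True)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist <= radius).astype(float)     # spatial graph within the radius
    np.fill_diagonal(adj, 0.0)

    def pair_counts(c):
        onehot = np.eye(len(types))[c]       # cells x cell types
        return onehot.T @ adj @ onehot       # neighbor counts per type pair

    observed = pair_counts(codes)
    rng = np.random.default_rng(seed)
    # Null distribution: shuffle labels while keeping cell positions fixed.
    null = np.stack([pair_counts(rng.permutation(codes)) for _ in range(n_perm)])
    z = (observed - null.mean(axis=0)) / np.maximum(null.std(axis=0), 1e-12)
    return types, z

# Two spatially separated groups of five cells each.
coords = [[x, 0] for x in range(5)] + [[100 + x, 0] for x in range(5)]
labels = ["A"] * 5 + ["B"] * 5
types, z = neighborhood_enrichment(coords, labels, radius=1.5)
```

Here the same-type pairs receive positive z-scores and the A-B pair a negative one, reflecting clustering within types and avoidance between them. Repeating the calculation over several radii gives a simple view of distance-dependent relationships.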

Following identification of proximally organized cell populations, CCI modeling aims to infer specific molecular signaling events, typically ligand-receptor pairs, mediating any possible interactions. Ligand-receptor database approaches form the foundation of most CCI methods (Table 4). These tools query curated databases of known ligand-receptor pairs against expression profiles to identify cell type pairs with complementary expression: one cell type expresses a ligand while a neighboring cell type expresses its cognate receptor. Statistical significance is assessed by comparing observed ligand-receptor co-expression to null distributions generated by permuting cell type labels while preserving spatial structure, or by randomizing spatial locations while maintaining cell type identities. However, database-driven approaches face several critical limitations. Different ligand-receptor databases vary substantially in coverage and annotation quality, with one recent analysis identifying less than 50% overlap in ligand-receptor pairs between major databases[90]. One reason for this is directional ambiguity: many receptors also function as ligands, and most databases do not capture bidirectional signaling or complex multi-subunit receptors. Additionally, expression does not necessarily equate to functional interaction. Co-expression of ligand-receptor pairs is necessary but not sufficient: post-translational modifications, receptor trafficking, spatial barriers, and regulatory context all influence whether predicted interactions occur functionally. Lastly, cross-sample comparisons of CCI predictions are sensitive to normalization choices and library size variation, potentially creating artificial differences in interaction strength.

Table 4. Summary of CCI/neighborhood analysis tools, how they work, inputs, outputs, and the languages they are implemented in.
Tool/Pipeline | Method Summary | Input | Output | Language
Neighborhood Analysis
SPACEc[81] | Generates vectors representing cell counts for each window of nearest neighbors then clusters them to identify commonly composed neighborhoods | Imaging-based proteomics data | Cellular neighborhoods | Python
Kandinsky[82] | Infers spot/cell neighborhoods using KNN, centroid distance, Delaunay triangulation, queen contiguity, and/or membrane distance; uses neighborhoods for clustering, calculating co-localization, and detecting hot/cold expression areas | Spatial omics data | Cell/spot neighborhoods, groupings, co-localization Z-scores, hot/cold areas | R
BANKSY[83] | Uses pair of spatial kernels to encode transcriptomic texture of microenvironment around each cell, augments features of each cell | Spatial omics data | Neighbor-augmented expression matrix | Python, R
CCI Modeling
COMMOT[84] | Collective optimal transport to infer cell-cell communication | Spatial transcriptomics data | Cell-cell communication network | Python
SpaOTsc[85] | Structured optimal transport of signal senders to target signal receivers to obtain cell-cell communications | Spatial transcriptomics data, scRNA-seq reference (optional) | Mapping between spatial and scRNA-seq data, spatial subclustering, cell-cell communications, spatial distance for intercellular signaling, spatial map of intercellular gene-gene regulatory information flow | Python
SpaTalk[86] | Graph network and knowledge graph to model and score ligand-receptor-target signaling network | Spatial transcriptomics data, scRNA-seq reference | Cell type decomposition matrix, cell-cell communication and ligand-receptor-target networks | R
CellPhoneDB V3[87] | Public repository of ligands, receptors, and their interactions which are used to assess cellular crosstalk | Spatial transcriptomics data, scRNA-seq reference | Ranked cellular interactions | Python
HoloNet[88] | Models cell-cell communication events as multi-view network, attention-based graph learning model predicts target gene expression, decodes functional communication events | Spatial transcriptomics data | Cell-cell communication events, functional communication events | Python
stLearn[89] | Spatially-constrained two-level permutation analysis to compute ligand-receptor scores | Spatial transcriptomics data, histology image (optional) | Ligand-receptor scores | Python
Neighborhood Analysis and CCI Modeling
Squidpy[80] | Models data as spatial graph with cells/spots as nodes and neighborhood relations as edges, can perform neighborhood enrichment test/ligand-receptor interaction analysis | Spatial omics data | Spatial neighborhoods, ligand-receptor interactions | Python

CCI: cell-cell interaction; KNN: k-nearest neighbors.

To address these limitations, spatial methods improve upon the accuracy of predictions made by database approaches by restricting analyses to cell pairs within defined interaction distances or by weighting interactions by spatial distance. COMMOT, for example, models ligand-receptor signaling as a spatial flow problem, using optimal transport theory to infer the ‘communication intensity’ between cell locations based on both expression levels and spatial proximity, capturing directional communication patterns and spatial gradients of signaling activity. SPACEc uses spatial correlation between ligand expression in sender cells and receptor expression in receiver cells across tissue regions to identify spatially coordinated signaling.
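
Restricting ligand-receptor scoring to cell pairs within an interaction distance, combined with a label permutation test, can be sketched as below. This is a simplified illustration and does not reproduce COMMOT or any other tool in Table 4; the scoring function, the function name, and the parameter defaults are assumptions.

```python
import numpy as np

def spatial_lr_test(ligand, receptor, coords, labels, sender, receiver,
                    radius, n_perm=500, seed=0):
    """Score a ligand-receptor pair between neighboring sender and receiver cells.

    The score is the mean, over sender cells, of ligand expression multiplied by
    the summed receptor expression of receiver cells within `radius`. The p-value
    comes from permuting cell type labels while keeping positions fixed.
    """
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    near = dist <= radius
    np.fill_diagonal(near, False)

    def score(lab):
        is_sender = lab == sender
        is_receiver = lab == receiver
        if not is_sender.any():
            return 0.0
        # Receptor signal from receiver-type neighbors of every cell.
        nearby_receptor = (near & is_receiver[None, :]).astype(float) @ receptor
        return float(np.mean(ligand[is_sender] * nearby_receptor[is_sender]))

    observed = score(labels)
    rng = np.random.default_rng(seed)
    null = np.array([score(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Twelve cells on a line: senders and receivers interleaved, then unrelated cells.
coords = [[x, 0] for x in range(12)]
labels = ["S", "R", "S", "R", "S", "R"] + ["O"] * 6
ligand = np.where(np.array(labels) == "S", 5.0, 0.0)
receptor = np.where(np.array(labels) == "R", 5.0, 0.0)
observed, p_value = spatial_lr_test(ligand, receptor, coords, labels,
                                    sender="S", receiver="R", radius=1.0)
```

A small p-value here indicates only that the co-expression pattern is unlikely under shuffled labels. As noted above, it does not establish that the interaction occurs functionally.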

While CCI inference is a critical area of study, it suffers from a lack of standardized methods of assessment and validation, with researchers typically cross-referencing their predictions with existing literature or performing experimental validation[91]. As such, it is difficult to compare the performance of existing tools, with benchmarking studies providing varying results for the same tools depending on their metrics of evaluation and input databases. In general, it is recommended to subject the results from any of these tools to multiple methods of validation to best ensure the predictions are accurate.

3.7 Multi-sample and multimodal integration

The workflow previously described applies to single-slice data; however, it can be adapted to account for multi-slice data with additional steps. Spatial omics studies increasingly incorporate multiple tissue samples, biological replicates, or patient cohorts to achieve statistical power for identifying reproducible biological patterns, assess inter-individual variation, and enable robust biomarker discovery. However, as in single-cell analysis, technical variation between samples arising from batch-specific library preparation, sequencing runs, reagent lots, or operator differences can obscure biological signals if not properly corrected.

Many batch correction methods exist, most of them originating from single-cell analysis pipelines. For instance, Seurat provides an in-house integration workflow based on Canonical Correlation Analysis, which captures shared sources of variation between batches and identifies Mutual Nearest Neighbors as anchors. Harmony[92] performs fuzzy clustering in PCA space and has low algorithmic complexity, aiming to balance computational efficiency and integration performance. Among deep learning-based approaches, scVI[93] uses a variational autoencoder to model raw counts with a zero-inflated negative binomial distribution, and it is trained by stochastic gradient descent with support for GPU acceleration. In their review of batch correction and integration methods, Ludington et al.[94] compared 11 tools and found that probabilistic methods, such as scVI, excel at removing unwanted technical variation while preserving meaningful biological structure, with Harmony also performing competitively.
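
The notion of mutual nearest neighbors used as anchors can be illustrated with a short sketch. It operates on a shared low-dimensional embedding of two batches and is not Seurat's implementation; the function name, the choice of `k`, and the crude mean-shift estimate of the batch effect are assumptions for illustration.

```python
import numpy as np

def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are among each other's k nearest neighbors."""
    a = np.asarray(batch_a, dtype=float)
    b = np.asarray(batch_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    nn_of_a = np.argsort(dist, axis=1)[:, :k]      # nearest b cells for each a cell
    nn_of_b = np.argsort(dist, axis=0)[:k, :].T    # nearest a cells for each b cell
    return [
        (i, int(j))
        for i in range(len(a))
        for j in nn_of_a[i]
        if i in nn_of_b[j]
    ]

# Batch B contains the same two cell states as batch A, shifted by a batch
# effect, plus one state that has no counterpart in batch A.
batch_a = np.array([[0.0, 0.0], [10.0, 10.0]])
batch_b = np.array([[1.0, 0.0], [11.0, 10.0], [50.0, 50.0]])
pairs = mutual_nearest_neighbors(batch_a, batch_b)

# Crude estimate of the batch effect from the anchor pairs.
shift = np.mean([batch_b[j] - batch_a[i] for i, j in pairs], axis=0)
```

The cell state found only in batch B does not form an anchor, which is what allows anchor-based methods to avoid forcing batch-specific populations together.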

In addition to batch correction, slice alignment is also necessary to ensure that corresponding features occupy the same coordinate system (Table 5). PASTE is one of the most well-known alignment methods. It computes pairwise alignment between slices using optimal transport to account for both transcriptional similarity and physical distance between spots. However, many early alignment methods like this assume full overlap between sections, which is often not the case. To address this issue, PASTE2 was developed to allow for partial overlap between slices, and STalign uses Large Deformation Diffeomorphic Metric Mapping to accommodate non-linear distortions. If multiple z-sections or contiguous blocks were imaged, some alignment tools (e.g., PASTE2) allow for the reconstruction of the data into a navigable 3D model to analyze tissue structure and gradients in three dimensions.
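
Once corresponding spots between two slices are known, placing them in the same coordinate system reduces, in the simplest case, to estimating a rotation and translation. The sketch below uses the Kabsch (Procrustes) solution for matched 2D points. It assumes the correspondences are given and the deformation is rigid, so it illustrates neither the optimal transport step of PASTE nor the non-linear mappings of STalign; the function name is an assumption.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rotation R and translation t such that source @ R.T + t ≈ target.

    source, target: (n_points, 2) arrays of matched coordinates.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_center = source.mean(axis=0)
    tgt_center = target.mean(axis=0)
    H = (source - src_center).T @ (target - tgt_center)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = tgt_center - R @ src_center
    return R, t

# Toy example: the target slice is the source rotated by 30 degrees and shifted.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
source = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
target = source @ R_true.T + np.array([5.0, -2.0])

R, t = rigid_align(source, target)
aligned = source @ R.T + t
```

Real tissue sections are rarely related by a purely rigid transform, which is why partial-overlap and non-linear methods were developed.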

Table 5. Summary of integration tools for spatial omics data, how they work, inputs, outputs, and languages they are implemented in.
Tool/Pipeline | Method Summary | Input | Output | Language
Multi-slice Alignment/Integration
PASTE[95] | Pairwise alignment via optimal transport that models both transcriptional similarity and physical distance/Multiple slice integration by combining fused Gromov-Wasserstein barycenter with NMF | Pair of spatial transcriptomics slices (assumes full overlap) | Pairwise mappings between slices/NMF decomposition of center slice gene expression, mapping between center slice and input slices | Python
PASTE2[96] | Extension of PASTE allowing for partial alignment, can utilize histology images to aid alignment by identifying spots with similar histology | Pair of spatial transcriptomics slices, histology images (optional) | Partial alignment matrix, overlap percentage, stacked slices for 3D reconstruction | Python
JADE[97] | Encoders extract low-dimensional embeddings, graph attention module to get embedding-space alignment, roundtrip learning scheme to refine embeddings and alignments alternately within training iterations | Pair of spatial transcriptomics slices | Probabilistic alignment matrix, embedding representation of each spot | Python
Spateo | Uses probabilistic model for aligning slices to create aligned 3D point clouds | Spatial transcriptomics slices | 3D reconstruction of slices | Python
Multi-dataset Alignment/Integration
STalign[98] | Rasterizes source and target coordinates into images, solves mapping between images, applies mapping to source | Pair of spatial transcriptomics datasets, histology image for single-cell and spot-resolution alignment | Source aligned coordinates | Python
SLAT[99] | Cross-dataset SVD to project omics profiles into shared low-dimensional space, GCN to encode local and global information, aligns graphs | Spatial omics datasets | Cell to cell/spot to spot matching, similarity scores, cell type levels | Python
spatiAlign[100] | Autoencoder generates low-dimensional gene representations, optimized by self-supervised contrastive learning, reconstructs original input | Spatial transcriptomics datasets | Learned lower-dimensional representations, reconstructed gene expression matrix | Python
MIIT[101] | Spatial data are processed to reference matrices, registered to stained images, source section is registered to image space of target section, data from source are fused to match spatial organization of target | Spatial omics datasets, histology images | Integrated spatial omics data from source | Python
STAligner[102] | Graph attention autoencoder neural network to extract spatially aware embedding, constructs spot triples, iterative optimization | Spatial transcriptomics dataset(s) | Batch-corrected spatial embeddings | Python
SPACEL | Uses GCN and adversarial learning algorithm to find spatial domains that are spatially and transcriptomically coherent across slices | Spatial transcriptomics dataset(s) | Single slice cell type deconvolution, domain identification across slices, 3D reconstruction of slices (for consecutive slices) | Python
STAIR | Uses heterogeneous graph attention network to learn spatial features and get consistent spatial domains across slices | Spatial transcriptomics dataset(s) | Aligned spatial embeddings, de novo 3D reconstruction of slices | Python
CAST | Uses deep GNN and physical alignment for single-cell level alignment | Spatial transcriptomics dataset(s) | Common features across slices, alignment of pairs of slices, projection of one slice to another | Python
Spatial Multi-omics Integration
SANTO[103] | Identifies overlap between slices, dynamic graph CNN to extract local and global embeddings for spatial coordinates and omics feature expression, generates soft mapping to generate full alignment/stitching | Pair of spatial omics slices (includes spatial epigenomics) | Transformed spatial coordinates of source slice, coarse and fine rotation and translation | Python
stLVG[104] | Vector-guided graph model with location, direction, and angle-based edge weights learns cross-slice features via adversarial module, learns cell representations via multi-view contrastive learning module | Spatial omics datasets (including epigenomics) | Learned cell embeddings | Python
SpaMV[105] | Two GAT encoders per omics dataset to extract shared and private information, infers shared representations | Spatial omics datasets (including epigenomics and metabolomics) | Inferred latent variables | Python
Multi-modal Integration
SIMO[106] | KNN to construct spatial graph and modality map, fused Gromov-Wasserstein optimal transport to get mapping between cells and spots, label transfer of non-transcriptomic data via Unbalanced Optimal Transport | Spatial transcriptomics data, single-cell data | Single-cell data mapped to spatial | Python
SpatialEx+[107] | Generates missing spatial omics data for H&E images (SpatialEx), omics cycle modules to establish omics-omics associations | Adjacent spatial omics slices (different omics), corresponding H&E images | Spatial omics profiles aligned with H&E | Python
MISO[108] | Extracts low-dimensional embeddings from each modality, calculates outer products of modality-specific embeddings to get interaction feature vectors which serve as input for k-means clustering | Spatial omics data, histology images | Clustered embeddings | Python
MaxFuse[109]Fuzzy smoothed embedding followed by iterative co-embedding, data smoothing, and cell matchingSpatial omics data, single-cell dataJoint embedding coordinatesPython

NMF: non-negative matrix factorization; JADE: joint alignment and deep embedding; SVD: singular value decomposition; GCN: graph convolutional network; SLAT: spatial-linked alignment tool; MIIT: multi-omics imaging integration toolset; GNN: graph neural network; CNN: convolutional neural network; GAT: graph attention network; SIMO: spatial integration of multi-omics.

Beyond aligning slices generated by the same spatial technology, alignment and integration of slices across platforms have become increasingly popular, given that no single spatial omics platform simultaneously achieves high spatial resolution, comprehensive molecular coverage, and multimodal measurement. For instance, Pitino et al.[110] applied Xenium, CosMx, and Akoya and integrated the resulting data with H&E to achieve multi-modality analysis. Alignment between Xenium and Visium typically relies on registered H&E images with anatomical landmarks to reduce manual intervention. Landmark-free alignment frameworks are also emerging for automated cross-platform registration. However, because these platforms often require serial sections or distinct samples, spatial alignment remains a significant challenge.
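
As a minimal sketch of landmark-based registration, assume that paired anatomical landmarks have already been identified in both coordinate systems (for example, on the H&E image accompanying each section). A 2D affine transform can then be estimated by least squares and applied to all cell coordinates of the source section. The function names and toy coordinates below are hypothetical and are not taken from any of the tools discussed; real tissue sections usually also require non-linear correction.

```python
import numpy as np


def fit_affine(source_landmarks, target_landmarks):
    """Estimate a 2D affine transform (least squares) from paired landmarks."""
    src = np.asarray(source_landmarks, dtype=float)
    tgt = np.asarray(target_landmarks, dtype=float)
    if src.shape != tgt.shape or src.ndim != 2 or src.shape[1] != 2:
        raise ValueError("landmarks must be two (n, 2) arrays of equal shape")
    if src.shape[0] < 3:
        raise ValueError("at least three landmark pairs are required")
    # Homogeneous coordinates: [x, y, 1] @ params = [x', y']
    design = np.hstack([src, np.ones((src.shape[0], 1))])
    params, *_ = np.linalg.lstsq(design, tgt, rcond=None)
    return params  # shape (3, 2)


def apply_affine(params, coords):
    """Map (n, 2) coordinates through a fitted affine transform."""
    coords = np.asarray(coords, dtype=float)
    design = np.hstack([coords, np.ones((coords.shape[0], 1))])
    return design @ params


# Hypothetical ground truth used only to generate toy data:
# rotate by 15 degrees, scale by 1.2, then translate.
angle = np.deg2rad(15.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
shift = np.array([30.0, -10.0])


def true_transform(points):
    return 1.2 * points @ rotation.T + shift


source_landmarks = np.array(
    [[0, 0], [100, 0], [0, 100], [100, 100], [50, 20]], dtype=float
)
target_landmarks = true_transform(source_landmarks)

params = fit_affine(source_landmarks, target_landmarks)

# Apply the fitted transform to every cell centroid of the source section.
rng = np.random.default_rng(0)
source_cells = rng.uniform(0, 100, size=(200, 2))
registered_cells = apply_affine(params, source_cells)
max_error = float(np.abs(registered_cells - true_transform(source_cells)).max())
print(f"maximum registration error: {max_error:.2e}")
```

With noise-free landmarks the fit recovers the transform up to floating-point error; with manually placed landmarks, the residual at the landmarks gives a rough indication of registration quality.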

A comprehensive review[111] benchmarked 24 multi-slice ST alignment and integration tools, including 10 Bayesian inference-based statistical mapping tools, 10 graph-based tools, and 4 image processing and registration tools. Based on the results, the authors recommended autoencoder-based tools (e.g., STalign, SLAT, spatiAlign) among all compared methods. SLAT in particular was highlighted for its effectiveness in cross-dataset alignment of seqFISH and Stereo-seq data and in cross-platform alignment between Visium and Xenium, as well as for its potential to extend to 3D tissue reconstruction. Another review by Yan et al.[112] provides guidelines for selecting an alignment method based on the user’s input data and analysis priorities. They recommend SPACEL if annotated region labels are available. If they are not, they suggest PASTE2, Spateo, or STAIR, depending on the platform used to generate the data. Where those methods fail, they recommend CAST, SLAT, STalign, or STAligner.
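
The selection guidance attributed to Yan et al. above can be written down as a small decision helper. This is only a paraphrase of the recommendations as summarized in this section, with a hypothetical function name and arguments. It does not encode which of PASTE2, Spateo, or STAIR suits which platform, because that mapping is not stated here, and treating the fallback as applying after any failed first choice is an assumption.

```python
def suggest_alignment_tools(has_region_labels, first_choice_failed=False):
    """Return candidate alignment tools following the guidance summarized above.

    Hypothetical helper: argument names and ordering are illustrative only.
    """
    if first_choice_failed:
        # Fallback options when the initially suggested methods do not work.
        return ["CAST", "SLAT", "STalign", "STAligner"]
    if has_region_labels:
        # Annotated region labels are available.
        return ["SPACEL"]
    # No labels: the final choice depends on the platform that produced the data.
    return ["PASTE2", "Spateo", "STAIR"]


for labels, failed in [(True, False), (False, False), (False, True)]:
    print(labels, failed, suggest_alignment_tools(labels, failed))
```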

Alignment and integration methods can be evaluated qualitatively by visually inspecting the overlapping aligned slices to determine how well regions of interest and tissue boundaries align. Alignment can also be assessed quantitatively by calculating the expression similarity of two overlapping regions, with higher similarity indicating better alignment, or by selecting regional “landmarks” in the slices that should align and calculating their degree of overlap. In many cases, spatial omics data also benefit from integration with non-spatial data modalities, such as single-cell reference data used for cell type labeling. SIMO allows for the integration of scRNA-seq and other single-cell modalities (e.g., scATAC-seq) with sST. Another popular type of integration combines molecular data with H&E or immunofluorescence histology, which allows researchers to bridge the gap between molecular phenotypes and classic morphological features. Spatial overlays of these modalities provide intuitive biological context and help identify tissue niches where gene expression correlates with pathological structures.
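
The quantitative alignment checks described above (expression similarity between overlapping regions and landmark overlap) can be sketched as follows. This is an illustrative example on synthetic data, assuming both slices share the same gene order and have already been placed in a common coordinate system. The function names, the nearest-neighbor pairing of spots, and the use of cosine similarity are assumptions made for the example, not the metrics of any particular benchmark.

```python
import numpy as np


def nearest_neighbor_expression_similarity(coords_a, expr_a, coords_b, expr_b):
    """Mean cosine similarity between each spot in A and its nearest spot in B."""
    # Brute-force squared distances; adequate for a small illustrative example.
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    nearest = np.argmin((diff ** 2).sum(axis=2), axis=1)
    matched = expr_b[nearest]
    numerator = (expr_a * matched).sum(axis=1)
    denominator = np.linalg.norm(expr_a, axis=1) * np.linalg.norm(matched, axis=1)
    return float(np.mean(numerator / np.maximum(denominator, 1e-12)))


def landmark_error(landmarks_a, landmarks_b):
    """Mean Euclidean distance between paired landmarks after alignment."""
    return float(np.mean(np.linalg.norm(landmarks_a - landmarks_b, axis=1)))


# Synthetic tissue: a 20 x 20 grid of spots and five genes whose expression
# forms smooth spatial patterns around hypothetical centers.
axis = np.arange(20, dtype=float)
coords = np.array([(x, y) for x in axis for y in axis])
centers = np.array([[3, 3], [16, 4], [10, 10], [4, 15], [15, 16]], dtype=float)


def expression_at(positions, sigma=6.0):
    d2 = ((positions[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))


expr = expression_at(coords)

# Slice B is a copy of slice A; in the second case its coordinates are
# deliberately offset to mimic a poor alignment.
well_aligned = nearest_neighbor_expression_similarity(coords, expr, coords, expr)
misaligned = nearest_neighbor_expression_similarity(
    coords, expr, coords + np.array([8.0, 0.0]), expr
)

landmarks = np.array([[2.0, 2.0], [10.0, 18.0], [17.0, 5.0]])
error_good = landmark_error(landmarks, landmarks)
error_bad = landmark_error(landmarks, landmarks + np.array([3.0, 4.0]))

print(f"similarity, well aligned: {well_aligned:.3f}; misaligned: {misaligned:.3f}")
print(f"landmark error, well aligned: {error_good:.1f}; misaligned: {error_bad:.1f}")
```

Higher expression similarity and lower landmark error both point to a better alignment; in practice the two measures are complementary, since landmark error depends on how reliably the landmarks can be placed.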

4. Current Challenges in Spatial Omics Analysis

Despite remarkable technological and computational advances[4], spatial omics faces several persistent challenges that limit accessibility, reproducibility, and biological insight. One of the most prominent is accurate cell segmentation in complex tissues. Although deep-learning approaches have improved dramatically on classical ones such as watershed segmentation, they still fail systematically in several biological contexts. Dense tissues, where cell boundaries touch or overlap, challenge segmentation algorithms that assume clear intercellular space. Additionally, morphologically complex cells (e.g., neurons) violate the compact, convex shape assumptions of many segmentation models. Lastly, weak or absent membrane staining, common in poorly fixed tissue or regions with particularly dense extracellular matrices, provides insufficient boundary information for accurate segmentation.

Platform heterogeneity and lack of standardization fragment the spatial omics ecosystem, impeding reproducibility, cross-study comparison, and method generalization[5]. Platforms differ across multiple dimensions including resolution, throughput, sensitivity, coordinate systems, and data formats. Because of this, computational tools can perform inconsistently across platforms. For example, normalization methods appropriate for sequencing-based platforms may be invalid for targeted imaging panels, which may be biased towards the characterization of specific cell types[102,113]. Cross-platform benchmarking is rare, leaving researchers uncertain whether methods validated on one platform will generalize to their data[38]. The lack of community standards compounds these issues. Initiatives to address this issue, like SpatialData, have emerged but are not yet widely adopted.

Another major issue, shared with single-cell studies, is scalability. Spatial omics datasets can occupy several terabytes, particularly those produced by high-resolution methods that generate high-resolution images. Individual images can occupy gigabytes, and processing them, including denoising, segmentation, and normalization, can take hours and often requires access to a GPU. Similarly, spatial statistics often scale quadratically or worse with cell numbers, becoming prohibitive for datasets with hundreds of thousands to millions of cells.
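
To illustrate why naive spatial statistics become prohibitive, consider counting the neighbors of each cell within a fixed radius. Comparing every pair of cells requires a number of distance evaluations that grows quadratically with cell number, whereas first binning cells into a grid restricts comparisons to nearby bins. The following is a minimal, pure-Python sketch with hypothetical function names and synthetic coordinates; production tools typically rely on optimized spatial indices rather than code like this.

```python
import math
import random
from collections import defaultdict


def neighbor_counts_bruteforce(coords, radius):
    """Count neighbors within `radius` by testing all pairs (quadratic cost)."""
    r2 = radius * radius
    counts = [0] * len(coords)
    for i, (xi, yi) in enumerate(coords):
        for j in range(i + 1, len(coords)):
            xj, yj = coords[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                counts[i] += 1
                counts[j] += 1
    return counts


def neighbor_counts_grid(coords, radius):
    """Same counts, but only cells in the 3 x 3 block of adjacent bins are tested."""
    r2 = radius * radius
    bins = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        bins[(math.floor(x / radius), math.floor(y / radius))].append(idx)
    counts = [0] * len(coords)
    for idx, (x, y) in enumerate(coords):
        bx, by = math.floor(x / radius), math.floor(y / radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in bins.get((bx + dx, by + dy), ()):
                    if other == idx:
                        continue
                    ox, oy = coords[other]
                    if (x - ox) ** 2 + (y - oy) ** 2 <= r2:
                        counts[idx] += 1
    return counts


# Synthetic cell centroids on a 1000 x 1000 field (arbitrary units).
random.seed(0)
cells = [(random.uniform(0, 1000), random.uniform(0, 1000)) for _ in range(1500)]

brute = neighbor_counts_bruteforce(cells, radius=25.0)
grid = neighbor_counts_grid(cells, radius=25.0)
print("identical results:", brute == grid)
```

Because the bin width equals the search radius, any two cells within the radius fall in the same or adjacent bins, so the grid version returns the same counts while examining far fewer pairs when cells are spread across the tissue.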

Finally, the scarcity of well-annotated spatial reference atlases limits supervised analysis approaches that have proven powerful in scRNA-seq. While projects like the Human Cell Atlas, Human Protein Atlas, and Tabula Sapiens provide comprehensive single-cell references covering diverse tissues and cell types, spatial references remain fragmented, tissue-specific, and rarely standardized. Existing spatial atlases cover only a subset of tissues and often represent single developmental stages or healthy tissue, lacking disease contexts, species diversity, or technical replicates. This gap forces researchers to rely on transferring labels from scRNA-seq references to spatial data, which can result in inaccurate labeling depending on the reference quality as previously mentioned. Moreover, the spatial context itself provides defining information ignored by reference-based label transfer. A cell expressing certain markers might represent different biological entities depending on its spatial location, and purely transcription-based label transfer cannot capture these spatial identity dimensions.

5. Conclusion

Spatial omics technologies have rapidly expanded our ability to study tissues in their native architectural context, bridging the gap between molecular profiling and spatial organization. From early sequencing-based platforms to increasingly sophisticated imaging-based transcriptomic and proteomic methods, each technological generation has deepened biological insight while introducing new computational challenges. As datasets grow in resolution, multiplexing, and physical scale, analytical frameworks must evolve beyond adaptations of single-cell workflows toward methods that explicitly model spatial structure, tissue morphology, and multimodal complexity[4]. Given the heterogeneity of platform data formats, a universal framework is highly desirable. While Seurat and AnnData can store image information, SpatialData provides native support for raster images and greater flexibility across multiple platforms, making it a promising candidate for a unified data representation in sST. In this review, we have summarized the file output formats of commonly used platforms, the data structures used to store and process these data, and downstream computational analysis tools.

Looking ahead, the next phase of spatial omics will require methodological innovations that unify molecules, cells, tissue architecture, and morphology into coherent analytical frameworks. Equally important will be efforts to standardize file formats, establish benchmarking datasets, and develop reproducible workflows to ensure that analyses remain transparent and comparable across studies. As spatial profiling of tissues continues to expand, assays beyond transcriptomics and proteomics, such as epigenomics and metabolomics, have moved into the spatial dimension. Currently, there are significantly fewer analysis tools developed specifically with these modalities in mind. Instead, they are often integrated with sST and proteomics, as shown in Table 5. As these modalities become more common, dedicated tools and documented methods for processing and normalizing these datasets will be needed. With the continued progress of spatial omics towards higher resolutions, larger sample sizes, and richer multimodal measurements, computational methods will play an increasingly central role in unlocking the biological insights embedded within these data. By addressing current analytical gaps and building frameworks designed for the spatial dimension from the ground up, the field is poised to deliver transformative advances in tissue biology, disease mechanisms, and translational medicine.

Authors contribution

Alexander M, Liu Y: Writing-original draft, visualization.

Dezem FS: Writing-review & editing.

Chasteen H: Visualization.

Plummer J: Conceptualization, writing-review & editing.

Conflicts of interest

The authors declare no conflicts of interest.

Ethical approval

Not applicable.

Availability of data and materials

Not applicable.

Funding

The work was supported by the Ovarian Cancer Research Alliance (Grant No. ECIG-2022–3-1143) and the Chan Zuckerberg Foundation (Grant Nos. OS00001235 and 2024–345901), all awarded to Jasmine Plummer.

Copyright

© The Author(s) 2026.

References

  • 1. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621-628.
    [DOI] [PubMed]
  • 2. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377-382.
    [DOI] [PubMed]
  • 3. Andrews TS, Kiselev VY, McCarthy D, Hemberg M. Tutorial: Guidelines for the computational analysis of single-cell RNA sequencing data. Nat Protoc. 2021;16(1):1-9.
    [DOI]
  • 4. Dezem FS, Arjumand W, DuBose H, Morosini NS, Plummer J. Spatially resolved single-cell omics: Methods, challenges, and future perspectives. Annu Rev Biomed Data Sci. 2024;7(1):131-153.
    [DOI] [PubMed]
  • 5. See JE, Barlow S, Arjumand W, DuBose H, Segato Dezem F, Plummer J. Spatial omics: Applications and utility in profiling the tumor microenvironment. Cancer Metastasis Rev. 2025;44(4):87.
    [DOI] [PubMed] [PMC]
  • 6. Method of the year 2024: Spatial proteomics. Nat Methods. 2024;21(12):2195-2196.
    [DOI] [PubMed]
  • 7. Guo P, Mao L, Chen Y, Lee CN, Cardilla A, Li M, et al. Multiplexed spatial mapping of chromatin features, transcriptome and proteins in tissues. Nat Methods. 2025;22(3):520-529.
    [DOI] [PubMed] [PMC]
  • 8. Sui X, Lo JA, Luo S, He Y, Tang Z, Lin Z, et al. Scalable spatial single-cell transcriptomics and translatomics in 3D thick tissue blocks. Nat Methods. 2025;22:2574-2584.
    [DOI]
  • 9. Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022;185(10):1777-1792.
    [DOI]
  • 10. Blampey Q, Mulder K, Gardet M, Christodoulidis S, Dutertre CA, André F, et al. Sopa: A technology-invariant pipeline for analyses of image-based spatial omics. Nat Commun. 2024;15(1):4981.
    [DOI] [PubMed] [PMC]
  • 11. Degatano K, Awdeh A, Cox RS III, Dingman W, Grant G, Khajouei F, et al. Warp analysis research pipelines: Cloud-optimized workflows for biological data processing and reproducible analysis. Bioinformatics. 2025;41(10):btaf494.
    [DOI]
  • 12. Chu YH, Hardin H, Zhang R, Guo Z, Lloyd RV. In situ hybridization: Introduction to techniques, applications and pitfalls in the performance and interpretation of assays. Semin Diagn Pathol. 2019;36(5):336-341.
    [DOI]
  • 13. Liu Y, Dai Y, Wang L. Spatial omics at the forefront: Emerging technologies, analytical innovations, and clinical applications. Cancer Cell. 2026;44(1):24-49.
    [DOI]
  • 14. Righelli D, Weber LM, Crowell HL, Pardo B, Collado-Torres L, Ghazanfar S, et al. SpatialExperiment: Infrastructure for spatially-resolved transcriptomics data in R using Bioconductor. Bioinformatics. 2022;38(11):3128-3131.
    [DOI]
  • 15. Amezquita RA, Lun ATL, Becht E, Carey VJ, Carpp LN, Geistlinger L, et al. Orchestrating single-cell analysis with Bioconductor. Nat Methods. 2020;17(2):137-145.
    [DOI]
  • 16. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495-502.
    [DOI]
  • 17. Virshup I, Rybakov S, Theis FJ, Angerer P, Wolf FA. Anndata: Access and store annotated data matrices. J Open Source Softw. 2024;9(101):4371.
    [DOI]
  • 18. Wolf FA, Angerer P, Theis FJ. SCANPY: Large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15.
    [DOI]
  • 19. Moses L, Einarsson PH, Jackson K, Luebbert L, Booeshaghi AS, Antonsson S, et al. Voyager: Exploratory single-cell genomics data analysis with geospatial statistics. BioRxiv [Preprint]. 2023.
    [DOI]
  • 20. Pebesma E. Simple features for R: Standardized support for spatial vector data. R J. 2018;10(1):439.
    [DOI]
  • 21. Marconato L, Palla G, Yamauchi KA, Virshup I, Heidari E, Treis T, et al. SpatialData: An open and universal data framework for spatial omics. Nat Methods. 2025;22(1):58-62.
    [DOI]
  • 22. Mitchel J, Gao T, Cole E, Petukhov V, Kharchenko PV. Impact of segmentation errors in analysis of spatial transcriptomics data. BioRxiv [Preprint]. 2025.
    [DOI]
  • 23. Wu L, Beechem JM, Danaher P. Using transcripts to refine image based cell segmentation with FastReseg. Sci Rep. 2025;15:30508.
    [DOI]
  • 24. Stringer C, Pachitariu M. Cellpose3: One-click image restoration for improved cellular segmentation. Nat Methods. 2025;22(3):592-599.
    [DOI]
  • 25. Schmidt U, Weigert M, Broaddus C, Myers G. Cell detection with star-convex polygons. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G, editors. Medical image computing and computer assisted intervention–MICCAI 2018; 2018 Sep 16-20; Granada, Spain. Cham: Springer; 2018. p. 265-273.
    [DOI]
  • 26. Heidari E, Moorman A, Unyi D, Pasnuri N, Rukhovich G, Calafato D, et al. Segger: Fast and accurate cell segmentation of imaging-based spatial transcriptomics data. BioRxiv [Preprint]. 2025.
    [DOI]
  • 27. Fu X, Lin Y, Lin DM, Mechtersheimer D, Wang C, Ameen F, et al. BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat Commun. 2024;15(1):509.
    [DOI] [PubMed] [PMC]
  • 28. Petukhov V, Xu RJ, Soldatov RA, Cadinu P, Khodosevich K, Moffitt JR, et al. Cell segmentation in imaging-based spatial transcriptomics. Nat Biotechnol. 2022;40(3):345-354.
    [DOI] [PubMed]
  • 29. Jones DC, Elz AE, Hadadianpour A, Ryu H, Glass DR, Newell EW. Cell simulation as cell segmentation. Nat Methods. 2025;22(6):1331-1342.
    [DOI]
  • 30. Dayao MT, Brusko M, Wasserfall C, Bar-Joseph Z. Membrane marker selection for segmenting single cell spatial proteomics data. Nat Commun. 2022;13(1):1999.
    [DOI] [PubMed] [PMC]
  • 31. Littman R, Hemminger Z, Foreman R, Arneson D, Zhang G, Gómez-Pinilla F, et al. Joint cell segmentation and cell type annotation for spatial transcriptomics. Mol Syst Biol. 2021;17(6):e10108.
    [DOI] [PubMed] [PMC]
  • 32. He Y, Tang X, Huang J, Ren J, Zhou H, Chen K, et al. ClusterMap for multi-scale clustering analysis of spatial gene expression. Nat Commun. 2021;12:5909.
    [DOI]
  • 33. Pang M, Roy TK, Wu X, Tan K. CelloType: A unified model for segmentation and classification of tissue images. Nat Methods. 2025;22(2):348-357.
    [DOI] [PubMed] [PMC]
  • 34. Jin K, Zhang Z, Zhang K, Viggiani F, Callahan C, Tang J, et al. Bering: Joint cell segmentation and annotation for spatial transcriptomics with transferred graph embeddings. Nat Commun. 2025;16(1):6618.
    [DOI] [PubMed] [PMC]
  • 35. Salas SM, Dammann M, Rubens RK, Drummer F, Halle L, Becker S, et al. Exploration of RNA outside segmented cells in spatial transcriptomics reveals extrasomatic RNA organization. BioRxiv [Preprint]. 2025.
    [DOI]
  • 36. Polański K, Bartolomé-Casado R, Sarropoulos I, Xu C, England N, Jahnsen FL, et al. Bin2cell reconstructs cells from high resolution Visium HD data. Bioinformatics. 2024;40(9):btae546.
    [DOI] [PubMed] [PMC]
  • 37. Bhuva DD, Tan CW, Salim A, Marceaux C, Pickering MA, Chen J, et al. Library size confounds biology in spatial transcriptomics data. Genome Biol. 2024;25(1):99.
    [DOI] [PubMed] [PMC]
  • 38. Plummer JT, Dezem FS, Cook DP, Park J, Zhang L, Liu Y, et al. Standardized metrics for assessment and reproducibility of imaging-based spatial transcriptomics datasets. Nat Biotechnol. 2025;1-13.
    [DOI]
  • 39. Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 2019;20(1):296.
    [DOI] [PubMed] [PMC]
  • 40. Chen W, Zhao Y, Chen X, Yang Z, Xu X, Bi Y, et al. A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples. Nat Biotechnol. 2021;39(9):1103-1114.
    [DOI] [PubMed] [PMC]
  • 41. Atta L, Clifton K, Anant M, Aihara G, Fan J. Gene count normalization in single-cell imaging-based spatially resolved transcriptomics. Genome Biol. 2024;25(1):153.
    [DOI]
  • 42. Li W, Mao L, Liu Y, Peng F, Sachs N, Wu W, et al. Toward computationally complete spatial omics. BioRxiv [Preprint]. 2026.
    [DOI]
  • 43. Salim A, Bhuva DD, Chen C, Tan CW, Yang P, Davis MJ, et al. SpaNorm: Spatially-aware normalization for spatial transcriptomics data. Genome Biol. 2025;26(1):109.
    [DOI] [PubMed] [PMC]
  • 44. Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;39(11):1375-1384.
    [DOI] [PubMed] [PMC]
  • 45. Li Z, Zhou X. BASS: Multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 2022;23(1):168.
    [DOI] [PubMed] [PMC]
  • 46. Liu CC, Greenwald NF, Kong A, McCaffrey EF, Leow KX, Mrdjen D, et al. Robust phenotyping of highly multiplexed tissue imaging data using pixel-level clustering. Nat Commun. 2023;14(1):4618.
    [DOI] [PubMed] [PMC]
  • 47. Tan X, Su A, Tran M, Nguyen Q. SpaCell: Integrating tissue morphology and spatial gene expression to predict disease cells. Bioinformatics. 2020;36(7):2293-2294.
    [DOI] [PubMed]
  • 48. Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. Fast gene set enrichment analysis. BioRxiv [Preprint]. 2016.
    [DOI]
  • 49. Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, et al. SCENIC: Single-cell regulatory network inference and clustering. Nat Methods. 2017;14(11):1083-1086.
    [DOI]
  • 50. Badia-I-Mompel P, Vélez Santiago J, Braunger J, Geiss C, Dimitrov D, Müller-Dott S, et al. decoupleR: Ensemble of computational methods to infer biological activities from omics data. Bioinform Adv. 2022;2(1):vbac016.
    [DOI] [PubMed] [PMC]
  • 51. Toro-Domínguez D, Martorell-Marugán J, Martinez-Bueno M, López-Domínguez R, Carnero-Montoro E, Barturen G, et al. Scoring personalized molecular portraits identify Systemic Lupus Erythematosus subtypes and predict individualized drug responses, symptomatology and disease progression. Brief Bioinform. 2022;23(5):bbac332.
    [DOI]
  • 52. Nader K, Tasci M, Ianevski A, Erickson A, Verschuren EW, Aittokallio T, et al. ScType enables fast and accurate cell type identification from spatial transcriptomics data. Bioinformatics. 2024;40(7):btae426.
    [DOI] [PubMed] [PMC]
  • 53. Shen R, Liu L, Wu Z, Zhang Y, Yuan Z, Guo J, et al. Spatial-ID: A cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding. Nat Commun. 2022;13(1):7640.
    [DOI] [PubMed] [PMC]
  • 54. Benjamin K, Bhandari A, Kepple JD, Qi R, Shang Z, Xing Y, et al. Multiscale topology classifies cells in subcellular spatial transcriptomics. Nature. 2024;630(8018):943-949.
    [DOI] [PubMed] [PMC]
  • 55. Wan X, Xiao J, Tam SST, Cai M, Sugimura R, Wang Y, et al. Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope. Nat Commun. 2023;14(1):7848.
    [DOI] [PubMed] [PMC]
  • 56. Zhong Y, Zhang J, Ren X. Spatial transcriptomics prediction from histology images at single-cell resolution using RedeHist. BioRxiv [Preprint]. 2024.
    [DOI]
  • 57. Vahid MR, Brown EL, Steen CB, Zhang W, Jeon HS, Kang M, et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat Biotechnol. 2023;41(11):1543-1548.
    [DOI] [PubMed] [PMC]
  • 58. Biancalani T, Scalia G, Buffoni L, Avasthi R, Lu Z, Sanger A, et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat Methods. 2021;18(11):1352-1362.
    [DOI] [PubMed] [PMC]
  • 59. Blampey Q, Benkirane H, Bercovici N, Mulder K, Gessain G, Ginhoux F, et al. Novae: A graph-based foundation model for spatial transcriptomics data. Nat Methods. 2025;22(12):2539-2550.
    [DOI]
  • 60. Bussi Y, Shainshein D, Ovits E, Posner S, Azulay N, Maimon N, et al. CellTune: An integrative software for accurate cell classification in spatial proteomics. BioRxiv [Preprint]. 2025.
    [DOI]
  • 61. Zhang W, Li I, Reticker-Flynn NE, Good Z, Chang S, Samusik N, et al. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat Methods. 2022;19(6):759-769.
    [DOI] [PubMed] [PMC]
  • 62. Amitay Y, Bussi Y, Feinstein B, Bagon S, Milo I, Keren L. CellSighter: A neural network to classify cells in highly multiplexed images. Nat Commun. 2023;14:4302.
    [DOI]
  • 63. Lee Y, Chen ELY, Chan DCH, Dinesh A, Afiuni-Zadeh S, Klamann C, et al. Segmentation aware probabilistic phenotyping of single-cell spatial protein expression data. Nat Commun. 2025;16(1):389.
    [DOI] [PubMed] [PMC]
  • 64. Ma Y, Zhou X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat Biotechnol. 2022;40(9):1349-1359.
    [DOI]
  • 65. Zubair A, Chapple RH, Natarajan S, Wright WC, Pan M, Lee HM, et al. Cell type identification in spatial transcriptomics data can be improved by leveraging cell-type-informative paired tissue images using a Bayesian probabilistic model. Nucleic Acids Res. 2022;50(14):e80.
    [DOI]
  • 66. Yang J, Zheng Z, Jiao Y, Yu K, Bhatara S, Yang X, et al. Spotiphy enables single-cell spatial whole transcriptomics across an entire section. Nat Methods. 2025;22(4):724-736.
    [DOI] [PubMed] [PMC]
  • 67. Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. 2022;40(5):661-671.
    [DOI] [PubMed] [PMC]
  • 68. Coleman K, Hu J, Schroeder A, Lee EB, Li M. SpaDecon: Cell-type deconvolution in spatial transcriptomics with semi-supervised learning. Commun Biol. 2023;6:378.
    [DOI]
  • 69. Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. 2022;40(4):517-526.
    [DOI] [PubMed] [PMC]
  • 70. Si Y, Lee C, Hwang Y, Yun JH, Cheng W, Cho CS, et al. FICTURE: Scalable segmentation-free analysis of submicron-resolution spatial transcriptomics. Nat Methods. 2024;21(10):1843-1854.
    [DOI]
  • 71. Park J, Choi W, Tiesmeyer S, Long B, Borm LE, Garren E, et al. Cell segmentation-free inference of cell types from in situ transcriptomics data. Nat Commun. 2021;12(1):3545.
    [DOI] [PubMed] [PMC]
  • 72. Müller-Bötticher N, Tiesmeyer S, Eils R, Ishaque N. Sainsc: A computational tool for segmentation-free analysis of in situ capture data. Small Methods. 2025;9(5):e2401123.
    [DOI] [PubMed] [PMC]
  • 73. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: Guaranteeing well-connected communities. Sci Rep. 2019;9:5233.
    [DOI]
  • 74. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008.
    [DOI]
  • 75. Yuan Z, Zhao F, Lin S, Zhao Y, Yao J, Cui Y, et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods. 2024;21(4):712-722.
    [DOI]
  • 76. Xiong C, Huang S, Zhou M, Zhang Y, Wu W, Li X, et al. A comprehensive comparison on clustering methods for multi-slice spatially resolved transcriptomics data analysis. Brief Bioinform. 2025;26(5):bbaf471.
    [DOI] [PubMed] [PMC]
  • 77. Toro-Domínguez D, Wang C, Ellson-Lancho I, Martorell-Marugán J, López-Domínguez R, Carmona-Sáez P, et al. Benchmarking single-sample gene set scoring methods for application in precision medicine. Brief Bioinform. 2025;26(6):bbaf684.
    [DOI] [PubMed] [PMC]
  • 78. Singh A, Cakmak P, Lun JH, Macas J, Plate KH, Reiss Y, et al. Benchmarking cell-type deconvolution in cross-platform transcriptomic data. BioRxiv [Preprint]. 2025.
    [DOI]
  • 79. Li H, Zhou J, Li Z, Chen S, Liao X, Zhang B, et al. A comprehensive benchmarking with practical guidelines for cellular deconvolution of spatial transcriptomics. Nat Commun. 2023;14(1):1548.
    [DOI] [PubMed] [PMC]
  • 80. Palla G, Spitzer H, Klein M, Fischer D, Schaar AC, Kuemmerle LB, et al. Squidpy: A scalable framework for spatial omics analysis. Nat Methods. 2022;19(2):171-178.
    [DOI]
  • 81. Tan Y, Kempchen TN, Becker M, Haist M, Feyaerts D, Liu J, et al. SPACEc: A streamlined, interactive Python workflow for multiplexed image processing and analysis. Nat Commun. 2025;16:10652.
    [DOI]
  • 82. Andrei P, Grieco M, Acha-Sagredo A, Dhami P, Fung K, Rodriguez-Justo M, et al. Kandinsky: Enabling neighbourhood analysis of spatial omics data for functional insights on cell ecosystems. BioRxiv [Preprint]. 2025.
    [DOI]
  • 83. Singhal V, Chou N, Lee J, Yue Y, Liu J, Chock WK, et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet. 2024;56(3):431-441.
    [DOI] [PubMed] [PMC]
  • 84. Cang Z, Zhao Y, Almet AA, Stabell A, Ramos R, Plikus MV, et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat Methods. 2023;20(2):218-228.
    [DOI]
  • 85. Cang Z, Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat Commun. 2020;11(1):2084.
    [DOI] [PubMed] [PMC]
  • 86. Shao X, Li C, Yang H, Lu X, Liao J, Qian J, et al. Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nat Commun. 2022;13:4429.
    [DOI]
  • 87. Garcia-Alonso L, Handfield LF, Roberts K, Nikolakopoulou K, Fernando RC, Gardner L, et al. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat Genet. 2021;53(12):1698-1711.
    [DOI] [PubMed] [PMC]
  • 88. Li H, Ma T, Hao M, Guo W, Gu J, Zhang X, et al. Decoding functional cell–cell communication events by multi-view graph learning on spatial transcriptomics. Brief Bioinform. 2023;24(6):bbad359.
    [DOI]
  • 89. Pham D, Tan X, Balderson B, Xu J, Grice LF, Yoon S, et al. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat Commun. 2023;14(1):7739.
    [DOI] [PubMed] [PMC]
  • 90. Dimitrov D, Türei D, Garrido-Rodriguez M, Burmedi PL, Nagai JS, Boys C, et al. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat Commun. 2022;13(1):3224.
    [DOI] [PubMed] [PMC]
  • 91. Cesaro G, Nagai JS, Gnoato N, Chiodi A, Tussardi G, Klöker V, et al. Advances and challenges in cell–cell communication inference: A comprehensive review of tools, resources, and future directions. Brief Bioinform. 2025;26(3):bbaf280.
    [DOI]
  • 92. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16(12):1289-1296.
    [DOI] [PubMed] [PMC]
  • 93. Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15(12):1053-1058.
    [DOI]
  • 94. Ludington L, Ouardini K, Secheresse X, Loeb R, Pignet A, Domingues OD, et al. Comprehensive benchmarking of batch integration methods for spatial transcriptomics using a large-scale cancer atlas. BioRxiv [Preprint]. 2026.
    [DOI]
  • 95. Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods. 2022;19(5):567-575.
    [DOI] [PubMed] [PMC]
  • 96. Liu X, Zeira R, Raphael BJ. Partial alignment of multislice spatially resolved transcriptomics data. Genome Res. 2023;33(7):1124-1132.
    [DOI] [PubMed] [PMC]
  • 97. Guo Y, Liu JS, Cheng H, Ma Y. JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics. BioRxiv [Preprint]. 2025.
    [DOI]
  • 98. Clifton K, Anant M, Aihara G, Atta L, Aimiuwu OK, Kebschull JM, et al. STalign: Alignment of spatial transcriptomics data using diffeomorphic metric mapping. Nat Commun. 2023;14(1):8123.
    [DOI] [PubMed] [PMC]
  • 99. Xia CR, Cao ZJ, Tu XM, Gao G. Spatial-linked alignment tool (SLAT) for aligning heterogenous slices. Nat Commun. 2023;14(1):7236.
    [DOI] [PubMed] [PMC]
  • 100. Zhang C, Liu L, Zhang Y, Li M, Fang S, Kang Q, et al. spatiAlign: An unsupervised contrastive learning model for data integration of spatially resolved transcriptomics. GigaScience. 2024;13:giae042.
    [DOI]
  • 101. Wess M, Midtbust E, Guillem JCC, Viset T, Størkersen Ø, Krossa S, et al. Spatial integration of multi-omics data from serial sections using the novel Multi-Omics Imaging Integration Toolset. GigaScience. 2025;14:giaf035.
    [DOI]
  • 102. Zhou X, Dong K, Zhang S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat Comput Sci. 2023;3(10):894-906.
    [DOI]
  • 103. Li H, Lin Y, He W, Han W, Xu X, Xu C, et al. SANTO: A coarse-to-fine alignment and stitching method for spatial omics. Nat Commun. 2024;15(1):6048.
    [DOI] [PubMed] [PMC]
  • 104. Lou Y, Li X, Yang Q, Dai H, Ma K, Zuo C. Vector-guided graph learning for spatial multi-slice multi-omics alignment. Cell Rep Methods. 2025;5(12):101241.
    [DOI]
  • 105. Liu Y, Ma K, Xu H, Xu K, Hu Y, Lin Z, et al. Interpretable spatial multi-omics data integration and dimension reduction with SpaMV. bioRxiv [Preprint]. 2025.
    [DOI]
  • 106. Yang P, Jin K, Yao Y, Jin L, Shao X, Li C, et al. Spatial integration of multi-omics single-cell data with SIMO. Nat Commun. 2025;16:1265.
    [DOI]
  • 107. Liu Y, Wang C, Wang Z, Chen L, Li Z, Song J, et al. High-parameter spatial multi-omics through histology-anchored integration. Nat Methods. 2026;23(2):373-386.
    [DOI]
  • 108. Coleman K, Schroeder A, Loth M, Zhang D, Park JH, Sung JY, et al. Resolving tissue complexity by multimodal spatial omics modeling with MISO. Nat Methods. 2025;22(3):530-538.
    [DOI]
  • 109. Chen S, Zhu B, Huang S, Hickey JW, Lin KZ, Snyder M, et al. Integration of spatial and single-cell data across modalities with weakly linked features. Nat Biotechnol. 2024;42(7):1096-1106.
    [DOI] [PubMed] [PMC]
  • 110. Pitino E, Pascual-Reguant A, Segato-Dezem F, Wise K, Salvador-Martinez I, Crowell HL, et al. STAMP: Single-cell transcriptomics analysis and multimodal profiling through imaging. Cell. 2025;188(18):5100-5117.
    [DOI]
  • 111. Khan M, Arslanturk S, Draghici S. A comprehensive review of spatial transcriptomics data alignment and integration. Nucleic Acids Res. 2025;53(12):gkaf536.
    [DOI]
  • 112. Yan Y, Gu T, Sun C, Zhang Y, Cui Y, Lin S, et al. Benchmarking alignment methods for spatial transcriptomics data. Nat Comput Sci. 2026;1-18.
    [DOI]
  • 113. Atta L, Clifton K, Anant M, Aihara G, Fan J. Gene count normalization in single-cell imaging-based spatially resolved transcriptomics. bioRxiv [Preprint]. 2024.
    [DOI]

© The Author(s) 2026. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Publisher’s Note

Science Exploration maintains a neutral stance on jurisdictional claims in published maps and institutional affiliations. The views expressed in this article are solely those of the author(s) and do not reflect the opinions of the Editors or the publisher.

Share And Cite

Science Exploration Style
Alexander M, Liu Y, Dezem FS, Chasteen H, Plummer J. Computational workflows and data infrastructures for spatial omics analysis. EXO. 2026;1:202607. https://doi.org/10.70401/EXO.2026.0010
