Abstract
Aims: Alternative splicing serves as a primary mechanism for diversifying the proteome, making the prediction of distinct isoform functions critical for understanding complex disease mechanisms. However, determining the specific functional roles of isoforms remains hindered by high sequence homology among variants and the sparsity of isoform-level annotations.
Methods: We propose SpliceEM, a deep learning framework for isoform function prediction at single-cell resolution. SpliceEM uses a splicing event-aware encoder with cross-modal attention to disentangle event-level functional signals from the global protein sequence. A Heterogeneous Graph Transformer captures the dependencies among isoforms, genes, and Gene Ontology terms. To bridge the annotation gap, we combine Multi-Instance Learning with a self-distillation scheme guided by an Exponential Moving Average teacher model, and train the model with an Asymmetric Loss and hierarchical constraints.
Results: Benchmarking on human datasets demonstrates that SpliceEM outperforms existing methods in isoform function prediction, particularly in identifying rare functional terms under data-sparse conditions. Furthermore, splicing-function analysis reveals that specific splicing events, such as skipped exons and alternative first exons, act as prominent drivers in oncogenic signaling cascades and context-specific functional switching.
Conclusion: SpliceEM provides a computational foundation for exploring transcriptomic functional diversity. By shifting the focus from global sequences to localized splicing events and utilizing hierarchical biological priors, it offers high-resolution insights into cell-type-specific molecular mechanisms and potential therapeutic targets.
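Three of the training components named in the Methods can be illustrated with a minimal sketch. It is not the SpliceEM implementation, which the abstract does not specify. The function names, the decay value, the Asymmetric Loss hyperparameters, and the use of max-pooling for Multi-Instance Learning are all assumptions made for illustration. The loss follows the general form of the asymmetric loss for multi-label classification (Ridnik et al., 2021).

```python
import math


def ema_update(teacher, student, decay=0.999):
    """EMA teacher step: t <- decay * t + (1 - decay) * s for each weight.

    `decay` is an assumed value, not one reported in the abstract.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def asymmetric_loss(probs, labels, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Asymmetric loss summed over the terms of one multi-label example.

    Positives use a focal weight (1 - p) ** gamma_pos. Negatives use a
    shifted probability p_m = max(p - margin, 0) and a stronger focusing
    exponent gamma_neg, so easy negatives contribute little or nothing.
    The hyperparameter defaults are assumptions.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


def gene_level_scores(isoform_probs):
    """Multi-Instance Learning by max-pooling (an assumed aggregation).

    Each gene is a bag of isoforms; a gene is scored for a term by its
    highest-scoring isoform. `isoform_probs` is a list of per-isoform
    probability lists with one entry per term.
    """
    return [max(column) for column in zip(*isoform_probs)]
```

In such a setup, the gene-level scores would be compared against gene-level annotations, while the teacher's isoform-level predictions would serve as soft targets for the student. How SpliceEM actually combines these signals is not stated in the abstract.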
Copyright
© The Author(s) 2026. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.