Isoform function prediction via knowledge distillation from alternative splicing

Tong Gu 1,3, Jun Wang 2,3,*
*Correspondence to: Jun Wang, Shandong Research Institute of Industrial Technology, Jinan 250101, Shandong, China. E-mail: kingjun@sdu.edu.cn
Comput Biomed. 2026;1:202614. https://doi.org/10.70401/cbm.2026.0019
Received: April 14, 2026; Accepted: June 12, 2026; Published: June 15, 2026
This manuscript is made available in its unedited form to allow early access to the reported findings. Further editing will be completed before final publication. As such, the content may include errors, and standard legal disclaimers are applicable.

Abstract

Aims: Alternative splicing serves as a primary mechanism for diversifying the proteome, making the prediction of distinct isoform functions critical for understanding complex disease mechanisms. However, determining the specific functional roles of isoforms remains hindered by high sequence homology among variants and the sparsity of isoform-level annotations.

Methods: We propose SpliceEM, a deep learning framework for isoform function prediction at single-cell resolution. SpliceEM uses a splicing event-aware encoder with cross-modal attention to separate event-specific functional signals from the global protein sequence. A Heterogeneous Graph Transformer captures the dependencies among isoforms, genes, and Gene Ontology terms. To bridge the annotation gap, we incorporate a self-distillation framework that combines an Exponential Moving Average teacher model with Multi-Instance Learning and is trained with an Asymmetric Loss and hierarchical constraints.
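The three training ingredients named above can be illustrated with a minimal sketch. It is not the SpliceEM implementation: the abstract does not say how the pieces are combined, so the function names, the max-pooling form of Multi-Instance Learning, and all hyperparameter values (`decay`, `gamma_pos`, `gamma_neg`, `margin`) are assumptions chosen for illustration. The loss follows the general form of the asymmetric loss for multi-label classification (Ridnik et al., 2021).

```python
import math


def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight a small step toward the matching student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def mil_gene_score(isoform_probs):
    """Max-pooling MIL (assumed): a gene carries a term if any isoform does."""
    return max(isoform_probs)


def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric loss for a single predicted probability p and binary label y."""
    if y == 1:
        # Positive label: focal-style weighting with a mild exponent.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative label: shift the probability down by the margin so that
    # easy negatives (p <= margin) contribute nothing to the loss.
    p_shift = max(p - margin, 0.0)
    return -(p_shift ** gamma_neg) * math.log(max(1.0 - p_shift, eps))


# Toy usage with made-up numbers.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
gene_prob = mil_gene_score([0.1, 0.8, 0.3])  # three isoforms of one gene
loss = asymmetric_loss(gene_prob, 1)         # gene-level label is positive
```

In this toy setting the gene-level label supervises the pooled isoform predictions, which is one common way to train on gene annotations when isoform-level labels are sparse. The heavier exponent on negatives down-weights the many easy negative terms, which is the property that makes this family of losses attractive for rare labels.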

Results: Benchmarking on human datasets demonstrates that SpliceEM outperforms existing methods in isoform function prediction, particularly in identifying rare functional terms under data-sparse conditions. Furthermore, splicing-function analysis reveals that specific splicing events, such as skipped exons and alternative first exons, act as prominent drivers in oncogenic signaling cascades and context-specific functional switching.

Conclusion: SpliceEM provides a computational foundation for exploring transcriptomic functional diversity. By shifting the focus from global sequences to localized splicing events and utilizing hierarchical biological priors, it offers high-resolution insights into cell-type-specific molecular mechanisms and potential therapeutic targets.
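One common way to encode a hierarchical prior over Gene Ontology terms is the true-path rule: an isoform annotated with a term is also annotated with every ancestor of that term, so a child term should not score higher than its parents. The sketch below shows a penalty for violations and a post-hoc correction. Whether SpliceEM uses this exact form is not stated in the abstract; the function names and the toy terms (`root`, `mid`, `leaf`) are hypothetical.

```python
def hierarchy_penalty(probs, parents):
    """Sum of the amounts by which a child term outscores one of its parents."""
    penalty = 0.0
    for child, parent_list in parents.items():
        for parent in parent_list:
            penalty += max(0.0, probs[child] - probs[parent])
    return penalty


def propagate_to_ancestors(probs, parents):
    """Raise every ancestor's score to at least the score of its descendants."""
    fixed = dict(probs)

    def lift(term):
        for parent in parents.get(term, []):
            if fixed[parent] < fixed[term]:
                fixed[parent] = fixed[term]
            lift(parent)  # terminates because the ontology is acyclic

    for term in probs:
        lift(term)
    return fixed


# Toy three-term chain: leaf -> mid -> root (child -> parent).
parents = {"mid": ["root"], "leaf": ["mid"]}
probs = {"root": 0.9, "mid": 0.4, "leaf": 0.7}

violation = hierarchy_penalty(probs, parents)       # leaf outscores mid
consistent = propagate_to_ancestors(probs, parents)  # mid is lifted to 0.7
```

The penalty can be added to a training loss as a soft constraint, while the propagation step is a hard correction applied to predictions after the fact.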

Keywords

Isoform function prediction, alternative splicing events, single-cell resolution, self-knowledge distillation

© The Author(s) 2026. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Publisher’s Note

Science Exploration maintains a neutral stance on jurisdictional claims in published maps and institutional affiliations. The views expressed in this article are solely those of the author(s) and do not reflect the opinions of the Editors or the publisher.

Share And Cite

Gu T, Wang J. Isoform function prediction via knowledge distillation from alternative splicing. Comput Biomed. 2026;1:202614. https://doi.org/10.70401/cbm.2026.0019
