Robust multimodal emotion recognition under missing and incomplete data with cross-modal regeneration

Behzad Mahaseni*, Naimul Mefraz Khan
*Correspondence to: Behzad Mahaseni, Department of Electrical and Computer Engineering, Toronto Metropolitan University, Toronto M5B 2R2, Ontario, Canada. E-mail: behzad.mahaseni@torontomu.ca
Empath Comput. 2026;2:202610. DOI: 10.70401/ec.2026.0023
Received: March 26, 2026; Accepted: July 01, 2026; Published: July 02, 2026
This article belongs to the Special Issue: Adaptive Empathic Interactive Media for Therapy
This manuscript is made available in its unedited form to allow early access to the reported findings. Further editing will be completed before final publication. As such, the content may include errors, and standard legal disclaimers are applicable.

Abstract

Aims: Multimodal emotion recognition (MER) can outperform unimodal approaches by integrating complementary information from multiple sources. However, real-world applications often involve incomplete or missing modalities, limiting the reliability of existing MER models. This study aims to develop a framework that remains robust under missing modality conditions while preserving the benefits of multimodal integration.

Methods: To address this challenge, we propose a cross-modal latent regeneration and attention long short-term memory (CMLR-ALSTM) framework. The framework combines pretrained variational autoencoder (VAE) encoders with residual projection networks trained with an L2 loss to achieve stable, effective latent-space alignment across modalities. The regenerated latent features, together with the original available ones, are then fused through a cross-modal attention mechanism and passed to a long short-term memory (LSTM) network to capture temporal dependencies and enhance multimodal fusion under incomplete data conditions.
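
To make the regeneration step concrete, the following is a minimal NumPy sketch of a residual projection trained with an L2 loss to map the latent of one modality onto the latent of another. It is an illustration under stated assumptions, not the authors' implementation: the latents are synthetic stand-ins for frozen VAE encoder outputs, the projection is a single linear residual layer rather than a deeper network, and the names `regenerate`, `l2_loss`, `z_src`, and `z_tgt` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for latents from two pretrained (frozen) encoders.
# Assumption: both modalities share the same latent dimensionality.
n_samples, latent_dim = 256, 8
z_src = rng.normal(size=(n_samples, latent_dim))  # e.g. EEG latents
mix = np.eye(latent_dim) + 0.3 * rng.normal(size=(latent_dim, latent_dim))
z_tgt = z_src @ mix + 0.01 * rng.normal(size=(n_samples, latent_dim))  # e.g. facial latents

# Residual projection parameters: z_hat = z_src + (z_src @ W + b).
W = np.zeros((latent_dim, latent_dim))
b = np.zeros(latent_dim)


def regenerate(z, W, b):
    """Map source-modality latents into the target latent space."""
    return z + z @ W + b


def l2_loss(z_hat, z_true):
    """Mean over samples of the squared Euclidean distance."""
    return float(np.mean(np.sum((z_hat - z_true) ** 2, axis=1)))


initial_loss = l2_loss(regenerate(z_src, W, b), z_tgt)

# Plain gradient descent on the L2 loss (hand-derived gradients).
lr = 0.05
for _ in range(500):
    z_hat = regenerate(z_src, W, b)
    grad_out = 2.0 * (z_hat - z_tgt) / n_samples  # dL/dz_hat
    W -= lr * (z_src.T @ grad_out)
    b -= lr * grad_out.sum(axis=0)

final_loss = l2_loss(regenerate(z_src, W, b), z_tgt)
print(f"L2 loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The residual form means the projection starts as the identity map (with `W` and `b` at zero) and only has to learn the offset between the two latent spaces, which is one plausible reason such a design would train stably.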

Results: We conducted two sets of experiments. First, we evaluated the proposed CMLR-ALSTM framework on three benchmark datasets under the complete-modality scenario, where all modalities were available, and compared its performance against state-of-the-art methods and baseline models. Second, to assess model robustness, we evaluated CMLR-ALSTM under missing and partially missing modality scenarios. Experimental results on the DEAP, MAHNOB-HCI, and SEED-IV datasets demonstrate that CMLR-ALSTM achieves up to 17.22% improvement under missing-modality conditions while maintaining competitive performance with state-of-the-art methods, highlighting its ability to preserve cross-modal relationships and maintain robust latent representations.
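
As a rough illustration of the partially missing modality scenario, the sketch below masks some time steps of one modality, fills them with regenerated latents, and fuses the two streams with scaled dot-product cross-modal attention. This is a hedged toy example, not the evaluation protocol of the paper: the availability mask, the `toy_regenerate` function, and the helper names `fill_missing` and `cross_modal_attention` are all assumptions, and the LSTM that would consume the fused sequence is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query, context):
    """Each query time step attends over all context time steps."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)  # shape (T_query, T_context)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights


def fill_missing(z_a, z_b, available_b, regenerate):
    """Replace unavailable time steps of modality B with latents regenerated from A."""
    z_b_filled = z_b.copy()
    missing = ~available_b
    z_b_filled[missing] = regenerate(z_a[missing])
    return z_b_filled


def toy_regenerate(z):
    """Hypothetical stand-in for a trained cross-modal projection."""
    return z + 0.1


rng = np.random.default_rng(0)
T, d = 6, 4
z_a = rng.normal(size=(T, d))   # modality A latents, fully available
z_b_true = z_a + 0.1            # modality B latents (toy relationship)
available_b = np.array([True, True, False, False, True, False])
z_b_observed = np.where(available_b[:, None], z_b_true, 0.0)  # missing steps zeroed

z_b_filled = fill_missing(z_a, z_b_observed, available_b, toy_regenerate)
attended, weights = cross_modal_attention(z_a, z_b_filled)

# Fused per-time-step features that a temporal model such as an LSTM could consume.
fused = np.concatenate([z_a, attended], axis=1)
print(fused.shape)  # (6, 8)
```

Setting every entry of `available_b` to `False` corresponds to the fully missing case, in which modality B is regenerated entirely from modality A.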

Conclusion: The experimental results confirm the effectiveness of the proposed CMLR-ALSTM framework for MER, particularly in realistic environments where data availability is inconsistent. By leveraging cross-modal latent regeneration (CMLR) to reconstruct missing representations and modelling temporal dependencies through the LSTM, the proposed approach provides a more robust and reliable MER framework for practical deployment. In addition, results across different combinations of modalities indicate that the proposed framework generalizes well across heterogeneous multimodal settings.

Keywords

Multimodal emotion recognition, cross-modal regeneration, variational autoencoder, long short-term memory, electroencephalogram, facial features

© The Author(s) 2026. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Publisher’s Note

Science Exploration maintains a neutral stance on jurisdictional claims in published maps and institutional affiliations. The views expressed in this article are solely those of the author(s) and do not reflect the opinions of the Editors or the publisher.

Cite this Article

Mahaseni B, Khan NM. Robust multimodal emotion recognition under missing and incomplete data with cross-modal regeneration. Empath Comput. 2026;2:202610. https://doi.org/10.70401/ec.2026.0023
