Table Of Contents (4 Articles)
Multimodal emotion recognition with disentangled representations: private-shared multimodal variational autoencoder and long short-term memory framework
Aims: This study proposes a multimodal emotion recognition framework that combines a private-shared disentangled multimodal variational autoencoder (DMMVAE) with a long short-term memory (LSTM) network, herein referred to as DMMVAE-LSTM. The primary objective is to improve the robustness and generalizability of emotion recognition by effectively leveraging the complementary features of electroencephalogram (EEG) signals and facial expression data.
Methods: We first trained a variational autoencoder using a ResNet-101 architecture on a large-scale facial dataset to develop a robust and generalizable facial feature extractor. This pre-trained model was then integrated into the DMMVAE framework, together with a convolutional neural network-based encoder and decoder for EEG data. The DMMVAE model was trained to disentangle shared and modality-specific latent representations across both EEG and facial data. Following this, the outputs of the encoders were concatenated and fed into an LSTM classifier for emotion recognition.
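As a rough illustration of the pipeline outlined in the Methods, the minimal PyTorch sketch below shows how private and shared latents from two modality encoders can be concatenated per time step and passed to an LSTM classifier. It is only a sketch under stated assumptions: it takes precomputed EEG and facial feature vectors in place of the paper's CNN and ResNet-101 encoders, omits the decoders and the VAE reconstruction and KL terms used to train DMMVAE, and all class names, latent sizes, and dimensions are illustrative rather than the authors' configuration.

```python
# Sketch of a private-shared multimodal VAE feeding an LSTM classifier.
# All sizes and names are illustrative assumptions, not the authors' setup.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input feature vector to private and shared Gaussian latents."""
    def __init__(self, in_dim, private_dim=16, shared_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu_p = nn.Linear(128, private_dim)
        self.logvar_p = nn.Linear(128, private_dim)
        self.mu_s = nn.Linear(128, shared_dim)
        self.logvar_s = nn.Linear(128, shared_dim)

    def forward(self, x):
        h = self.backbone(x)
        return (self.mu_p(h), self.logvar_p(h)), (self.mu_s(h), self.logvar_s(h))

def reparameterize(mu, logvar):
    # Standard VAE reparameterization trick
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

class DMMVAEClassifier(nn.Module):
    """EEG and face encoders -> concatenated latents per time step -> LSTM -> emotion logits."""
    def __init__(self, eeg_dim, face_dim, n_classes=4):
        super().__init__()
        self.eeg_enc = Encoder(eeg_dim)
        self.face_enc = Encoder(face_dim)
        # 2 private + 2 shared latents of size 16 each -> 64-dim input per time step
        self.lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, eeg_seq, face_seq):
        # eeg_seq: (batch, time, eeg_dim), face_seq: (batch, time, face_dim)
        z_steps = []
        for t in range(eeg_seq.size(1)):
            (mu_pe, lv_pe), (mu_se, lv_se) = self.eeg_enc(eeg_seq[:, t])
            (mu_pf, lv_pf), (mu_sf, lv_sf) = self.face_enc(face_seq[:, t])
            z = torch.cat([
                reparameterize(mu_pe, lv_pe), reparameterize(mu_se, lv_se),
                reparameterize(mu_pf, lv_pf), reparameterize(mu_sf, lv_sf),
            ], dim=-1)
            z_steps.append(z)
        out, _ = self.lstm(torch.stack(z_steps, dim=1))
        return self.head(out[:, -1])  # classify from the final time step

model = DMMVAEClassifier(eeg_dim=160, face_dim=512)
logits = model(torch.randn(8, 10, 160), torch.randn(8, 10, 512))
print(logits.shape)  # torch.Size([8, 4])
```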
Results: Two sets of experiments were conducted. First, we trained and evaluated our model on the full dataset, comparing its performance with state-of-the-art methods and a baseline LSTM model employing a late fusion strategy to combine EEG and facial features. Second, to assess robustness, we tested the DMMVAE-LSTM framework under data-limited and modality dropout conditions by training with partial data and simulating missing modalities. The results demonstrate that the DMMVAE-LSTM framework consistently outperforms the baseline, especially in scenarios with limited data, indicating its capacity to learn structured and resilient latent representations.
Conclusion: Our findings underscore the benefits of multimodal generative modeling for emotion recognition, particularly in enhancing classification performance when training data are scarce or partially missing. By effectively learning both shared and private representations, the DMMVAE-LSTM framework facilitates more reliable emotion classification and presents a promising solution for real-world applications where acquiring large labeled datasets is challenging.
Behzad Mahaseni, Naimul Mefraz Khan
DOI:https://doi.org/10.70401/ec.2025.0010 - June 29, 2025
Empathic extended reality in the era of generative AI
Aims: Extended reality (XR) has been widely recognized for its ability to evoke empathetic responses by immersing users in virtual scenarios and promoting perspective-taking. However, to fully realize the empathic potential of XR, it is necessary to move beyond the concept of XR as a unidirectional “empathy machine.” This study proposes a bidirectional “empathy-enabled XR” framework, wherein XR systems not only elicit empathy but also demonstrate empathetic behaviors by sensing, interpreting, and adapting to users’ affective and cognitive states.
Methods: Two complementary frameworks are introduced. The first, the Empathic Large Language Model (EmLLM) framework, integrates multimodal user sensing (e.g., voice, facial expressions, physiological signals, and behavior) with large language models (LLMs) to enable bidirectional empathic communication. The second, the Matrix framework, leverages multimodal user and environmental inputs alongside multimodal LLMs to generate context-aware 3D objects within XR environments. This study presents the design and evaluation of two prototypes based on these frameworks: a physiology-driven EmLLM chatbot for stress management, and a Matrix-based mixed reality (MR) application that dynamically generates everyday 3D objects.
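To make the EmLLM idea more concrete, the short sketch below illustrates one way multimodal sensing could be folded into an LLM query: a coarse stress estimate from physiological signals is written into the prompt so the model can respond empathically. This is not the EmLLM implementation; the thresholds, labels, and function names are hypothetical placeholders, and the resulting prompt would be sent to whatever LLM backend the system uses.

```python
# Illustrative only: condition an LLM reply on a coarse stress estimate derived
# from physiological sensing. Thresholds and labels are hypothetical.
from dataclasses import dataclass

@dataclass
class UserState:
    heart_rate_bpm: float
    skin_conductance_uS: float
    transcript: str  # what the user just said

def estimate_stress(state: UserState) -> str:
    """Very coarse rule-of-thumb label; a real system would use a trained model."""
    if state.heart_rate_bpm > 100 or state.skin_conductance_uS > 12.0:
        return "high"
    if state.heart_rate_bpm > 85:
        return "moderate"
    return "low"

def build_empathic_prompt(state: UserState) -> str:
    """Fold the sensed affective state into the system prompt before querying the LLM."""
    stress = estimate_stress(state)
    return (
        "You are a supportive stress-management assistant.\n"
        f"Sensed stress level: {stress} "
        f"(HR {state.heart_rate_bpm:.0f} bpm, EDA {state.skin_conductance_uS:.1f} uS).\n"
        "Acknowledge how the user seems to feel before offering guidance.\n"
        f"User: {state.transcript}"
    )

prompt = build_empathic_prompt(UserState(104, 13.5, "I can't keep up with my deadlines."))
print(prompt)  # this prompt would then be passed to the chosen LLM
```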
Results: The EmLLM-based chatbot achieved 85% accuracy in stress detection, with participants reporting strong therapeutic alliance scores. In the Matrix framework, the use of a pre-generated 3D model repository significantly reduced graphics processing unit utilization and improved system responsiveness, enabling real-time scene augmentation on resource-constrained XR devices.
Conclusion: By integrating EmLLM and Matrix, this research establishes a foundation for empathy-enabled XR systems that dynamically adapt to users’ needs, affective and cognitive states, and situational contexts through real-time 3D content generation. The findings demonstrate the potential of such systems in diverse applications, including mental health support and collaborative training, thereby opening new avenues for immersive, human-centered XR experiences.
Poorvesh Dongre, ... Denis Gračanin
DOI:https://doi.org/10.70401/ec.2025.0009 - June 29, 2025
Integrating colored lights into multimodal robotic storytelling
Aims: Storytelling has evolved alongside human culture, giving rise to new media such as social robots. While these robots employ modalities similar to those used by humans, they can also utilize non-biomimetic modalities, such as color, which are commonly associated with emotions. As research on the use of colored light in robotic storytelling remains limited, this study investigates its integration through three empirical studies.
Methods: We conducted three studies to explore the impact of colored light in robotic storytelling. The first study examined the effect of emotion-inducing colored lighting in romantic storytelling. The second study employed an online survey to determine appropriate light colors for specific emotions, based on images of the robot’s emotional expressions. The third study compared four lighting conditions in storytelling: emotion-driven colored lights, context-based colored lights, constant white light, and no additional lighting.
Results: The first study found that while colored lighting did not significantly influence storytelling experience or perception of the robot, it made recipients feel more serene. The second study showed improved recognition of amazement, rage, and neutral emotional states when colored light accompanied body language. The third study revealed no significant differences across lighting conditions in terms of storytelling experience, emotions, or robot perception; however, participants generally appreciated the use of colored lights. Emotion-driven lighting received slightly more favorable subjective evaluations.
Conclusion: Colored lighting can enhance the emotional expressiveness of robots. Both emotion-driven and context-based lighting strategies are appropriate for robotic storytelling. Through this series of studies, we contribute to the understanding of how colored lights are perceived in robotic communication, particularly within storytelling contexts.
Sophia C. Steinhaeusser, ... Birgit Lugrin
DOI:https://doi.org/10.70401/ec.2025.0008 - May 10, 2025
Investigating the 'I' in team: development and evaluation of an individual-level IMO model for augmented reality-mediated remote collaboration
Aims: This study aims to enhance the design of augmented reality (AR) technologies for remote collaboration by examining the complex relationships among individual factors (user characteristics), psychological and physiological states during AR-mediated remote collaboration, and outcomes within an Input-Mediator-Output (IMO) model. The goal is to evaluate how individual characteristics influence psychological and physiological experiences, as well as task performance, in AR-mediated collaboration.
Methods: We hypothesize and evaluate an IMO model and use correlation analyses to examine the relationships among person-related input variables (e.g., predispositions, traits, attitudes, states, and contextual factors), psychological and physiological emergent states, and performance-related output variables.
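The sketch below shows the general shape of such a correlation analysis: relating person-related inputs to emergent mediator states and performance outputs. The variable names and synthetic data are illustrative assumptions, not the study's dataset or exact measures.

```python
# Illustrative correlation analysis across IMO-style variables (synthetic data).
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 40  # participants (illustrative)
df = pd.DataFrame({
    "technology_affinity": rng.normal(3.5, 0.8, n),     # input: trait/attitude
    "perceived_workload": rng.normal(50, 12, n),         # mediator: psychological state
    "mean_heart_rate": rng.normal(78, 9, n),              # mediator: physiological state
    "task_completion_time_s": rng.normal(300, 45, n),     # output: performance
})

# Pairwise correlations between inputs, mediators, and outputs
for x, y in [("technology_affinity", "perceived_workload"),
             ("perceived_workload", "task_completion_time_s"),
             ("mean_heart_rate", "task_completion_time_s")]:
    r, p = pearsonr(df[x], df[y])
    print(f"{x} vs {y}: r = {r:.2f}, p = {p:.3f}")
```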
Results: Our results demonstrate that individual characteristics significantly influence subjective experiences, physiological responses, and task performance, emphasizing the critical role of individual differences, alongside task- and technology-related factors, in shaping collaboration experiences and performance. These findings highlight the importance of considering individual characteristics in the design of AR tools to optimize user well-being and performance outcomes.
Conclusion: Our study provides a foundational framework for understanding the interplay between individuals, tasks, and technology, underscoring the need for AR tools that align with user characteristics. It also lays the groundwork for future IMO research in AR-mediated remote collaboration, contributing to the development of more effective and health-promoting AR technologies.
Lisa Thomaschewski, ... Annette Kluge
DOI:https://doi.org/10.70401/ec.2025.0007 - April 16, 2025