Table of Contents
Reframing human–robot interaction through extended reality: Unlocking safer, smarter, and more empathic interactions with virtual robots and foundation models
This perspective reframes human–robot interaction (HRI) through extended reality (XR), arguing that virtual robots powered by large foundation models (FMs) can serve as cognitively grounded, empathic agents. Unlike physical robots, XR-native agents are unbound by hardware constraints and can be instantiated, adapted, and scaled on demand, while still affording embodiment and co-presence. We synthesize work across XR, HRI, and cognitive AI to show how such agents can support safety-critical scenarios, enable socially and cognitively empathic interaction across domains, and extend physical capabilities through XR and AI integration. We then discuss how multimodal large FMs (e.g., large language models, large vision models, and vision-language models) enable context-aware reasoning, affect-sensitive interaction, and long-term adaptation, positioning virtual robots as cognitive and empathic mediators rather than mere simulation assets. At the same time, we highlight challenges and potential risks, including overtrust, cultural and representational bias, privacy concerns around biometric sensing, and data governance and transparency. The paper concludes by outlining a research agenda for human-centered, ethically grounded XR agents, emphasizing multi-layered evaluation frameworks, multi-user ecosystems, mixed virtual–physical embodiment, and societal and ethical design practices, envisioning XR-based virtual agents powered by FMs reshaping future HRI into a more efficient and adaptive paradigm.
Yuchong Zhang, ... Danica Kragic
DOI: https://doi.org/10.70401/ec.2026.0018 - February 13, 2026
Virtual humans’ facial expressions, gestures, and voices impact user empathy
Aims: Users’ empathy towards artificial agents can be influenced by the agent’s expression of emotion. To date, most studies have used a Wizard of Oz design or manually programmed agents’ expressions. This study investigated whether autonomously animated emotional expression and neural voices could increase user empathy towards a Virtual Human.
Methods: 158 adults participated in an online experiment in which they watched videos of six emotional stories generated by ChatGPT. For each story, participants were randomly assigned to watch a virtual human (VH) called Carina telling the story with either (1) autonomous expressive or non-expressive animation, and (2) a neural or standard text-to-speech (TTS) voice. After each story, participants rated how well the animation and voice matched the story, and their cognitive, affective, and subjective empathy towards Carina was assessed. Qualitative data were collected on how well participants thought Carina expressed emotion.
Results: Autonomous emotional expression enhanced the alignment between the animation and voice, and improved subjective, cognitive, and affective empathy. The standard voice was rated as matching the fear and sad stories better, and the sad animation was rated as matching better, creating greater subjective and cognitive empathy for the sad story. Trait empathy and ratings of how well the animation and voice matched the story predicted subjective empathy. Qualitative analysis revealed that the animation conveyed emotions more effectively than the voice, and that emotional expression was associated with increased empathy.
Conclusion: Autonomous emotional animation of VHs can improve empathy towards AI-generated stories. Further research is needed on voices that can dynamically change to express different emotions.
Elizabeth Broadbent, ... Mark Sagar
DOI: https://doi.org/10.70401/ec.2026.0017 - January 26, 2026