2026 ASEE Annual Conference & Exposition

Work in Progress: Beyond Words – A Multimodal AI Coaching Tool to Enhance Communication Skill Development in Engineering Education

Presented at Computers in Education (CoED): Poster Session - Division Special Events (1 of 4) -- M208

Effective communication combines verbal and non-verbal elements, including tone of voice,
facial expressions, gestures, and body language. These non-verbal cues provide essential
context, enhance clarity, and foster stronger interpersonal connections. Research demonstrates
that in-person interactions, where visual and physical cues are present, often yield more
detailed and emotionally resonant exchanges than audio-only communication. Despite
the growing recognition of these skills, recent societal trends such as increased online
interaction and reduced face-to-face engagement are limiting opportunities for students and
professionals to develop them naturally. Meanwhile, professional coaching services for
interviews, presentations, and public speaking are in increasing demand, yet remain costly and
inaccessible for many learners.

Existing AI-powered coaching tools, such as Microsoft Speaker Coach, Orai, and Yoodli,
provide accessible audio-based feedback but lack robust multimodal evaluation capabilities.
These platforms often focus narrowly on vocal performance, overlooking critical visual and
contextual aspects such as facial expressiveness, posture, and scenario-specific delivery. While
advanced components exist, such as OpenAI’s Whisper for speech recognition and MediaPipe
for facial landmark detection, there has been limited integration of these technologies into a
unified, context-aware educational tool.

This project addresses these gaps by developing an AI-powered communication coaching
assistant that integrates video, audio, and spoken content analysis to deliver comprehensive,
context-aware feedback. By combining speech recognition, computer vision, and natural
language processing, the system provides personalized, scenario-specific guidance on vocal
delivery, facial expression, body language, and speech coherence. Its contextual awareness
lets it adapt feedback to specific scenarios, such as technical interviews or classroom
presentations, making it directly relevant to engineering education.
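
As a rough illustration of the scenario-aware feedback described above, the following minimal Python sketch combines four per-dimension scores into one overall score and picks the area to coach first. It is not the authors' implementation: the scenario names, the weight values, the 0-to-1 score scale, and the helper names `overall_score` and `focus_area` are all assumptions introduced here. Only the four feedback dimensions come from the abstract.

```python
from dataclasses import dataclass, asdict

# Hypothetical per-scenario weights (assumption: the abstract gives no values).
SCENARIO_WEIGHTS = {
    "technical_interview": {
        "vocal_delivery": 0.25,
        "facial_expression": 0.15,
        "body_language": 0.15,
        "speech_coherence": 0.45,
    },
    "classroom_presentation": {
        "vocal_delivery": 0.30,
        "facial_expression": 0.20,
        "body_language": 0.30,
        "speech_coherence": 0.20,
    },
}


@dataclass
class ModalityScores:
    """Scores in [0, 1] for the four feedback dimensions named in the abstract."""
    vocal_delivery: float
    facial_expression: float
    body_language: float
    speech_coherence: float


def overall_score(scores: ModalityScores, scenario: str) -> float:
    """Weighted average of the dimension scores for the given scenario."""
    weights = SCENARIO_WEIGHTS[scenario]
    values = asdict(scores)
    total = sum(weights[name] * values[name] for name in weights)
    return total / sum(weights.values())


def focus_area(scores: ModalityScores, scenario: str) -> str:
    """Dimension with the largest weighted shortfall from a perfect score."""
    weights = SCENARIO_WEIGHTS[scenario]
    values = asdict(scores)
    return max(weights, key=lambda name: weights[name] * (1.0 - values[name]))


if __name__ == "__main__":
    sample = ModalityScores(0.8, 0.6, 0.5, 0.4)
    for scenario in SCENARIO_WEIGHTS:
        print(scenario, round(overall_score(sample, scenario), 3),
              focus_area(sample, scenario))
```

Under these assumed weights, the same set of scores leads to a different coaching priority in each scenario, which is the kind of adaptation the abstract describes.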

To validate its effectiveness, the system was benchmarked against an interview dataset and
scoring framework developed through prior peer-reviewed research in communication training,
which grounds its evaluation methods in established academic work. In a post-study survey,
80% of student users reported improved confidence in interview preparation after using the
platform. These results suggest both technical robustness and a measurable impact on
learners’ communication skills.

The intended contribution is a holistic, accessible tool that facilitates high-quality communication
training for engineering students, job seekers, and professionals who may lack access to
in-person mentors or be unable to afford professional coaching. The paper will present the
system’s technical design,
multimodal AI integration, benchmarking process, and user evaluation results, while also
exploring its pedagogical implications for enhancing communication training in classrooms,
professional preparation, and lifelong learning.

Authors
  1. Jeslyn Wang, University of Toronto
  2. Eren Cimentepe, University of Toronto
  3. Guang Yang, University of Toronto
Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.