1. Generative AI and Multimodal Large Language Models (MLLMs) for Smart Multimedia

Organizers: Your Name, Your Institution, Your Email

The rapid advancements in large language models (LLMs) and multimodal AI have transformed the landscape of smart multimedia. These technologies enable seamless integration of text, vision, and audio, powering intelligent content creation, adaptive learning, and interactive applications. From personalized content recommendation to generative AI in AR/VR, the possibilities are vast. This Special Session aims to explore the latest breakthroughs, challenges, and real-world applications of multimodal AI in smart multimedia.

Topics include, but are not limited to:

  • Multimodal LLMs for adaptive and context-aware media processing
  • Generative AI for personalized content recommendation and creation
  • AI-powered interactive and immersive media (AR/VR/metaverse)
  • Vision-language models for automated video editing and storytelling
  • Ethics, bias, and explainability in AI-generated multimedia

 

2. Advanced Signal Processing and AI in Smart Multimedia

Organizers: Shayok Chakraborty, Florida State University, schakraborty2@fsu.edu
Troy McDaniel, Arizona State University, troy.mcdaniel@asu.edu

Recent advancements in signal processing and AI have significantly improved multimedia processing, enabling high-quality, efficient, and real-time media transformations. With deep learning-based techniques and edge AI, real-time applications in video, audio, and image processing are becoming more intelligent and adaptive. This Special Session aims to bring together researchers and practitioners to discuss innovations in neural and adaptive signal processing for multimedia.

Topics include, but are not limited to:

  • Neural and adaptive signal processing for real-time multimedia enhancement
  • Self-supervised and few-shot learning for multimedia signals
  • Edge AI and efficient neural networks for smart multimedia processing
  • Compressive sensing and sparse representation in multimedia
  • AI-driven noise reduction and super-resolution techniques

 

3. Smart Multimedia for Healthcare and Biomedical Applications

Organizers: Troy McDaniel, Arizona State University, troy.mcdaniel@asu.edu
Yuichi Kurita, Hiroshima University, ykurita@hiroshima-u.ac.jp

The integration of AI with multimedia technologies in healthcare is revolutionizing diagnostics, patient monitoring, and assistive technologies. Smart multimedia applications powered by deep learning, signal processing, and multimodal data fusion can enhance medical imaging, speech-based diagnostics, and real-time patient analytics. This Special Session aims to explore the latest advancements in AI-driven healthcare multimedia applications.

Topics include, but are not limited to:

  • AI-powered medical image and video analysis for diagnostics
  • Multimodal AI for patient monitoring and assistive technologies
  • Speech and NLP technologies for healthcare applications
  • Wearable AI and smart multimedia for remote healthcare
  • Privacy and security in AI-driven healthcare multimedia

 

4. Robotics, Automation, and Smart Multimedia

Organizers: Arnaud LELEVÉ, INSA Lyon, arnaud.leleve@insa-lyon.fr
Troy McDaniel, Arizona State University, troy.mcdaniel@asu.edu

Robotic systems increasingly rely on AI-driven multimedia processing to enhance perception, decision-making, and human interaction. With advancements in vision-based AI, multimodal learning, and real-time analytics, robots can achieve improved autonomy and collaboration. This Special Session focuses on the intersection of robotics and smart multimedia, addressing key challenges and innovations.

Topics include, but are not limited to:

  • Vision-based AI for robotics and autonomous systems
  • LLM-driven multimodal interfaces for human-robot collaboration
  • AI-driven multimedia perception for robotic surgery and telemedicine
  • Gesture and speech recognition for intuitive human-robot interaction
  • AI-powered scene understanding for autonomous navigation

 

5. Next-Gen Media Understanding, Security, and Ethics

Organizers: Your Name, Your Institution, Your Email

As AI-driven multimedia technology evolves, challenges in security, deepfake detection, and ethical AI usage become increasingly critical. Ensuring trustworthy AI models, detecting fake content, and mitigating biases are essential for responsible media applications. This Special Session explores the latest research in media security, ethics, and trustworthy AI.

Topics include, but are not limited to:

  • AI-powered video understanding and compression
  • Multimodal sentiment and emotion analysis in smart media
  • Deepfake detection and media forensics
  • Explainable AI and bias mitigation in multimedia applications
  • Secure AI-driven content generation and authentication

 

6. Smart Multimedia in the Natural Environment

Organizers: Jun Zhou, Griffith University, jun.zhou@griffith.edu.au

Topics include, but are not limited to:

  • Multi-Sensor Data Fusion for Wildlife Monitoring: Diverse sensor technologies have become crucial for monitoring wildlife, including acoustic and visual sensors, LiDAR, IoT-based systems, and various camera types capturing images at different resolutions and wavelengths, with data collected across terrestrial, aquatic, and aerial environments. We invite submissions that present multi-modal datasets collected in natural habitats and innovative methods for fusing these data to enhance wildlife detection, classification, monitoring, and population analysis.
  • Handling Noisy Data in Challenging Environments: Data captured in the natural environment are often subject to various types of noise. For example, nighttime images may suffer from poor lighting, while images taken in fog, rain, or snow require significant processing to maintain usability. We welcome novel methods for effectively handling and improving the quality of noisy or degraded data in these challenging conditions, ensuring the data is useful for real-world applications.
  • Multimedia Approaches for Biodiversity Analysis: Biodiversity analysis faces the challenge of high species diversity combined with a lack of sufficiently labeled data. Solutions such as few-shot learning, semi-supervised learning, and vision-language models offer promising avenues for overcoming these barriers. We seek novel contributions that apply multimedia techniques to address these challenges, with a particular focus on improving species identification, classification, and monitoring in environments with limited labeled data.

 

7. Haptics for Medical Training Simulators

Organizers: Arnaud LELEVÉ, INSA Lyon, arnaud.leleve@insa-lyon.fr
Sylvain Bouchigny, CEA-List Paris, sylvain.bouchigny@cea.fr
Carlos Rossa, Carleton University, rossa@sce.carleton.ca

Many professions rely on complex dexterous manipulation and require initial and ongoing hands-on training. In the medical field, for example, training aids such as animals, cadavers, and phantoms have for decades been a convenient way to hone surgical skills. Yet these resources are expensive, not always available, may raise ethical concerns, and offer only a limited set of cases to practice on. These constraints restrict trainees' opportunities for hands-on training in their curriculum. To address this issue, cost-efficient solutions must be developed to enable hands-on practice on any case study, at any time, and as often as needed.

For more than a decade, Virtual Reality (VR) simulators have been used to overcome these limitations. VR systems offer a virtually unlimited set of case studies and can adapt the difficulty of a simulation on the fly to help trainees break through specific learning curves. VR simulators have progressively improved to provide trainees with a more realistic environment, first in 2D and more recently in 3D. The addition of haptic/force feedback provides more realistic interaction with the VR environment, which is known to improve skill development and retention in complex medical procedures. These simulators often require accurate models that simulate, in real time, the behaviour of organs interacting with each other and with surgical tools. Another example of a VR trainer is the flight simulator, which is nowadays a necessary intermediate step before training on real aircraft. Such simulators can objectively assess a trainee's performance during challenging simulations in a risk-free environment.
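
As a rough illustration of the real-time constraint mentioned above, haptic simulators commonly compute contact forces in a fixed-rate loop running near 1 kHz, for instance with a simple penalty (spring-damper) model. The sketch below is only a minimal, hypothetical example: the device functions read_position and apply_force, and the stiffness and damping values, are illustrative placeholders rather than the API or parameters of any particular simulator.

    import time

    # Illustrative spring-damper (penalty) parameters for a 1-DoF tool-tissue contact.
    STIFFNESS = 300.0   # N/m (placeholder value)
    DAMPING = 2.0       # N*s/m (placeholder value)
    SURFACE_Z = 0.0     # tissue surface position along the tool axis, in metres
    RATE_HZ = 1000      # haptic update loops typically run near 1 kHz

    def contact_force(z, vz):
        """Return a penalty force that pushes back only while the tool penetrates."""
        penetration = SURFACE_Z - z
        if penetration <= 0.0:
            return 0.0
        return STIFFNESS * penetration - DAMPING * vz

    def haptic_loop(device, duration_s=5.0):
        """Fixed-rate force update on a hypothetical device exposing read_position/apply_force."""
        dt = 1.0 / RATE_HZ
        z_prev = device.read_position()
        end_time = time.monotonic() + duration_s
        while time.monotonic() < end_time:
            z = device.read_position()      # current tool-tip position (m)
            vz = (z - z_prev) / dt          # finite-difference velocity estimate
            device.apply_force(contact_force(z, vz))
            z_prev = z
            time.sleep(dt)                  # stand-in for a real-time scheduler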

This special session aims to provide a forum for researchers and developers in the multimedia community to present novel and original research on effective haptic feedback for medical training simulators. Topics include, but are not limited to:

  • Haptic rendering
  • Computer graphics
  • Virtual/augmented/mixed reality
  • Variable stiffness actuators
  • Multimodal simulation
  • Training simulation
  • Motion capture/analysis, cognitive performance

 

Sylvain Bouchigny is a researcher in Human-Computer Interaction at the CEA LIST Institute, France. Trained in physics, he received an M.Eng. in scientific instrumentation from the National Engineering School in Caen in 1998 and a Ph.D. in nuclear physics from the University of Paris Sud 11 (Orsay) in 2004. In 2007, he joined CEA LIST to work on physics applied to human interaction, a field closer to his interests. His research focuses on multimodal human-computer interaction, haptics, and virtual environments applied to education, training, and rehabilitation. He has conducted projects on tangible interaction with interactive tables for education and post-stroke rehabilitation and, for the last ten years, has led the development of a VR haptic platform for surgical education.

Arnaud Lelevé has been a professor at INSA Lyon since 2001. He received his PhD in Robotics in 2000 from the Université de Montpellier, France. He first worked in a computer science laboratory on remote-lab systems and then joined the Robotics team of the Ampère lab in 2011. He has conducted numerous R&D projects, including INTELO (a mobile robot for bridge inspection), the Greenshield project (which aims to replace pesticides with farming robots in crop fields), and medical-robotics research projects such as SoHappy (a pneumatic master device for tele-echography). He has also participated in the development of hands-on training projects such as SAGA (a birth simulator) and PeriSim (an epidural needle insertion simulator). He has strong skills in applied mechatronics and real-time computer science, and solid experience in scientific program management.

Carlos Rossa is an Associate Professor in the Department of Systems and Computer Engineering at Carleton University in Ottawa, Canada. He received his BEng and MSc degrees in Mechanical Engineering from the Ecole Nationale d'Ingénieurs de Metz, Metz, France, both in 2010, and his PhD in mechatronics and robotics from Sorbonne Université (formerly UPMC), Paris, France, in 2014, under the auspices of the Commissariat à l'Energie Atomique (CEA). His research interests include haptics and haptic devices, surgical simulation and training, biomedical instrumentation, medical robotics, and image-guided percutaneous surgery.