Keynote Speaker
Shuichi Sakamoto
Professor at the Research Institute of Electrical Communication, Tohoku University, Japan
“Toward 3D binaural technology compatible with human's auditory space perception”
Abstract
Advanced binaural synthesis technologies are crucial for providing rich spatial information to users. To develop such technologies at a high level, deep knowledge of human auditory space perception, a multi-modal perceptual process, needs to be applied effectively. However, it is still unclear how humans perceive and process binaural input to form a representation of auditory space. In this lecture, I first introduce the perceptual cues of binaural space, especially those related to head-related transfer functions (HRTFs), which are well known as key components of spatial hearing. Then, based on this knowledge, binaural synthesis using HRTFs and spherical microphone arrays is introduced.
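The core operation behind HRTF-based binaural synthesis can be sketched in a few lines. The code below is illustrative only, not the speaker's implementation: it assumes a pair of measured head-related impulse responses (HRIRs), replaced here by random placeholders, and convolves a mono source with them to obtain a two-channel signal that, over headphones, is perceived at the corresponding direction.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Render a mono signal at the direction encoded by the HRIR pair."""
    left = fftconvolve(mono, hrir_left)    # filter with left-ear HRIR
    right = fftconvolve(mono, hrir_right)  # filter with right-ear HRIR
    return np.stack([left, right])         # (2, N) binaural signal

# Placeholder usage: 1 s of noise rendered through random stand-in "HRIRs".
fs = 48000
source = np.random.randn(fs)
hl, hr = np.random.randn(256), np.random.randn(256)  # stand-ins for measured HRIRs
binaural = binaural_synthesis(source, hl, hr)
```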
Biography
Shuichi Sakamoto received his B.S., M.Sc., and Ph.D. from Tohoku University in 1995, 1997, and 2004, respectively. He joined the Research Institute of Electrical Communication, Tohoku University, as a Research Associate in 2000 and was appointed an Associate Professor in 2011. He was a Visiting Researcher at McGill University, Montreal, Canada, from 2007 to 2008, and has been a Professor at the Research Institute of Electrical Communication, Tohoku University, since 2019. His research interests include the development of high-definition three-dimensional audio-recording systems and human multisensory information processing, including hearing and speech perception. He is an Editorial Board Member of Frontiers in Psychology and of Auditory Perception & Cognition, and a member of the Acoustical Society of America, the Acoustical Society of Japan, and other societies.
Speakers
Sungyoung Kim
Associate Professor at the Graduate School of Culture Technology, KAIST, South Korea
“Towards Individualization of Binaural Music Reproduction”
Abstract
The AIRIS (Applied and Innovative Research for Immersive Sound) laboratory conducted a comprehensive comparison of four commercial binaural renderers, assessing listeners' preferences for each. Building on our prior research, the comparative analysis revealed distinct between-group differences that can be attributed to individual listening proficiency.
Participants with extensive backgrounds in music and audio production exhibited heightened sensitivity to subtle perceptual variations in binaural renderings, underscoring their ability to discern nuanced differences. In contrast, the other group, comprising listeners with less experience in music and audio production, tended to overlook minor distinctions in binaural rendering; their sensitivity was instead directed toward the overall direct-to-reverberation ratio. This intriguing finding suggests that introducing room-related reverberation could enhance the binaural presentation of musical content for this group of listeners. Meanwhile, participants with more critical and advanced listening skills prioritized timbre-related fidelity over the precise reproduction of space-induced characteristics.
In essence, our results indicate a nuanced divergence in listeners' preferences and sensitivities to binaurally presented music based on their expertise in music and audio production. This insight has valuable implications for tailoring binaural rendering approaches to the distinct needs and expectations of different listener groups, thereby contributing to the refinement and optimization of binaurally presented musical experiences in the future.
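For readers unfamiliar with the direct-to-reverberation ratio (DRR) mentioned above, the sketch below shows one common way to estimate it from a room impulse response. The 2.5 ms direct-path window is a conventional choice, not a parameter taken from the study, and the toy impulse response is synthetic.

```python
import numpy as np

def direct_to_reverberant_ratio(rir, fs, direct_ms=2.5):
    """DRR in dB from a room impulse response sampled at fs."""
    peak = int(np.argmax(np.abs(rir)))          # direct-sound arrival
    edge = peak + int(direct_ms * 1e-3 * fs)    # end of the direct window
    direct_energy = np.sum(rir[:edge] ** 2)
    reverberant_energy = np.sum(rir[edge:] ** 2)
    return 10.0 * np.log10(direct_energy / reverberant_energy)

# Toy example: exponentially decaying noise as a stand-in RIR.
fs = 48000
t = np.arange(fs // 2) / fs
rir = np.random.randn(fs // 2) * np.exp(-t / 0.3)
rir[0] = 5.0  # strong direct path at t = 0
print(f"DRR: {direct_to_reverberant_ratio(rir, fs):.1f} dB")
```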
Biography
Sungyoung Kim received a B.S. degree from Sogang University, Korea, and a Master of Music and Ph.D. from McGill University, Canada.
Currently, he works as an associate professor at the Korea Advanced Institute of Science and Technology (KAIST) and the Rochester Institute of Technology (RIT). His research interests have included enjoyable auditory environments based on multichannel audio systems, virtual reality audio, rehabilitation of listening experience, auditory training, cross-cultural comparison of listening experience, and virtual acoustics. He wrote chapters for two books on immersive sound production and reproduction: Immersive Audio (Focal Press) and 3D Audio (Contextual). Prior to joining academia, Dr. Kim worked for the Korean Broadcasting System (KBS) as a recording engineer, where he produced CDs, and for Yamaha Corporation as a research associate, where he studied human factors for music and listening experiences. He was a visiting research professor at Kyoto University (Japan) through the JSPS Invitational Fellowship for Research in Japan (2019). His current research interests are rendering and perceptual evaluation of spatial audio, digital preservation of aural heritage, AI for audio, and auditory training for hearing rehabilitation.
Yong-Hwa Park
Associate Professor at the Department of Mechanical Engineering, KAIST, South Korea
“Human Auditory System-Inspired Sound Event Detection and Localization for Humanoid Robot Applications”
Abstract
Pursuing intelligent machines that autonomously recognize human conditions and surrounding events, this seminar focuses on sensing and recognition of events in intelligent machine systems by means of sound. It covers recent research outcomes, including environment-robust acoustic event detection and source localization based on knowledge of the human auditory system and on state-of-the-art deep learning algorithms, as follows:
Targeting acoustic event detection that is robust against harsh listening conditions such as reverberation, background noise, and multiple sources, we focus on sound classification and localization incorporating specialized acoustic feature extraction, massive data augmentation, and dedicated AI algorithms suitable for the non-stationary practical problems that humanoid robots may easily encounter. For the acoustic feature extraction, an in-depth functional analysis of the human auditory system is carried out via time-frequency-spatial analysis to mimic the behavior of the human auditory organs, including the head-related transfer function (HRTF), various auditory filters, and auditory cortex neural responses, e.g., the spectro-temporal receptive field (STRF).
For data collection and augmentation, massive environmental sound data were gathered, and physics-based data augmentation was carried out on top of the collected data. Specifically, a new HRTF dataset was constructed for the application of binaural source localization in humanoid robots. We rely strongly on acoustic domain knowledge in the design of the deep learning scheme, considering the non-stationary, random environments that humanoid robots frequently encounter. The results show that the dedicated network scheme can outperform state-of-the-art acoustic recognition neural networks. As outcomes, the KAIST cough detection camera, Binaural Sound Event Localization and Detection (BISELD), the FilterAugment scheme, and the Temporal Dynamic Convolution Network (TDY-CNN) are demonstrated for sound event detection and localization in humanoid robots.
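As a concrete illustration of one cue such an HRTF dataset encodes, the hedged sketch below estimates the interaural time difference (ITD) between two ear signals with the classic GCC-PHAT method. This is a standard textbook technique shown for orientation, not the BISELD pipeline itself.

```python
import numpy as np

def gcc_phat_itd(left, right, fs, max_itd_s=1e-3):
    """Delay of the right channel relative to the left (seconds), via GCC-PHAT."""
    n = 2 * max(len(left), len(right))
    cross = np.fft.rfft(right, n) * np.conj(np.fft.rfft(left, n))
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n)
    max_lag = int(max_itd_s * fs)           # physiologically plausible ITD range
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])  # lags -max..+max
    return (int(np.argmax(cc)) - max_lag) / fs

# Toy usage: the right channel lags the left by 10 samples.
fs = 16000
sig = np.random.randn(fs)
itd = gcc_phat_itd(sig, np.roll(sig, 10), fs)  # ~10 / fs, i.e. about 0.6 ms
```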
Biography
Yong-Hwa Park received his B.S., M.S., and Ph.D. in Mechanical Engineering from KAIST in 1991, 1993, and 1999, respectively. From 2000 to 2003, he was a research associate in the Department of Aerospace Sciences at the University of Colorado at Boulder, working on high-fidelity simulation of dynamical systems. From 2003 to 2016, he worked for Samsung Electronics in the Visual Display Division of the Multimedia Business Department and at the Samsung Advanced Institute of Technology (SAIT) as a Research Master in the field of micro-opto-electro-mechanical systems (MOEMS), with applications to 3D imaging and health monitoring devices. In 2016, he joined KAIST as an associate professor of NOVIC+ (Center for Noise & Vibration Control Plus) at the Department of Mechanical Engineering, devoting himself to research on vibration, sound, 3D vision sensors (LiDAR), and deep learning-based condition monitoring and health sensors. Specifically, his research fields include structural vibration and condition monitoring; auditory intelligence; human sound and vibration; cardiovascular health monitoring; and 3D LiDAR sensors. Since 2013, he has served SPIE Photonics West as a conference chair of MOEMS and Miniaturized Systems. He is a Vice President of KSNVE and of the Dynamics, Control and Robot Division of KSME, and an executive committee member of SPIE. He serves as an associate editor for the KSME Journal and JMST (Springer).
Jung-Woo Choi
Associate Professor at the School of Electrical Engineering, KAIST, South Korea
Abstract
The recent remarkable advancements in deep neural networks have ushered in significant improvements in various spatial audio-related tasks. Examples include speech enhancement systems capable of effectively separating voices from heavily contaminated signals, even in situations with a negative signal-to-noise ratio. Additionally, neural dereverberation models have shown the capability to remove room reverberations from audio signals, even for rooms not encountered during the training process. Further pushing the boundaries, the introduction of transformers, renowned for their ability to capture contextual information and data relationships, has paved the way for even more astonishing deep neural network models.
This presentation aims to introduce three such remarkable models that were developed in the Smart Sound Systems Laboratory over the past year. The first model is an array-agnostic speech enhancement system that excels at noise and reverberation removal, regardless of the microphone array configuration. The second model focuses on target signal extraction, allowing the separation of specific types of signals from complex mixtures based on user-provided clues, such as class labels or waveform characteristics. Lastly, the Room Geometry Inference (RGI)-Net showcases its ability to visualize room shapes using multichannel room impulse responses. In addition to presenting these models, I will also delve into the potential impact of these functional advancements on the future of augmented reality.
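To make the clue-conditioning idea concrete, here is a deliberately simplified sketch of target signal extraction. The layer types and sizes are my own assumptions for illustration, not the Smart Sound Systems Laboratory architecture: a recurrent encoder predicts a time-frequency mask for the mixture, and a learned embedding of the user-provided class label is injected so the mask selects the requested source.

```python
import torch
import torch.nn as nn

class ClueConditionedExtractor(nn.Module):
    """Hypothetical clue-conditioned masking model (illustrative only)."""
    def __init__(self, n_freq=257, n_classes=10, hidden=256):
        super().__init__()
        self.clue_embed = nn.Embedding(n_classes, hidden)   # class label -> vector
        self.encoder = nn.GRU(n_freq, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mixture_mag, class_id):
        # mixture_mag: (batch, time, freq) magnitude spectrogram of the mixture
        h, _ = self.encoder(mixture_mag)                # temporal context
        h = h + self.clue_embed(class_id).unsqueeze(1)  # inject the clue
        mask = self.mask_head(h)                        # per-bin soft mask
        return mask * mixture_mag                       # estimated target

# Toy usage: extract the class-3 and class-7 targets from two random "mixtures".
model = ClueConditionedExtractor()
mix = torch.rand(2, 100, 257)
target = model(mix, torch.tensor([3, 7]))  # shape (2, 100, 257)
```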
Biography
Jung-Woo Choi received the B.Sc., M.Sc., and Ph.D. degrees in Mechanical Engineering from the Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 1999, 2001, and 2005, respectively. From 2006 to 2007, he was a postdoctoral researcher at the Institute of Sound and Vibration Research (ISVR) of the University of Southampton, Southampton, U.K. From 2007 to 2011, he worked with Samsung Electronics at the Samsung Advanced Institute of Technology, Suwon, South Korea. He was a Research Associate Professor in the Department of Mechanical Engineering at KAIST until 2014. In 2015, he joined the School of Electrical Engineering of KAIST as an Assistant Professor, and in 2018 he became an Associate Professor. His current research interests include array signal processing, AI models for spatial audio, and their applications. He is a member of the Acoustical Society of America, the Institute of Noise Control Engineering, USA, and the Korean Society of Noise and Vibration Engineering.