<2018 Korea-UK Focal Point Workshop Program>


Title: Intelligent Virtual Reality: Deep Audio-Visual Representation Learning for Multimedia Perception and Reproduction


Wed. 24 Jan.,  B201, 2nd engineering building, Yonsei Univ.


    09:30 - 09:50: Coffee and pastries
    09:50 - 09:55: Welcome and introduction by Prof. Kwanghoon Sohn
    09:55 - 10:00: Welcome by Mr. Gareth Davies (Head of Science and Innovation at British Embassy Seoul)
    10:00 - 10:50: Prof. Hong-Goo Kang – Deep learning based speech signal processing
    10:50 - 11:40: Prof. Young-Cheol Park – Recent acoustic researches of MSP Lab in Yonsei University
    11:40 - 13:10: Lunch
    13:10 - 14:00: Invited Talk and demo (Dr. Taegyu Lee, G'Audio Lab for spatial audio  https://www.gaudiolab.com/ )
    14:00 - 14:20: Prof. Adrian Hilton – Introduction of S3A spatial audio project (Univ. Surrey)
    14:20 - 15:00: Dr. Jon Francombe – Media device orchestration for immersive spatial audio reproduction (BBC)
    15:00 - 15:10: Break
    15:10 - 15:50: Dr. Wenwu Wang – Acoustic Reflector Localisation and its Application in Blind Source Separation (Univ. Surrey)
    15:50 - 16:40: Dr. Marcos F. S. Galvez – Listener-adaptive immersive audio with soundbars (Univ. Southampton)
    16:40 - 17:00: Closing


Prof. Hong-Goo Kang


Title: Deep Learning Based Speech Signal Processing: Recent research activities of DSP Lab in Yonsei University


Abstract: Among the various types of natural human-computer interaction systems, speech is still one of the most convenient because its sensors, i.e. microphones, are easy to use. Deep learning technologies, which have brought a paradigm shift in many research fields, also play a key role in the natural speech interface area. In this talk, we introduce recent research activities of the DSP Lab at Yonsei University. We first present a brief description of three projects: 1) background noise removal for audio/video clips, 2) automatic speech recognition for distant speakers, and 3) speech production models for text-to-speech systems. All the projects covered in this talk require in-depth knowledge of deep learning techniques as well as speech signal processing theory. We would also like to share our views on future research directions in audio-visual signal processing.


Bio: Hong-Goo Kang received the B.S., M.S., and Ph.D. degrees from Yonsei University, Korea in 1989, 1991, and 1995, respectively. From 1996 to 2002, he was a senior technical staff member at AT&T Labs-Research, Florham Park, New Jersey. He is currently a Professor at Yonsei University. He has actively participated in international collaboration activities on new speech/audio coding standard algorithms hosted by ITU-T and MPEG. He was an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing from 2005 to 2008. He has served on numerous conference and program committees. He was a vice chair of the technical program committee of INTERSPEECH 2004, held on Jeju Island, Korea. In 2008-2009 and 2015-2016, he worked as a visiting scholar at Broadcom (Irvine, CA) and Google (Mountain View, CA), respectively, where he participated in various projects on speech signal processing. His research interests include speech/audio signal processing, machine learning, and human-computer interface.


Prof. Young-cheol Park


Title: Recent Acoustic Researches of MSP Lab in Yonsei University


Abstract: We introduce a couple of acoustic research projects conducted last year. The first topic is a steerable differential microphone array (DMA), which has advantages over the conventional microphone array because it can effectively reduce the influence of interference and background noise in a compact size. We introduce the process of developing an algorithm that can steer a 2nd-order DMA in any direction without losing the beam shape. The second topic is the parametric array, which has been studied widely in the context of underwater sonar and in air. It exploits an effect known as self-demodulation to create extremely directive sounds. However, harmonic distortion is inevitable in the process of self-demodulation. We develop a distortion compensation technique for the SSB modulation-based parametric array, and it is applied to the construction of a parametric speaker using an array of low-power ultrasonic transducers. After the presentation, a quick demo of the parametric speaker will be provided.


Bio: Young-cheol Park received the Ph.D. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1993. From September 1993 to October 1995, he was a Postdoctoral Research Scholar at Pennsylvania State University, where he worked on active noise control. From March 1996 to August 1998, he was with Samsung Electronics, Korea, and from April 1999 to February 2002, he was with InTime Corporation, Korea, where he worked on digital hearing aids and high-quality digital audio processors. In 2002, he joined the Division of Computer and Telecommunication Engineering of Yonsei University, where he is currently a Full Professor. In 2013, he was a visiting scholar at CCRMA, Stanford University, where he worked on superdirective microphone arrays. His research interests include adaptive filtering for acoustics, 3D audio signal processing, digital hearing aids, and high-quality audio coding.


Dr. Taegyu Lee


Title: VR/360 Audio Production and Workflow


Abstract: G’Audio Lab develops VR audio technologies, one of which was adopted as part of the MPEG-H 3D Audio international standard. Based on these, we provide an end-to-end solution for the whole VR ecosystem that preserves the producer's intent. This workshop will discuss the basics of VR audio for creating, delivering, and playing back 360/VR content.


Bio: Ted Lee (Taegyu Lee), CTO of G’Audio Lab, holds Ph.D. degrees in electrical and electronic engineering from Yonsei University. Since 2013, Dr. Lee has been an active participant in the ISO/IEC MPEG standardization committee and has significantly contributed to the standardization of MPEG-H 3D Audio. Currently, he manages all technology-related issues at G'Audio Lab, which builds spatial audio software tools that are forging the future of immersive and lifelike VR experiences.



Prof. Adrian Hilton


Title: Introduction of S3A spatial audio project


Abstract: S3A is a major five-year UK research collaboration between internationally leading experts in 3D audio and visual processing, the BBC and UK industry. The goal of S3A is to deliver a step change in the quality of audio consumed by the general public, using novel audio-visual signal processing to enable immersive audio to work outside the research laboratory in everyone’s home. S3A aims to unlock the creative potential of 3D sound and deliver to listeners a step change in immersive experiences. To achieve this, S3A brings together leading experts and their research teams at the Universities of Surrey, Salford and Southampton and BBC Research & Development.


Bio: Prof. Adrian Hilton received the B.S. (Hons.) and D.Phil. degrees from University of Sussex, Sussex, U.K., in 1988 and 1992, respectively. He is currently Director of the Centre for Vision, Speech, and Signal Processing at the University of Surrey, UK. His research interests include robust computer vision to model and understand real-world scenes. This work is internationally recognised, with over 200 published articles, six best paper awards, two EU IST Innovation Prizes and a Manufacturing Industry Achievement Award. He is currently the PI of the EPSRC project S3A: Future Spatial Audio, which aims to bridge the gap between video and audio processing.



Dr. Jon Francombe


Title: Media device orchestration for immersive spatial audio reproduction


Abstract: It is possible to create exciting, immersive spatial audio listening experiences in the lab or recording studio, where a large number of high quality loudspeakers can be carefully positioned and calibrated. Such systems are generally not installed in living rooms, so it is challenging to recreate the immersive listening experience in consumers’ homes. However, it is likely that there will in fact be multiple loudspeakers in a living room; for example, devices such as Bluetooth loudspeakers, televisions, laptops, mobile phones, tablets, and so on. "Media device orchestration" (MDO) is the concept of utilising all available devices to augment the reproduction of a media experience. In the S3A project, we are investigating media device orchestration by performing qualitative and quantitative listening tests; developing demonstration systems; and commissioning new content.


Bio: Jon graduated from the University of Surrey in 2010 with a First Class Honours degree in Music and Sound Recording. He returned to Surrey in 2011 to study for a Ph.D. in perceptual audio quality evaluation as part of the Perceptually Optimized Sound Zones (POSZ) project (www.posz.org), and then worked as a research fellow on the S3A: Future Spatial Audio project (www.s3a-spatialaudio.org). Jon currently works as a senior research and development engineer in the audio research and development team at the BBC. His research focuses on audio perception, quality evaluation using quantitative and qualitative methods, and new methods of spatial audio reproduction.


Dr. Wenwu Wang


Title: Acoustic Reflector Localisation and its Application in Blind Source Separation


Abstract: Sound is classically defined by wave functions. During its propagation, sound interacts with the obstacles it encounters. These interactions result in interference with the main signal, which can be either constructive or destructive. In the signal processing research field, it is important to identify these interactions in order to either exploit or avoid them. In this talk, we focus on acoustic reflector localization and its use in blind source separation. First, we present four novel methods for acoustic reflector localization. Second, we present a modified version of a state-of-the-art source separation method that exploits both direct sound and first strong reflection information to model binaural cues. Experiments were performed on real datasets to show the improved performance of the acoustic reflector localization and the modified binaural source separation algorithms as compared with several baseline methods.


Bio: Dr. Wenwu Wang is a Reader in Signal Processing and Co-Director of the Machine Audition Lab at the Centre for Vision, Speech, and Signal Processing, University of Surrey, which he joined in May 2007. He is currently a member of the MoD UDRC in Signal Processing (2009-), the BBC Audio Research Partnership (2011-), the MRC/EPSRC Microphone Network (2015-), and the BBC Data Science Research Partnership (2017-), as well as an associate member of Surrey Centre for Cyber Security (2014-). His current research interests include blind signal processing, sparse signal processing, audio-visual signal processing, machine learning and perception, and machine audition (listening). He has authored or co-authored more than 180 publications. He has been an Associate Editor for IEEE Transactions on Signal Processing since 2014.


Dr. Marcos Simón Galvez


Title: Listener-adaptive immersive audio with soundbars


Abstract: Loudspeaker arrays allow for accurate control of a soundfield. One of their uses is to reproduce binaural audio by controlling the radiated pressure at the input of the ears of one or more listeners, which is also known as Transaural audio. Transaural audio works by creating “virtual headphones” at the position of the listeners’ ears, which makes the technique quite dependent on the listener’s position. To overcome this, a computer vision system can be used to modify the output of the Transaural control algorithm so that the virtual headphones are always locked to the listener’s position. This technology has been incorporated in one of the latest soundbars developed inside the S3A project, which allows for single-listener adaptive Transaural reproduction. The soundbar will be demonstrated as an alternative to binaural reproduction through headphones.


Bio: Dr. Marcos Simón graduated in 2010 from the Technical University of Madrid with a BSc in telecommunications. In 2011 he joined the Institute of Sound and Vibration Research, where he has worked on personal audio with loudspeaker arrays for improving speech intelligibility for hard-of-hearing people and also on the modelling of cochlear mechanics. He obtained his PhD in 2014, and is currently working on the S3A spatial audio program. In 2013 he was awarded the IOA young persons’ award for innovation in acoustical engineering and the Sociedad Española de Acústica (Spanish Acoustical Society) Andrés Lara prize for young scientists. In 2017 Marcos Simón co-founded Soton Audio Labs Limited together with Filippo Fazi, a company to commercialise the loudspeaker array technology coming from the S3A project for future immersive audio systems.