
Current PhD Projects

Daniel Michelsanti
Title: Audio-Visual Speech Enhancement for Hearing Assistive Devices

Supervisors: Zheng-Hua Tan, Sigurdur Sigurdsson (Oticon A/S), Jesper Jensen (Oticon A/S, AAU).
Start date: 1 September, 2017.
Expected end date: 31 August, 2020.

Background noise can be very disturbing for hearing-impaired people, because it can degrade communication quality and affect their social life. The task of reducing this noise in order to improve the quality and intelligibility of the target speech is known as speech enhancement. Current state-of-the-art systems tackle this problem by processing only the audio signals. However, speech enhancement systems might benefit from integrating visual cues, e.g., the facial expressions of the target speaker. The objective of this project is to investigate deep-learning-based audio-visual speech enhancement, with the final goal of developing better algorithms for use in hearing assistive devices.


Poul Hoang
Title: User-Symbiotic Speech Enhancement for Hearing Aid Systems

Supervisors: Zheng-Hua Tan, Jan Mark de Haan (Oticon A/S), Thomas Lunner (Oticon A/S), Jesper Jensen (Oticon A/S, AAU).
Start date: 15 August, 2018.
Expected end date: 14 August, 2021.

Hearing-impaired people often have a reduced ability to understand speech. To improve speech intelligibility and listening comfort, modern hearing aids typically apply noise reduction algorithms that suppress undesired acoustic signals from the environment. One problem with noise reduction algorithms such as beamforming is that many of them require the direction of the desired speaker to be known. Methods based on parametric models have previously been proposed for this problem. As an alternative, this PhD project explores deep learning methods, which are expected to significantly outperform model-based approaches in very noisy environments because they may be able to capture and exploit more detail from the noisy environment. We will furthermore explore algorithm variants that work in closer symbiosis with the hearing aid user. More specifically, we believe that the noise reduction algorithms can be improved by providing them with additional information about the user, collected from sensors other than the microphones.


Miklas Strøm Kristoffersen
Title: Automated Audiovisual Inference of the Intention of Multiple Users in the Home

Supervisors: Zheng-Hua Tan, Sven Ewan Shepstone (Bang & Olufsen A/S).
Start date: 1 September, 2016.
Expected end date: 31 August, 2019.

This industrial PhD project is carried out in collaboration with Bang & Olufsen A/S. It is concerned with automatically and robustly inferring the immediate response for multiple users in a room when one person addresses a device, for example by issuing a command to a television. An important part of this process is extracting the contextual setting (e.g., who, when, and where) and translating it into personalized system responses based on, among other things, prior knowledge of the users. Thus, the major tasks of the project are to robustly identify users, establish and maintain user preferences, and support multiple simultaneous users in order to design an experience aimed at whole groups.

Stefanos Astaras
Title: Understanding Activities by Fusing Visual Descriptors

Supervisors: Zheng-Hua Tan, Ove Kjeld Andersen, Aristodemos Pnevmatikakis (AIT).
Start date: 1 March, 2016.
Expected end date: 28 February, 2020.

Human activity recognition in video streams is about understanding the context in those streams using computer vision and classification techniques. Activity recognition involves extracting both static information (i.e., the presence of people and objects) and dynamic information (movements, interactions). This domain has important applications in content aggregation, surveillance, remote care solutions, and entertainment. As the building blocks of our solution, we will study individual visual descriptors, first across space and later across time. Spatial descriptors include image-related features (edges, texture) and object-related ones (size, boundaries), while temporal descriptors are the output of multi-target tracking systems. The classification confidence is derived from the fusion of the extracted descriptors. As a result, we will develop the signal processing and classification algorithms for a system that detects predefined activities in real-world settings, as well as a training process that allows the system to be applied to different sets of activities.