Current PhD Projects

Mojtaba Farmani
Title: Sound Scene Analysis for Hearing Aid Applications


Supervisors: Jesper Jensen, Zheng-Hua Tan and Michael Syskind Pedersen (Oticon A/S)

Start date: 01-04-2014
Expected end date: 31-03-2017

Description:
This research project focuses on the problem of statistically optimal sound scene analysis. Specifically, the problem is to optimally determine the physical location of sound sources and, furthermore, to analyze the sound scene in terms of background noise level, reverberation level, etc. Mother Nature imposes fundamental limits on the accuracy with which this information can be extracted from time-limited, noisy microphone signals: the project is concerned with determining what these limits are and with designing algorithms that approach them. The research project is a cooperation between Aalborg University and Oticon A/S, Copenhagen, a world-leading hearing aid manufacturer.
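To give a concrete flavour of one building block such an analysis can rest on, the sketch below estimates the time difference of arrival (TDOA) between two microphones with the classical GCC-PHAT cross-correlation. It is only an illustrative toy (the sampling rate, signals, and function name are invented for the example) and is not the estimator developed in this project.

```python
import numpy as np

def gcc_phat_tdoa(sig, refsig, fs):
    """Estimate the time delay of `sig` relative to `refsig` (in seconds)
    using GCC-PHAT; a positive value means `sig` lags `refsig`."""
    n = len(sig) + len(refsig)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(refsig, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Toy usage: the source reaches the second microphone 5 samples later
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)                                  # 1 s of white "source" signal
mic1 = s + 0.05 * rng.standard_normal(fs)                    # reference microphone
mic2 = np.concatenate((np.zeros(5), s[:-5])) + 0.05 * rng.standard_normal(fs)
print(gcc_phat_tdoa(mic2, mic1, fs) * fs)                    # close to 5 samples
```

Combined with the known microphone spacing and the speed of sound, such a delay estimate translates into a direction-of-arrival estimate; the project studies how accurately such quantities can be estimated at all from noisy, time-limited data, and how to approach those limits.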
 

 

Adam Kuklasiński, Industrial PhD

Title: Multi-channel dereverberation for speech intelligibility improvement in hearing aid applications

Company: Oticon A/S

Supervisors: Jesper Jensen (Oticon A/S, AAU), Søren Holdt Jensen (AAU), Simon Doclo and Timo Gerkmann (both from University of Oldenburg)

Duration: April 1, 2013 - March 31, 2016 (expected)

Description: The objective of this PhD project is to develop and evaluate a class of scalable, robust, multi-microphone signal processing algorithms to improve speech intelligibility for hearing aid users in reverberant situations. The algorithms are scalable in that they should be applicable to any microphone array geometry, and robust in that they generally improve and never deteriorate the situation for the hearing aid user.
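As a purely illustrative example of a multi-microphone building block, the sketch below computes minimum variance distortionless response (MVDR) beamformer weights for a single frequency bin. The covariance matrix and steering vector are randomly generated toys, and the code is not the algorithm developed in this project; it only shows the kind of array processing the description refers to.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR beamformer weights w = R^-1 d / (d^H R^-1 d) for one frequency bin.

    noise_cov : (M, M) complex noise (or noise-plus-reverberation) covariance
    steering  : (M,)   complex steering vector towards the target talker
    """
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Toy usage: M = 4 microphones, one frequency bin
M = 4
rng = np.random.default_rng(1)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rn = A @ A.conj().T + np.eye(M)               # Hermitian, positive-definite covariance
d = np.exp(-2j * np.pi * rng.random(M))       # unit-modulus steering phases
w = mvdr_weights(Rn, d)
print(abs(w.conj() @ d))                      # distortionless constraint: |w^H d| = 1
```

The formulation above is independent of the number of microphones M, which is one simple sense in which an array processing algorithm can be scalable with respect to array geometry.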

This PhD project is part of ITN-DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech), a research project funded by the European Commission (www.dreams-itn.eu).

 

Xiaodong Duan
Title: Multimodal Person and Social Behaviour Recognition
 

Supervisor: Zheng-Hua Tan

Start date: January 1, 2014

Description:
During human-to-human interaction, we are able to recognize an intention to communicate, e.g., a vocal greeting and/or a physical gesture such as waving; we can then recognize, locate, and track the communicator using our eyes and ears simultaneously, and respond appropriately based on their actions, behaviours, and emotions. With modern computers, cameras, and microphones, it is possible to give a social robot some of these human abilities, i.e., the aforementioned vision-based perception or the fusion of vision and speech, so that it can serve people more naturally. To realize this, we first need more robust vision-based algorithms for person recognition, tracking, and social behaviour recognition. Secondly, a method for fusing visual and speech information, reinforcement fusion, will be studied to overcome the limitations of visual information alone in some circumstances.
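As a minimal illustration of combining the two modalities, the sketch below performs a simple weighted score-level fusion of audio-based and vision-based person-identification scores. The weights, score values, and function are invented for the example; the reinforcement fusion method studied in the project is a research topic and is not shown here.

```python
import numpy as np

def fuse_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Weighted score-level fusion of per-identity audio and visual scores.
    Each modality is min-max normalised so the two become comparable."""
    def normalise(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)

    fused = (audio_weight * normalise(audio_scores)
             + (1.0 - audio_weight) * normalise(visual_scores))
    return int(np.argmax(fused)), fused

# Toy usage: three enrolled persons; vision is uncertain, audio favours index 2
person, fused = fuse_scores(audio_scores=[0.1, 0.2, 0.9],
                            visual_scores=[0.40, 0.50, 0.45],
                            audio_weight=0.6)
print(person, fused)                          # prints 2: the audio evidence tips the decision
```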
 

 

Sven Ewan Shepstone, Industrial PhD
Title: Audio-based Profile Extraction to Enhance TV recommendation

Research Areas: Speaker Recognition, Pattern Recognition, Recommender Systems

Description: This industrial PhD project is carried out in collaboration with Bang & Olufsen A/S. It is about recommending TV content using profiles built from audio-derived parameters (instead of using ratings, usage patterns, or explicitly provided data). To date we have investigated age, gender, and emotions, and how these can be used to enhance recommendation. We have also investigated emotion granularity and how it might relate to the recommendation of items in a real-world context. Since the accuracy of speaker recognition is of paramount importance for this project, we have also focused on using i-vector modelling to enhance speaker recognition, particularly under mismatched conditions.
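Since the description mentions i-vector modelling, a minimal example of how two already-extracted i-vectors are commonly compared (cosine scoring) is sketched below; the dimensionality and the vectors themselves are invented, and this is not the project's actual recognition pipeline.

```python
import numpy as np

def cosine_score(ivector_test, ivector_enrol):
    """Cosine similarity between a test and an enrolment i-vector; scores
    above a tuned threshold are accepted as the same speaker."""
    a = ivector_test / np.linalg.norm(ivector_test)
    b = ivector_enrol / np.linalg.norm(ivector_enrol)
    return float(a @ b)

# Toy usage with 400-dimensional i-vectors (a typical size)
rng = np.random.default_rng(2)
enrol = rng.standard_normal(400)
same_speaker = enrol + 0.3 * rng.standard_normal(400)    # small deviation from enrolment
other_speaker = rng.standard_normal(400)                 # unrelated vector
print(cosine_score(same_speaker, enrol))                 # close to 1
print(cosine_score(other_speaker, enrol))                # close to 0
```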

Supervisors: Zheng-Hua Tan (Aalborg University), Søren Holdt Jensen (Aalborg University), Thomas Fiil (Bang & Olufsen)
Date of project: 1 June 2012 - 31 May 2015
 

Asger Heidemann Andersen, Industrial PhD student with Oticon.
Title: Intelligibility prediction for hearing aid systems

Start date: 1/9-2014.
Expected end date: 31/8-2017.

Supervisors: Zheng-Hua Tan (AAU), Jesper Jensen (Oticon/AAU), Jan Mark de Haan (Oticon).

Description:
The ability to understand spoken language is one of the most important functions of human hearing. The degradation of this ability is therefore also among the most disabling consequences of a hearing loss. Consequently, it is an important task for a hearing aid to restore this ability as well as possible. Modern hearing aids are designed with this in mind, and their effect on speech intelligibility is evaluated on human subjects. This process is expensive and time-consuming.

Asger Heidemann Andersen is an industrial PhD student with Oticon, a Danish hearing aid manufacturer. His research focuses on purely computational techniques to predict the impact of a hearing aid on speech intelligibility. Such techniques can greatly benefit the hearing aid design procedure by providing cheap and fast measures of performance with respect to speech intelligibility. Additionally, they open the possibility of designing “intelligibility-aware” hearing aids, which continuously optimize the user’s ability to understand speech in changing environments.
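For illustration only, the sketch below computes a very simplified correlation-based score between a clean and a processed speech signal, in the broad spirit of intrusive intelligibility predictors. It is neither the project's predictor nor any standard metric (established measures such as STOI operate on sub-band envelopes over specific segment lengths); the frame sizes and signals are invented.

```python
import numpy as np

def correlation_score(clean, processed, frame_len=512, hop=256):
    """Average per-frame correlation between the magnitude envelopes of a
    clean and a processed signal; higher means the processed signal tracks
    the clean speech more closely."""
    scores = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = np.abs(clean[start:start + frame_len])
        p = np.abs(processed[start:start + frame_len])
        c, p = c - c.mean(), p - p.mean()
        denom = np.linalg.norm(c) * np.linalg.norm(p)
        if denom > 0:
            scores.append(float(c @ p) / denom)
    return float(np.mean(scores)) if scores else 0.0

# Toy usage: the "processed" signal is the clean signal plus noise
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 4 * t) * rng.standard_normal(16000)   # crudely modulated noise
noisy = clean + 0.5 * rng.standard_normal(16000)
print(correlation_score(clean, noisy))                           # below 1, drops as noise grows
```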

 

Nicolai Bæk Thomsen
Title: Distant Speech Recognition and Multimodal Person Tracking

Supervisor: Zheng-Hua Tan
Co-supervisors: Søren Holdt Jensen and Børge Lindberg
 
Start date: September 1st, 2013
End date: August 31st, 2016

Description:
This project deals with improving how humans interact with socially intelligent service robots in order to achieve better service. For example, speech commands should be given to the robot, and it should respond in a proper way. This is practical and/or necessary in situations where a person is unable to carry out a specific task themselves (e.g., people with disabilities). In order for this communication to be natural, the robot needs to know where the person(s) are and what is being said. The position of a person can be estimated robustly by combining audio and visual signals, which overcomes the problem that the quality of one of the modalities may be greatly reduced. Recognizing what the person is saying is difficult when the person and the robot are not close to each other, because of reverberation and low signal quality. This project investigates robust methods to deal with these problems.
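As a minimal illustration of the audio-visual combination mentioned above, the sketch below fuses an audio-based and a vision-based position estimate by inverse-variance weighting, so that whichever modality is currently less reliable (a camera in darkness, a microphone in noise) contributes less. The positions, variances, and function are invented for the example and this is not the tracking method developed in the project.

```python
import numpy as np

def fuse_position(audio_pos, audio_var, visual_pos, visual_var):
    """Inverse-variance fusion of two 2-D position estimates of a person."""
    w_a, w_v = 1.0 / audio_var, 1.0 / visual_var
    fused = (w_a * np.asarray(audio_pos) + w_v * np.asarray(visual_pos)) / (w_a + w_v)
    fused_var = 1.0 / (w_a + w_v)
    return fused, fused_var

# Toy usage: vision is precise, audio is coarse; the result leans on vision
pos, var = fuse_position(audio_pos=[2.0, 0.5], audio_var=0.5,
                         visual_pos=[1.6, 0.4], visual_var=0.05)
print(pos, var)   # close to the visual estimate, with reduced variance
```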