SPRL

The current major active areas of research at SPRL are:

Robust Online Visual Multi-Target Object Tracking
MRI Signal Processing
Audio/speech signal processing
Automatic speech recognition
Robust speech recognition
Speech enhancement
Speaker adaptation in speech recognition
Speaker/language recognition
Audio/speech source localization/tracking/separation
Audio and speech watermarking
Image processing

Robust Online Visual Multi-Target Object Tracking

Online multi-object tracking aims at producing complete tracks of multiple objects using the information accumulated up to the present moment. It still remains a difficult problem in complex scenes, because of frequent occlusion by clutter or other objects, similar appearances of different objects, and other factors.
Tracking-by-detection has proven to be the most successful strategy to address the task of tracking multiple targets in unconstrained scenarios

MRI Signal Processing

Audio/speech signal processing

Automatic speech recognition

Robust Speech Recognition

Speech recognition systems decode the speech signal into meaningful words, phrases or sentences. Robustness in speech recognition refers to the need to maintain good recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission as well as impulsive interfering sources, and diminished accuracy caused by changes in articulation produced by the presence of high-intensity noise sources.

Specific activities in this area involve:

Autocorrelation-based robust speech recognition
Histogram equalization and other normalizations for robust recognition
ICA-based robust speech recognition and enhancement
MVDR and other spectral estimations for robust speech recognition

Speech enhancement

Speaker adaptation in speech recognition

Speaker adaptation consists of adapting a (usually speaker independent) speech recognizer, to a new speaker, in order to improve its overall performance for that specific speaker.

Specific activities in this area involve:

Speaker clustering
Eigenvoice, MAP and MLLR-based adaptations

Speaker/language recognition

Language identification is the process of determining which natural language given content is in. Traditionally, identification of written language - as practiced, for instance, in library science - has relied on manually identifying frequent words and letters known to be characteristic of particular languages. More recently, computational approaches have been applied to the problem. By viewing language identification as a kind of text categorization, a Natural Language Processing approach, which relies on statistical methods, may be utilized.

Audio/speech source localization/tracking/separation

Audio and speech watermarking

Digital watermarking is the art of embedding useful information into the digital products (such as audio, image, video, text) in a way that does not interfere with normal usage of it. This information is used for different purposes such as copyright protection, content authentication, broadcast monitoring etc. Several issues should be considered in a watermarking system. Three important issues are: transparency of watermark, robustness of the system against attacks and data rate of watermark embedding. A good watermarking system should have all of these requirements at acceptable level.

Specific activities in this area involve

Robust audio watermarking
Using neural networks in watermarking
Mathematical modeling of watermarking systems

Research Areas