Research Areas

The current major active areas of research at SPRL are:

  1. Robust Online Visual Multi-Target Object Tracking
  2. MRI Signal Processing
  3. Audio/speech signal processing
  4. Automatic speech recognition
  5. Robust speech recognition
  6. Speech enhancement
  7. Speaker adaptation in speech recognition
  8. Speaker/language recognition
  9. Audio/speech source localization/tracking/separation
  10. Audio and speech watermarking
  11. Image processing
Online multi-object tracking aims at producing complete tracks of multiple objects using the information accumulated up to the present moment. It still remains a difficult problem in complex scenes, because of frequent occlusion by clutter or other objects, similar appearances of different objects, and other factors.
Tracking-by-detection has proven to be the most successful strategy to address the task of tracking multiple targets in unconstrained scenarios

Speech recognition systems decode the speech signal into meaningful words, phrases or sentences. Robustness in speech recognition refers to the need to maintain good recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission as well as impulsive interfering sources, and diminished accuracy caused by changes in articulation produced by the presence of high-intensity noise sources.

Specific activities in this area involve:

  1. Autocorrelation-based robust speech recognition
  2. Histogram equalization and other normalizations for robust recognition
  3. ICA-based robust speech recognition and enhancement
  4. MVDR and other spectral estimations for robust speech recognition

Speaker adaptation consists of adapting a (usually speaker independent) speech recognizer, to a new speaker, in order to improve its overall performance for that specific speaker.

Specific activities in this area involve:

  • Speaker clustering
  • Eigenvoice, MAP and MLLR-based adaptations
Language identification is the process of determining which natural language given content is in. Traditionally, identification of written language - as practiced, for instance, in library science - has relied on manually identifying frequent words and letters known to be characteristic of particular languages. More recently, computational approaches have been applied to the problem. By viewing language identification as a kind of text categorization, a Natural Language Processing approach, which relies on statistical methods, may be utilized.

Digital watermarking is the art of embedding useful information into the digital products (such as audio, image, video, text) in a way that does not interfere with normal usage of it. This information is used for different purposes such as copyright protection, content authentication, broadcast monitoring etc. Several issues should be considered in a watermarking system. Three important issues are: transparency of watermark, robustness of the system against attacks and data rate of watermark embedding. A good watermarking system should have all of these requirements at acceptable level.

Specific activities in this area involve

  • Robust audio watermarking
  • Using neural networks in watermarking
  • Mathematical modeling of watermarking systems