Research Areas


The current major active areas of research at SPRL are:  

  1. Robust Speech Recognition
  2. Audio Processing and Content Analysis
  3. Voice Activity Detection
  4. Speaker Adaptation
  5. Speaker Segmentation and Diarization
  6. Language Identification
  7. Audio Watermarking
  8. Speech Enhancement

Robust Speech Recognition

Speech recognition systems decode a speech signal into meaningful words, phrases, or sentences. Robustness in speech recognition refers to the need to maintain good recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech differ between the training and testing environments. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission, and impulsive interfering sources, as well as diminished accuracy caused by changes in articulation that speakers make in the presence of high-intensity noise sources.

Specific activities in this area include:

  1. Autocorrelation-based robust speech recognition
  2. Histogram equalization and other normalizations for robust recognition
  3. ICA-based robust speech recognition and enhancement
  4. MVDR and other spectral estimations for robust speech recognition
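
As an illustration of the histogram equalization item above, the following is a minimal sketch of one common variant, order-statistic equalization of a single feature trajectory onto a standard normal reference. It is not taken from SPRL's work: the function name `histogram_equalize`, the choice of a standard normal reference, and the rank-based CDF estimate are all assumptions made for the example.

```python
from statistics import NormalDist


def histogram_equalize(values):
    """Map one feature trajectory onto a standard normal reference.

    Hypothetical helper for illustration: each value is replaced by the
    reference quantile that matches its rank-based empirical CDF.
    """
    n = len(values)
    if n == 0:
        return []
    # Frame indices sorted by feature value (ties keep their input order).
    order = sorted(range(n), key=lambda i: values[i])
    reference = NormalDist(mu=0.0, sigma=1.0)  # assumed reference distribution
    equalized = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # Empirical CDF estimate from the rank, kept strictly inside (0, 1).
        cdf = (rank - 0.5) / n
        equalized[index] = reference.inv_cdf(cdf)
    return equalized
```

Applying the same mapping to training and test features gives both the same target distribution, which is the motivation for this family of normalizations when the two environments differ.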

Speaker Adaptation

Speaker adaptation consists of adapting a (usually speaker-independent) speech recognizer to a new speaker in order to improve its performance for that specific speaker.

Specific activities in this area include:

  1. Speaker clustering
  2. Eigenvoice, MAP and MLLR-based adaptations
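
To make the MAP item above concrete, here is a minimal sketch of the standard MAP update for the mean of a single Gaussian, under simplifying assumptions: every adaptation frame is assigned to that one Gaussian, only the mean is adapted, and the prior weight `tau` is a hypothetical tuning parameter. A real recognizer would weight frames by their state and mixture occupation probabilities; the function name `map_adapt_mean` is illustrative only.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean from speaker-specific frames.

    prior_mean: speaker-independent mean vector (list of floats).
    frames:     adaptation feature vectors, all assumed to belong to
                this Gaussian (a simplification for illustration).
    tau:        assumed prior weight; larger values trust the prior more.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]
    # Interpolate between the prior mean and the sample mean of the data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]
```

With no adaptation data the estimate stays at the speaker-independent mean, and as the number of frames grows it moves toward the new speaker's sample mean.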

Audio Processing and Content Analysis

Audio processing and content analysis refers to the extraction of information and meaning from audio signals for purposes such as analysis, segmentation, classification, storage, retrieval, and synthesis.

Specific activities in this area include:

  1. SVM for audio segmentation 
  2. One-class SVM for audio classification
  3. Speech/music discrimination
  4. Music signal processing
  5. Source separation

Voice Activity Detection

Voice activity detection (VAD) is a speech processing technique that detects the presence or absence of human speech in audio samples, usually in noisy environments. The main uses of VAD are in speech coding and speech recognition. A VAD may indicate not only the presence or absence of speech, but also whether the speech is voiced or unvoiced, sustained or early, etc.

Specific activities in this area include:

  1. Weighted feature combination for VAD
  2. Non-causal VAD
  3. Likelihood ratio-based VAD
  4. Enhanced LTSD-based VAD
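
The basic speech/non-speech decision described above can be sketched with a simple frame-energy threshold. This is a deliberately naive baseline, not one of the methods listed above: the function names, the non-overlapping framing, and the fixed `threshold` are assumptions made for illustration, and a fixed threshold would not hold up in the noisy conditions that the listed methods target.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    energies = []
    # A trailing partial frame, if any, is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len, threshold):
    """Return one boolean per frame: True where energy exceeds threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Usage: four silent samples followed by four loud ones, frames of length 4.
decisions = energy_vad([0.0] * 4 + [1.0, -1.0, 1.0, -1.0], 4, 0.1)
```

Here `decisions` holds one flag per frame, with the silent frame marked as non-speech and the loud frame marked as speech.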

Language Identification

Language identification is the process of determining which natural language a given piece of content is in. Traditionally, identification of written language, as practiced for instance in library science, has relied on manually identifying frequent words and letters known to be characteristic of particular languages. More recently, computational approaches have been applied to the problem. By viewing language identification as a kind of text categorization, a natural language processing approach based on statistical methods can be used.
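
The statistical, text-categorization view described above can be sketched with character trigram profiles compared by cosine similarity. This is a toy illustration under stated assumptions: the two training sentences, the language labels, and the function names are all hypothetical, and a usable system would need far more training text per language.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Count character trigrams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(text, profiles):
    """Return the label whose profile is most similar to the text."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Tiny hypothetical training set, one sentence per language.
profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "de": trigram_profile("der schnelle braune fuchs springt ueber den faulen hund und die katze"),
}
```

Calling `identify("the dog", profiles)` or `identify("der hund", profiles)` then picks the label whose training text shares the most characteristic trigrams with the query.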

Audio Watermarking

Digital watermarking is the art of embedding useful information into digital products (such as audio, images, video, or text) in a way that does not interfere with their normal use. This information is used for purposes such as copyright protection, content authentication, and broadcast monitoring. Several issues should be considered in a watermarking system. Three important ones are the transparency of the watermark, the robustness of the system against attacks, and the data rate of watermark embedding. A good watermarking system should meet all of these requirements at an acceptable level.

Specific activities in this area include:

  1. Robust audio watermarking
  2. Using neural networks in watermarking
  3. Mathematical modeling of watermarking systems 
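
The trade-off between transparency, robustness, and data rate can be illustrated with least-significant-bit (LSB) embedding in integer PCM samples. This sketch is an assumption chosen for its simplicity, not a method attributed to SPRL: it is transparent (each sample changes by at most 1) and has a high data rate (one bit per sample), but it is not robust, since requantization or lossy compression destroys the payload. The function names are hypothetical.

```python
def embed_bits(samples, bits):
    """Write one payload bit into the least significant bit of each sample."""
    if len(bits) > len(samples):
        raise ValueError("payload is longer than the host signal")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit (0 or 1).
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_bits(samples, count):
    """Read back the first `count` payload bits from the marked samples."""
    return [s & 1 for s in samples[:count]]


# Usage: hide four bits in four integer samples and read them back.
host = [100, -3, 7, 8]
payload = [1, 0, 1, 1]
marked = embed_bits(host, payload)
recovered = extract_bits(marked, len(payload))
```

Robust schemes give up some data rate or transparency to survive such attacks, which is why the three requirements have to be balanced rather than maximized individually.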

   

  Speech Processing Research Lab., 3rd floor Abou-Ray-Han Building, Amirkabir University of Technology (Tehran Polytechnic), Hafez Ave., Tehran, Iran.   Tel: +98-21-6454-3392

Last Updated: 29 November 2007