Robust Speech Recognition
Speech recognition systems decode
the speech signal into meaningful words, phrases or sentences.
Robustness in speech recognition refers to the need to maintain good
recognition accuracy even when the quality of the input speech is
degraded, or when the acoustical, articulatory, or phonetic
characteristics of speech in the training and testing environments
differ. Obstacles to robust recognition include acoustical
degradations produced by additive noise, the effects of linear
filtering, nonlinearities in transduction or transmission as well
as impulsive interfering sources, and diminished accuracy caused by
changes in articulation produced by the presence of high-intensity
noise sources.
Specific activities in this area involve
- Autocorrelation-based robust speech recognition
- Histogram equalization and other normalizations for robust recognition
- ICA-based robust speech recognition and enhancement
- MVDR and other spectral estimations for robust speech recognition
Speaker Adaptation
Speaker adaptation consists of adapting a (usually speaker-independent)
speech recognizer to a new speaker in order to improve its overall
performance for that specific speaker.
Specific activities in this area involve
- Speaker clustering
- Eigenvoice, MAP and MLLR-based adaptations
Audio Processing and Content Analysis
Audio processing and content
analysis refers to the extraction of information and meaning from
audio signals for analysis, segmentation,
classification, storage, retrieval,
synthesis, etc.
Specific activities in this area include
- SVM for audio segmentation
- One-class SVM for audio classification
- Speech/music discrimination
- Music signal processing
- Source separation
Voice Activity Detection
Voice activity detection (VAD) is a technique used in speech
processing in which the presence or absence of human speech is
detected from the audio samples, usually in noisy environments. The
main uses of VAD are in speech coding and speech recognition. A VAD
may indicate not only the presence or absence of speech, but also
whether the speech is voiced or unvoiced, sustained or early, etc.
Specific activities in this area involve
- Weighted feature combination for VAD
- Non-causal VAD
- Likelihood ratio-based VAD
- Enhanced LTSD-based VAD
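As a rough illustration of the basic idea of detecting speech presence frame by frame, the sketch below implements a simple energy-threshold detector. It is not one of the methods listed above; the function names, the frame length, the 6 dB margin, and the assumption that the first few frames contain only noise are all illustrative choices.

```python
import math
import random


def frame_energies_db(samples, frame_len):
    """Return the log energy (in dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, frame_len=160, margin_db=6.0, noise_frames=5):
    """Label each frame True (speech) or False (non-speech).

    Assumption: the first `noise_frames` frames contain no speech and
    can be used to estimate the noise floor.
    """
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    lead = energies[:noise_frames]
    noise_floor = sum(lead) / len(lead)
    return [e > noise_floor + margin_db for e in energies]


# Synthetic test signal: noise, then a 200 Hz tone in noise, then noise
# (8 kHz sampling rate assumed, 10 frames per segment).
rng = random.Random(0)  # fixed seed so the demo is repeatable
n_frame = 160
silence = [rng.gauss(0.0, 0.01) for _ in range(10 * n_frame)]
tone = [0.5 * math.sin(2 * math.pi * 200 * t / 8000) + rng.gauss(0.0, 0.01)
        for t in range(10 * n_frame)]
tail = [rng.gauss(0.0, 0.01) for _ in range(10 * n_frame)]
decisions = energy_vad(silence + tone + tail, frame_len=n_frame)
print("".join("S" if d else "." for d in decisions))
```

A fixed energy threshold of this kind tends to break down at low signal-to-noise ratios, which is one motivation for statistical detectors such as the likelihood ratio-based approaches listed above.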
Language Identification
Language identification is the process of determining which natural
language given content is in. Traditionally, identification of
written language - as practiced, for instance, in library science -
has relied on manually identifying frequent words and letters known
to be characteristic of particular languages. More recently,
computational approaches have been applied to the problem. By viewing
language identification as a kind of text categorization, a natural
language processing approach that relies on statistical methods may
be used.
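To make the statistical text-categorization view concrete, the following minimal sketch scores a text against character-trigram profiles with add-one smoothing and picks the best-scoring language. The two one-sentence training texts, the function names, and the choice of trigrams are assumptions made for illustration only; a usable system would need far more training data.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split lower-cased text into overlapping character n-grams."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profile(text, n=3):
    """Count the character n-grams of a training text."""
    return Counter(char_ngrams(text, n))


def log_likelihood(text, profile, vocab_size, n=3):
    """Add-one smoothed log-likelihood of `text` under one profile."""
    total = sum(profile.values())
    return sum(math.log((profile[g] + 1) / (total + vocab_size))
               for g in char_ngrams(text, n))


def identify(text, profiles, n=3):
    """Return the language whose profile gives the highest score."""
    vocab = set()
    for profile in profiles.values():
        vocab.update(profile)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    return max(profiles,
               key=lambda lang: log_likelihood(text, profiles[lang],
                                               vocab_size, n))


# Toy training data (hypothetical, far too small for real use).
profiles = {
    "english": train_profile(
        "the quick brown fox jumps over the lazy dog "
        "and the cat sleeps in the house"),
    "spanish": train_profile(
        "el zorro salta sobre el perro perezoso "
        "y el gato duerme en la casa"),
}

print(identify("the dog sleeps in the house", profiles))
print(identify("el perro duerme en la casa", profiles))
```

Character n-grams are used here because they need no word segmentation or dictionary, which keeps the sketch short; word-frequency features would follow the same pattern.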
Audio Watermarking
Digital watermarking is the art of embedding useful information into
digital products (such as audio, image, video or text) in a way that
does not interfere with their normal usage. This information is used
for purposes such as copyright protection, content authentication and
broadcast monitoring. Several issues should be considered in a
watermarking system. Three important ones are the transparency of the
watermark, the robustness of the system against attacks and the data
rate of watermark embedding. A good watermarking system should meet
all of these requirements at an acceptable level.
Specific activities in this area involve
- Robust audio watermarking
- Using neural networks in watermarking
- Mathematical modeling of watermarking systems