INTERSPEECH 2006

Tutorials

Tutorials will take place on Sunday, September 17.
Morning tutorials will be held from 9:00 to 12:30. Afternoon tutorials will be held from 13:30 to 17:00.

See the technical program for more details.

AM 1: Microphone Array Processing and Source Separation for Speech Enhancement and Recognition

Bhiksha Raj, Paris Smaragdis (MERL) and Michael L. Seltzer (Microsoft)

This tutorial will address the topics of microphone array processing for signal enhancement and multi- and single-channel approaches to signal separation. We will present a range of topics including:

1) Basic principles of microphone array processing, including issues of array geometry, spatial aliasing and the distinction between nearfield and farfield array responses, and source localization techniques.

2) Beamforming algorithms for enhancing noisy signals or separating concurrent signals, including classical methods for fixed and adaptive beamforming and maximum-likelihood beamforming methods.

3) Independent component analysis (ICA) in time and frequency domains, and multiple-microphone methods for separating multiple concurrent signals when the number of sound sources exceeds the number of microphones.

4) Single-channel source separation techniques including those based on estimation of spectrographic masks, non-negative matrix factorization and related techniques.

5) Automatic speech recognition in noisy and reverberant environments using microphone arrays, including the effectiveness of various array processing algorithms, feature compensation algorithms, and back-end model adaptation schemes.

AM 2: Speech Under Stress

John Hansen (University of Texas, Dallas)

The field of speech processing involves the modeling of human speech production in order to formulate algorithms for effective systems in application areas such as speech recognition, speaker recognition, speech synthesis, speech coding, and speech enhancement. The variability brought on by stress adversely impacts the performance of these algorithms. Stress in this context includes cognitive stress, physical stress, emotional stress (such as fear, anger, or anxiety), and stress due to the presence of noise (known as the Lombard effect).

This tutorial will include the following topics: a historical overview of speech under stress, available corpora of speech under stress, analysis of speech under stress, speech recognition under stress, detection of stress in speech, and synthesis and perception of speech under stress.

AM 3: Speech and Language Processing Over the WWW

Mazin Gilbert (AT&T)

This tutorial will provide an overview of the impact of the internet revolution on speech and language processing technologies. In particular, the tutorial will address the following areas:

1) Advances in Speech and Language Technology: Theoretical and practical perspectives on speech and language technologies, including speech synthesis and recognition, speaker recognition, language understanding, question answering, natural language processing, semantic classification, and machine translation.

2) The Web Transformation: How the web has transformed communication and information mining, including document analysis and understanding, webpage analysis, information search and retrieval, trend analysis and tracking, and services over IP.

3) Emergence of a New Era: Technical challenges and lucrative business opportunities created by the web revolution, and the emergence of new applications and services for multimodal and multimedia information mining, web-based interactive virtual agents, and surveillance and intelligence gathering from blogs and web multimedia content.

PM 1: A Survey of Robust Speech Recognition Techniques

Jasha Droppo (Microsoft)

The tutorial consists of three complementary sections: an analysis of the problem, an overview of simple solutions, and a study of active research areas.

The first part of the tutorial is dedicated to analyzing the types of noises characteristic of deployed systems, and how they affect the acoustic features.

The second part of the tutorial presents some simple, well-proven techniques that achieve noise robustness. These techniques are suitable for either deployed systems or strong noise-robust research baselines.

The final, and longest, part of the tutorial covers the details and relationships among recent work in the field, including auditory scene analysis and data-driven probabilistic approaches, feature-based and model-based techniques, feature enhancement and feature normalization, and maximum likelihood and discriminative training criteria.

PM 2: Nonspectral Features for Speech Processing

B. Yegnanarayana (IIT Madras)

For most speech applications, the speech information is captured through short-time (10-30 ms) spectral features.

However, the speech signal also contains significant information in various other components, such as the phase of the short-time Fourier transform, the subsegmental (1-3 ms) features, and the suprasegmental features.

Several linear and nonlinear methods will be discussed to extract the information present in these components.

The importance of these nonspectral features will be demonstrated in applications such as speaker verification, speech enhancement, speech synthesis, speech recognition, and language identification.

PM 3: Music Information Retrieval

Thomas Kemp and Jana Eggink (Sony Germany)

In this tutorial, we will give a comprehensive overview of the research areas in music information retrieval (MIR), explain the fundamental algorithms that have been successfully employed, and point you to the relevant authors and publications.

Topics covered by the tutorial range from early query-by-humming systems, through music similarity computation, music mood detection, and genre recognition, to music identification (fingerprinting).

Different aspects of music transcription will be covered, including beat tracking, instrument identification, and melody extraction. The technology behind the automatic generation of individualized playlists and personalized music recommendations will also be explained.

An additional focus of the tutorial will be music identification. Systems such as mobile phone music identification services will be explained in detail and compared with well-known services such as CD identification.