INTERSPEECH 2006

Plenary Sessions

Plenary 1:
ISCA Medalist
What a Study of Sound Can Tell Us About Human Speech Perception
(and Suggest About Machine Recognition)
John J. Ohala, University of California, Berkeley

Allegheny Ballrooms, Monday, September 18, 9:00 to 10:00 am

Overview

Sound change - variation in pronunciation over time, community, or the context in which speech sounds occur - has been known since ancient times, as witnessed by writers noticing "accents" of people in various regions or noticing that ancient poetry meant to rhyme or have some other fixed prosodic structure has lost those aspects over the centuries. Documentation of sound change advanced rapidly in the 18th and 19th centuries, especially with the development of the comparative method, a quasi-rigorous way of establishing cognate sets of words in different languages whose different pronunciations could be explained by the systematic change of words' constituent sounds derived from some to-be-reconstructed parent language (e.g., English father, French père, Spanish padre, Sanskrit pitar). There is today a vast archive of documented sound changes in scores of language families involving hundreds of languages. This data can be mined for insight into the causes or mechanisms of sound change, especially when combined with studies of speech production and speech perception. The account that I will present of sound change reduces to the following two principles, each of them empirically supported: 1) there is variation and ambiguity in pronunciation, some of it caused by constraints of the speech production mechanism and some of it due to the fact that the mapping between some speech articulations and speech acoustics is a many-to-few mapping; 2) such variation or ambiguity occasionally causes a listener to misinterpret how a word is to be pronounced, such that when that listener in turn speaks, the result is a different pronunciation. I will suggest that this scheme allows us to gain some insight into the basic distinctive features used in speech perception as well as their temporal domain, and that this may allow us to improve on attempts to recognize speech by machines.

Presenter

John Ohala is Professor Emeritus in the Department of Linguistics, University of California, Berkeley (UCB). He was also head of the Phonology Laboratory at UCB from 1975 until his retirement in 2004. He received his PhD in Linguistics from UCLA in 1969. He has had post-doctoral positions and other visiting research or teaching positions at the Research Institute of Logopedics and Phoniatrics (University of Tokyo), Bell Laboratories, the University of Copenhagen, the University of Alberta (Edmonton), and the City University of Hong Kong. He was chair of the 2nd ICSLP (1992) in Banff, Canada, and of the 14th International Congress of Phonetic Sciences in San Francisco (1999). He serves or has served on the editorial boards of several speech- and language-related journals, including Phonetica, Journal of Phonetics, Language & Speech, Language, and Speech Communication. He was section editor for Speech Science and Phonetics for the Pergamon Encyclopedia of Language and Linguistics, 1994. His research interests are in experimental phonology and phonetics and ethological aspects of communication, including speech perception, sound change, phonetic and phonological universals, psycholinguistic studies in phonology, and sound symbolism.

Plenary 2:
Creating Speech Interfaces for Mass Market Applications
Michael Phillips, Mobeus Corporation

Allegheny Ballrooms, Tuesday, September 19, 9:00 to 10:00 am

Overview

The speech processing industry has made great progress over the last few decades. There are now mainstream markets for speech recognition and text-to-speech in call centers, mobile devices, automobiles, games, and dictation. In each of these, the technology has progressed to the point where it can provide real value for the end users and there is an active industry of companies delivering a variety of solutions to the market.

But none of these has reached the point of truly ubiquitous use -- where speech interfaces are a common and expected means of interaction.

Why is this? Is it due to limitations of the technology, have we not found the right applications, or do users just not find enough value in the use of speech interfaces?

In this talk, I will discuss the current state of the speech industry in the various markets and will provide some thoughts about where speech interfaces may become ubiquitous and what we need to do to make this happen.

Presenter

Mike Phillips has been active in the speech technology world for over twenty years. In 1994, he founded SpeechWorks based on technology that he and others had developed at the Spoken Language Systems group at MIT. Over the next ten years, Mike and his team grew SpeechWorks from a small startup in a new market into the market leader in the now-established market for speech-enabled call center solutions. In 2003, SpeechWorks was acquired by ScanSoft (now named Nuance). Mike stayed on as the CTO of ScanSoft for two years after the acquisition. After spending a year as a visiting scientist at the Spoken Language Systems group at MIT, Mike now has a new startup (called Mobeus), which is focused on multimodal interfaces for mobile devices.

Plenary 3:
Statistical language learning in human infants and adults
Elissa L. Newport, University of Rochester

Allegheny Ballrooms, Wednesday, September 20, 9:00 to 10:00 am

Overview

In collaboration with Richard Aslin, I have been developing a statistical approach to language acquisition and investigating the abilities of human learners to perform the computations that would be required for acquiring properties of natural languages by such a method. Our studies have shown that adults, infants, and even nonhuman primates are capable of performing such computations online and with remarkable speed, on both speech and nonspeech materials. Our recent work examines differences between adults, infants, and nonhuman primates in their computational capacities, how these differences may help to explain why these learners differ in their abilities to learn languages, and why languages have some of the properties they have.

Presenter

Elissa L. Newport is the department chair and the George Eastman Professor of Brain and Cognitive Sciences at the University of Rochester. Her primary research interest is in human language acquisition, with research projects including naturalistic studies of children learning their first languages, experimental studies of infants, adults, and non-human primates learning miniature languages in the lab, fieldwork on emerging sign languages, and fMRI research on language and the brain. Professor Newport received her Ph.D. in Psychology at the University of Pennsylvania and was a Sloan Fellow in Linguistics and Cognitive Science at Penn and MIT. She has been on the faculty at the University of California at San Diego, the University of Illinois, and, since 1988, the University of Rochester. Her research is funded by the NIH, NSF, the McDonnell Foundation, and the Packard Foundation, and for this research she has received the Claude Pepper Award of Excellence from NIH. She is currently a series editor for MIT Press, serves on the Board on Behavioral, Cognitive, and Sensory Sciences of the National Academy of Sciences, and is the Chair of Section J (Psychology) of the American Association for the Advancement of Science. She is a Fellow of the American Academy of Arts & Sciences, the Cognitive Science Society, the Society of Experimental Psychologists, and the American Association for the Advancement of Science, and is a member of the National Academy of Sciences.

Plenary 4:
Speech Recognition: The Unfinished Agenda
Raj Reddy, Carnegie Mellon University

Allegheny Ballrooms, Thursday, September 21, 9:00 to 10:00 am

Overview

After many years of intensive research and development, speech science and technology have reached a stage at which practical speech-based solutions are now in use in a number of application areas. Nevertheless, much remains to be done in making the fruits of speech and language technology available to all of the world's peoples. Universal access to language technologies and their continued multilingual development are crucial for social and economic development for many reasons. Language barriers can slow down economic growth significantly, and globalization depends on cross-border and cross-language communication. Fluid and effective multilingual speech-based interfaces can eliminate cultural and social barriers, give all people access to rare (and potentially beneficial) knowledge regardless of their native language, and foster the preservation of minority languages and the cultures and heritage that they represent.

This talk will discuss some of the key issues and opportunities that remain in extending the benefits of language technologies to the entire world, and research challenges of the unfinished agenda.

Presenter

Raj Reddy is the Mozah Bint Nasser University Professor of Computer Science and Robotics in the School of Computer Science at Carnegie Mellon University. He began his academic career as an Assistant Professor at Stanford in 1966. He has been a member of the Carnegie Mellon faculty since 1969. He served as the founding Director of the Robotics Institute from 1979 to 1991 and as the Dean of the School of Computer Science from 1991 to 1999. Reddy's research interests include the study of human-computer interaction and artificial intelligence. His current research interests include the Million Book Digital Library Project; a Multifunction Information Appliance that can be used by the uneducated; the Fiber To The Village Project; Mobile Autonomous Robots; and Learning by Doing. Raj Reddy is a member of the National Academy of Engineering and the American Academy of Arts and Sciences. He was president of the American Association for Artificial Intelligence from 1987 to 1989. Dr. Reddy was awarded the Legion of Honor by President Mitterrand of France in 1984. He was awarded the ACM Turing Award in 1994, the Okawa Prize in 2004, the Honda Prize in 2005, and the Vannevar Bush Award in 2006. He served as co-chair of the President's Information Technology Advisory Committee (PITAC) from 1999 to 2001 under Presidents Clinton and Bush.
