Recent Advances in Robust Speech Recognition Technology


by

Javier Ramírez, Juan Manuel Górriz

DOI: 10.2174/9781608051724111010
eISBN: 978-1-60805-172-4, 2011
ISBN: 978-1-60805-389-6

  
  




This E-book is a collection of articles that describe advances in speech recognition technology.
Table of Contents

Foreword , Pp. i

Alex Acero

Preface , Pp. ii

Javier Ramirez and Juan Manuel Gorriz

Contributors , Pp. iii-vi (4)

Javier Ramirez and Juan Manuel Gorriz

Integration of Statistical-Model-Based Voice Activity Detection and Noise Suppression for Noise Robust Speech Recognition , Pp. 1-12 (12)

Masakiyo Fujimoto

Using GARCH Process for Voice Activity Detection , Pp. 13-29 (17)

Rasool Tahmasbi

Voice Activity Detection Using Contextual Information for Robust Speech Recognition , Pp. 30-45 (16)

J. Ramirez and J. M. Gorriz

Improved Long term Voice Activity Detection for Robust Speech Recognition , Pp. 46-59 (14)

Juan M. Gorriz and Javier Ramirez

Speech Enhancement Algorithms: A Survey , Pp. 60-102 (43)

Philipos C. Loizou

Speech Enhancement and Representation Employing the Independent Component Analysis , Pp. 103-113 (11)

Peter Jancovic, Xin Zou and Munevver Kokuer

Statistical Model based Techniques for Robust Speech Communication , Pp. 114-132 (19)

Nam Soo Kim and Joon-Hyuk Chang

Bayesian Networks and Discrete Observations for Robust Speech Recognition , Pp. 133-140 (8)

Antonio Miguel, Alfonso Ortega and Eduardo Lleida

Robust Large Vocabulary Continuous Speech Recognition Based on Missing Feature Techniques , Pp. 141-154 (14)

Yujun Wang, Maarten Van Segbroeck and Hugo Van hamme

Distribution-Based Feature Compensation for Robust Speech Recognition , Pp. 155-168 (14)

Berlin Chen and Shih-Hsiang Lin

Effective Multiple Regression for Robust Single- and Multi-channel Speech Recognition , Pp. 169-174 (6)

Weifeng Li, Kazuya Takeda and Fumitada Itakura

Higher Order Cepstral Moment Normalization for Improved Robust Speech Recognition , Pp. 175-189 (15)

Chang-Wen Hsu and Lin-Shan Lee

Reviewing Feature Non-Linear Transformations for Robust Speech Recognition , Pp. 190-196 (7)

Luz Garcia, Jose Carlos Segura and Angel de la Torre

Advances in Human-Machine Systems for In-Vehicle Environments: Noise and Cognitive Stress/Distraction , Pp. 197-210 (14)

John H.L. Hansen, Pongtep Angkititrakul and Wooil Kim

Index , Pp. 211-214 (4)

Javier Ramirez and Juan Manuel Gorriz

Foreword

Speech recognition is becoming part of our everyday lives. Voice dialing in automobiles allows users to use their phones while keeping their eyes on the road and their hands on the steering wheel. Many mobile phones also have voice dialing, and some smartphones even let users dictate text, which can be faster than typing on a small keyboard. The automobile and mobile phone scenarios have in common that there is often background noise, so it is important that the speech recognizer works well in its presence. This book presents some interesting statistics on the impact of speech recognition technology when driving.

Modern speech recognition systems are statistical systems trained with many hours of speech samples. A mismatch occurs when the system has to recognize speech that is significantly different from the speech samples used to train it. This can happen if a child wants to use the system but no children’s speech was used to train it, or if a person with an accent tries to use the system but only samples from native speakers were used in training. It can also happen if the system was trained with noise-free speech and has to be used in a noisy cafeteria. A speech recognition system is called robust when its error rate does not increase significantly across the various conditions in which it is tested. This book describes techniques to build speech recognition systems that are robust to background noise.

The user interface in both the automobile and the mobile phone scenarios above often follows the so-called “push-to-talk” method: the user presses a button on the steering wheel or on the phone and then speaks. The system then needs to determine when the user is finished, typically when there is a long enough pause. The problem of detecting the end of speech is not trivial. If the system looks for too short a silence, the user may not be done speaking and the command gets chopped, resulting in a recognition error. On the other hand, if the system is programmed to “hear” too long a silence, the perceived system latency increases, and so does the chance that speech from another user leaks in. Voice Activity Detection, as it is often called, is a hard problem in the presence of background noise, especially non-stationary noises such as other speakers. This book devotes the first four chapters to this problem.
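The trade-off between a short and a long silence threshold can be illustrated with a minimal sketch. It is not taken from any chapter of the book: it assumes that some voice activity detector has already labeled each fixed-length frame as speech or non-speech, and the function name `find_end_of_speech`, the frame length, and the thresholds are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def find_end_of_speech(is_speech: Sequence[bool],
                       frame_ms: int,
                       min_silence_ms: int) -> Optional[int]:
    """Return the time in ms at which the endpointer fires, or None.

    The endpointer fires once speech has been heard and is then followed
    by at least `min_silence_ms` of consecutive non-speech frames.
    (Hypothetical helper; the per-frame decisions are assumed given.)
    """
    # Number of consecutive silent frames required (rounded up).
    needed = -(-min_silence_ms // frame_ms)
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0          # any speech frame resets the pause counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return (i + 1) * frame_ms
    return None                     # no long enough pause was observed


# Toy utterance with 100 ms frames: 300 ms of speech, a 400 ms pause,
# 300 ms of speech, then 1 s of silence.
frames = [True] * 3 + [False] * 4 + [True] * 3 + [False] * 10

short_threshold = find_end_of_speech(frames, frame_ms=100, min_silence_ms=300)
long_threshold = find_end_of_speech(frames, frame_ms=100, min_silence_ms=800)
print(short_threshold)  # 600: fires inside the pause, chopping the command
print(long_threshold)   # 1800: fires 800 ms after the speech actually ended
```

With the short threshold the endpointer fires during the speaker's mid-command pause, while the long threshold keeps the whole command at the cost of 800 ms of added latency. A practical system would also have to cope with errors in the per-frame decisions themselves, which this sketch ignores.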

To reduce the mismatch between training and test conditions, the speech samples used to train the speech recognizer often contain a large variation in background noises, in the hope that the noise of the test utterance is similar to the noise encountered during training. While that happens sometimes, it is very hard to cover all possible noise conditions in training. The rest of the book describes techniques that attempt either to denoise the speech signal prior to recognition or to modify the recognizer to be more robust to such noise conditions.

The book edited by Prof. Ramirez and Prof. Gorriz provides a broad overview of the problem of noise robust speech recognition. The chapters, written by experts in their respective fields, will acquaint the reader with a number of topics in this space and provide researchers and practitioners with a set of useful techniques that have been developed in the last few years.

Alex Acero
Microsoft Research
Redmond, WA
USA


Preface

As speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition is becoming increasingly important. Robustness in speech recognition refers to the need to maintain good recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission, as well as impulsive interfering sources, and diminished accuracy caused by changes in articulation produced by the presence of high-intensity noise sources. Although progress over the past decade has been impressive, there are significant obstacles to be overcome before speech recognition systems can reach their full potential. Automatic speech recognition (ASR) systems must be robust at all levels, so that they can handle background or channel noise, the occurrence of unfamiliar words, new accents, new users, or unanticipated inputs. They must exhibit more “intelligence” and integrate speech with other modalities, deriving the user’s intent by combining speech with facial expressions, eye movements, gestures, and other input features, and communicating back to the user through multimedia responses.

The aim of this e-book series is to bring together many different aspects of the current research on robust automatic speech recognition and speech technology. The book is divided into 4 sections: i) voice activity detection, ii) speech enhancement, iii) speech recognition, and iv) emerging applications. Section i consists of 4 papers dealing with model-based techniques, GARCH processes, and contextual or long-term information for voice activity detection and noise suppression for robust speech recognition. Section ii consists of three papers including an in-depth review of the state of the art in speech enhancement, an independent component analysis technique for speech enhancement, and statistical model based techniques for speech enhancement and robust speech recognition. Section iii consists of six chapters devoted to analyzing different techniques, including Bayesian networks, missing features, distribution-based feature compensation, and multiple regression, to improve the robustness of speech recognition systems in noisy environments and for single- and multi-channel speech recognition. Section iv consists of a single paper showing advances in human-machine systems for in-vehicle environments.

The E-Book “Recent Advances in Robust Speech Recognition Technology” is aimed at a wide audience including: i) researchers, professionals and technical experts working in the fields of robust speech recognition, speech enhancement, and speech/music detection in noise, ii) the entire signal processing and communications community interested in processing and transmitting speech and music for next generation multimedia applications, and iii) technical experts requiring an understanding of speech/music transmission and recognition in noise over mobile and other networks, as well as postgraduate students working on robust speech/music processing and transmission.

One of the key benefits of this E-Book is that readers will have access to novel research topics ranging from speech enhancement, robust speech recognition, and voice activity detection to their application in demanding scenarios such as in-vehicle speech management and robustness. All these topics are covered in depth and in a more illustrated fashion than is usual in journals.

We would like to express our gratitude to all the contributing authors who have made this book a reality. We would also like to thank Dr. Acero for writing the foreword and Bentham Science Publishers, particularly Manager Asma Ahmed, for their support and efforts.

Javier Ramírez and Juan Manuel Górriz
University of Granada, Spain

List of Contributors

Editor(s):
Javier Ramírez
University of Granada
Spain


Juan Manuel Górriz
University of Granada
Spain




Contributor(s):
Pongtep Angkititrakul
Toyota Central Research and Development
University of Texas at Dallas
Nagoya
Japan


Joon-Hyuk Chang
School of Electronic Engineering
Inha University
Incheon
Korea


Berlin Chen
National Taiwan Normal University
No. 88, Sec. 4, Ting-Chow Rd., 116
Taipei
Taiwan, R.O.C.


Ángel de la Torre
Dpto. Teoría de la Señal, Telemática y Comunicaciones, Periodista Daniel Saucedo Aranda S/N
University of Granada
Granada, 18071
Spain


Masakiyo Fujimoto
NTT Communication Science Laboratories, NTT Corporation
2-4, Hikari-dai, Seika-cho, Souraku-gun
Kyoto, 619-0237
Japan


Luz García
Dpto. Teoría de la Señal, Telemática y Comunicaciones, Periodista Daniel Saucedo Aranda S/N
University of Granada
Granada, 18071
Spain


Juan Manuel Górriz
Dpto. Teoría de la Señal, Telemática y Comunicaciones, Campus Fuentenueva S/N
University of Granada
Granada, 18071
Spain


John H.L. Hansen
The Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
Richardson
TX, 75080
USA


Chang-Wen Hsu
MediaTek Incorporation
No.1, Dusing Rd. 1, Hsinchu Science Park, Hsin-Chu
Taiwan, 30078
R.O.C.


Fumitada Itakura
Meijo University
1-501 Shiogamaguchi, Tempaku-ku
Nagoya, 468-8502
Japan


Peter Jancovic
School of Electronic, Electrical and Computer Engineering
University of Birmingham
Pritchatts Road
Birmingham, B15 2TT
UK


Nam Soo Kim
School of Electrical Engineering and Computer Science
Institute of New Media and Communications, Seoul National University
415 INMC(bldg.132), Seoul National University, San 56-1, Sillim-dong, Gwanak-gu
Seoul
Korea


Wooil Kim
The Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
Richardson
TX, 75080
USA


Munevver Kokuer
School of Electronic, Electrical and Computer Engineering
University of Birmingham
Pritchatts Road
Birmingham, B15 2TT
UK


Lin-Shan Lee
College of Electrical Engineering and Computer Science
National Taiwan University
No. 1, Sec. 4 Roosevelt Road, Taipei
Taiwan, 10617
R.O.C.


Weifeng Li
Swiss Federal Institute of Technology, Lausanne (EPFL)
EPFL STI IEL LIDIAP, ELD 224 (Bâtiment ELD), Station 11
Lausanne, CH-1015
Switzerland


Shih-Hsiang Lin
National Taiwan Normal University
No. 88, Sec. 4, Ting-Chow Rd., Taipei 116
Taiwan
R.O.C.


Eduardo Lleida
Communications Technology Group (GTC)
Aragon Institute for Engineering Research (I3A), University of Zaragoza
Edificio Ada Byron, María de Luna 1
Zaragoza, 50018
Spain


Philipos C. Loizou
Dept. of Electrical Engineering (EC33)
University of Texas-Dallas
800 West Campbell Rd
Richardson
TX, 75080
USA


Antonio Miguel
Communications Technology Group (GTC)
Aragon Institute for Engineering Research (I3A), University of Zaragoza
Edificio Ada Byron, María de Luna 1
Zaragoza, 50018
Spain


Javier Ramírez
University of Granada
Periodista Daniel Saucedo Aranda S/N
Granada, 18071
Spain


José Carlos Segura
University of Granada
Periodista Daniel Saucedo Aranda S/N
Granada, 18071
Spain


Rasool Tahmasbi
Graduate student, Amirkabir University of Technology (Polytechnic Tehran)
424 Hafez Ave
Tehran
Iran


Kazuya Takeda
Nagoya University
IB Building 8F, Furo-cho, Chikusa-ku
Nagoya, 464-8601
Japan


Alfonso Ortega
Aragon Institute for Engineering Research (I3A), University of Zaragoza
Edificio Ada Byron, María de Luna 1
Zaragoza, 50018
Spain


Hugo Van hamme
Katholieke Universiteit Leuven
Kasteelpark Arenberg 10 - bus 2441
Leuven, B3001
Belgium


Maarten Van Segbroeck
Katholieke Universiteit Leuven
Kasteelpark Arenberg 10 - bus 2441
Leuven, B3001
Belgium


Yujun Wang
Katholieke Universiteit Leuven
Kasteelpark Arenberg 10 - bus 2441
Leuven, B3001
Belgium


Xin Zou
University of Birmingham
Pritchatts Road
Birmingham, B15 2TT
UK




Copyright © 2014 Bentham Science