noise reduction

Non-negative Matrix Factorization Based Noise Reduction for Noise Robust Automatic Speech Recognition
Seon Man Kim¹, Ji Hun Park¹, Hong Kook Kim¹,*, Sung Joo Lee², and Yun Keun Lee²

¹ School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
{kobem30002,jh_park,hongkook}@gist.ac.kr

² Speech/Language Information Research Center, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea
{lee1862,yklee}@etri.re.kr

Abstract. In this paper, we propose a noise reduction method based on non-negative matrix factorization (NMF) for noise-robust automatic speech recognition (ASR). Most noise reduction methods applied to ASR front-ends have been developed to suppress background noise that is assumed to be stationary rather than non-stationary. In contrast, the proposed method attenuates non-target noise with a hybrid approach that combines Wiener filtering and NMF. This is motivated by the fact that Wiener filtering and NMF are suited to reducing stationary and non-stationary noise, respectively. ASR experiments show that a system employing the proposed approach improves the average word error rate (WER) by 11.9%, 22.4%, and 5.2% compared to systems employing the two-stage mel-warped Wiener filter, the minimum mean square error log-spectral amplitude estimator, and NMF with a Wiener postfilter, respectively.
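To make the NMF side of this idea concrete, here is a rough, hypothetical sketch of generic supervised NMF denoising. It is not the authors' algorithm: the pre-trained speech and noise bases, the generalized Kullback-Leibler cost with the Lee-Seung multiplicative update, the Wiener-like soft mask, and all function names (`matmul`, `nmf_activations`, `nmf_denoise`) are assumptions made for illustration, and the Wiener stage of the hybrid is omitted. Plain Python lists keep it dependency-free; a real front-end would operate on STFT magnitudes with an array library.

```python
"""Generic supervised NMF denoising sketch (not the paper's exact method).

Assumptions: speech and noise bases W_s, W_n were trained beforehand, the
activations are fitted under the generalized KL divergence with the bases held
fixed, and speech is recovered with a Wiener-like soft mask.
"""
import random

EPS = 1e-12  # guards against division by zero


def matmul(a, b):
    """Multiply an (m x k) by a (k x n) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def nmf_activations(v, w, n_iter=200, seed=0):
    """Fit H >= 0 so that V ~= W H with W fixed (Lee-Seung KL update).

    Element-wise update rule: H <- H * (W^T (V / (W H))) / (W^T 1)
    """
    rng = random.Random(seed)
    n_freq, n_comp, n_frames = len(v), len(w[0]), len(v[0])
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(n_comp)]
    col_sums = [sum(w[i][r] for i in range(n_freq)) for r in range(n_comp)]  # W^T 1
    for _ in range(n_iter):
        wh = matmul(w, h)
        ratio = [[v[i][j] / (wh[i][j] + EPS) for j in range(n_frames)]
                 for i in range(n_freq)]
        for r in range(n_comp):
            for j in range(n_frames):
                num = sum(w[i][r] * ratio[i][j] for i in range(n_freq))
                h[r][j] *= num / (col_sums[r] + EPS)
    return h


def nmf_denoise(v, w_speech, w_noise, n_iter=200):
    """Estimate the speech magnitudes from a noisy spectrogram V (freq x frames)."""
    k_s = len(w_speech[0])
    w = [ws + wn for ws, wn in zip(w_speech, w_noise)]  # stack bases side by side
    h = nmf_activations(v, w, n_iter)
    speech = matmul(w_speech, h[:k_s])  # speech part of the model, W_s H_s
    noise = matmul(w_noise, h[k_s:])    # noise part of the model, W_n H_n
    return [[v[i][j] * speech[i][j] / (speech[i][j] + noise[i][j] + EPS)
             for j in range(len(v[0]))] for i in range(len(v))]


if __name__ == "__main__":
    # Toy 4-bin example: speech lives in bins 0-1, noise in bins 2-3.
    w_s = [[1.0], [1.0], [0.0], [0.0]]
    w_n = [[0.0], [0.0], [1.0], [1.0]]
    noisy = [[2.0, 0.0], [2.0, 0.0], [1.0, 3.0], [1.0, 3.0]]
    for row in nmf_denoise(noisy, w_s, w_n):
        print([round(x, 3) for x in row])
```

In the toy example the two bases do not overlap, so the mask keeps bins 0-1 of the first frame and suppresses the noise-only bins entirely; with real, overlapping bases the mask takes fractional values and some speech/noise leakage remains.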
Keywords: Automatic speech recognition (ASR), Non-negative matrix factorization (NMF), Noise reduction, Non-stationary background noise, Wiener filter.

1 Introduction

Automatic speech recognition (ASR) systems often degrade considerably in the presence of unexpected background noise [1]. Consequently, many noise reduction methods operating in the frequency domain have been reported, such as spectral subtraction [2], minimum mean square error log-spectral amplitude (MMSE-LSA) estimation [3], and Wiener filtering [4][5].
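Wiener filtering, which the abstract identifies as suited to stationary noise, can be sketched as a per-bin gain driven by a noise power estimate. The sketch below is a hypothetical textbook single-stage form, not the two-stage mel-warped Wiener filter mentioned in the abstract: the noise-only initial frames, the maximum-likelihood a priori SNR estimate, the gain floor, and the function names (`estimate_noise_psd`, `wiener_gain`, `wiener_enhance`) are all assumptions.

```python
"""Per-bin Wiener gain with a stationary noise estimate (illustrative sketch).

Assumptions: the first n_init frames are noise-only, the a priori SNR is the
maximum-likelihood estimate xi = max(gamma - 1, 0), and a gain floor limits
over-suppression. Inputs are power spectra |Y|^2, one list per frame.
"""


def estimate_noise_psd(power_frames, n_init=5):
    """Average the first n_init frames; valid only if the noise is stationary."""
    frames = power_frames[:n_init]
    return [sum(col) / len(frames) for col in zip(*frames)]


def wiener_gain(noisy_power, noise_psd, floor=0.05):
    """G = xi / (1 + xi) per frequency bin, clipped from below at `floor`."""
    gains = []
    for y2, lam in zip(noisy_power, noise_psd):
        gamma = y2 / max(lam, 1e-12)  # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)    # ML estimate of the a priori SNR
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


def wiener_enhance(power_frames, n_init=5, floor=0.05):
    """Return enhanced magnitudes G * |Y| for every frame."""
    noise_psd = estimate_noise_psd(power_frames, n_init)
    return [[g * y2 ** 0.5 for g, y2 in zip(wiener_gain(f, noise_psd, floor), f)]
            for f in power_frames]


if __name__ == "__main__":
    frames = [[1.0, 1.0]] * 5 + [[4.0, 1.0]]  # five noise frames, then speech in bin 0
    print(wiener_enhance(frames)[-1])  # prints [1.5, 0.05]
```

Because the noise estimate is a fixed average, a noise burst arriving after the initial frames yields a high a posteriori SNR and passes through largely unattenuated, which illustrates why a stationary-noise filter alone struggles with non-stationary noise.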
In general, conventional front-ends employing



References
… ASRU, pp. 321–326 (2003)
2. …
3. Ephraim, Y., Malah, D.: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985)
… In: IEEE Workshop on ASRU, pp. 67–70 (1999)
5. … 801–809 (2010)
6. … Nature 401, 788–791 (1999)
7. … matrix factorization with priors. In: ICASSP, pp. 4029–4032 (2008)
… 1066–1074 (2007)
10. … speech enhancement in nonstationary noise environments. In: ICASSP, pp. 789–792 (1999)
