Acoustic Speech


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-34, NO. 4, AUGUST

Speech Analysis/Synthesis Based on a Sinusoidal Representation
Abstract: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of “birth” and “death” of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction …

… speech compression. The amplitudes and frequencies of the underlying sine waves are estimated using Kalman filtering techniques, and each sine-wave phase is defined to be the integral of the associated instantaneous frequency. Another sine-wave-based speech compression system is being developed by Almeida and Silva [4]. In contrast to Hedelin’s approach, their system uses a pitch estimate to establish a harmonic set of sine waves. The sine-wave phases are computed at the harmonic frequencies. To compensate for any errors that might be introduced as a result of the harmonic sine-wave representation, a residual waveform is coded along with the underlying sine-wave parameters.

In this paper a sinusoidal model for the speech …
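The abstract states that the sine-wave parameters are estimated from the short-time Fourier transform with "a simple peak-picking algorithm." The Python sketch below illustrates that idea for a single analysis frame under stated assumptions: a periodic Hann window, a direct DFT instead of an FFT (to stay dependency-free), no interpolation between bins, and a hypothetical relative threshold `min_rel` that rejects numerical noise. The function names are illustrative and are not taken from the paper.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window of length n_len (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]


def dft(frame):
    """Direct O(N^2) DFT; slow, but adequate for a short illustrative frame."""
    n_len = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
        for k in range(n_len)
    ]


def pick_peaks(frame, fs, min_rel=1e-3):
    """Return (amplitude, frequency_hz, phase_rad) for each local maximum of
    the windowed magnitude spectrum in bins 1 .. N/2 - 1.

    min_rel (a hypothetical parameter) discards maxima smaller than
    min_rel * (largest magnitude), which filters floating-point noise.
    """
    n_len = len(frame)
    win = hann(n_len)
    spec = dft([x * w for x, w in zip(frame, win)])
    mag = [abs(c) for c in spec]
    floor = min_rel * max(mag)
    gain = sum(win) / 2.0  # a bin-centred sine of amplitude A peaks at A * gain
    peaks = []
    for k in range(1, n_len // 2):
        if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
            peaks.append((mag[k] / gain, k * fs / n_len, cmath.phase(spec[k])))
    return peaks
```

For example, the frame `[0.5 * math.cos(2 * math.pi * 1000 * n / 8000 + 0.3) for n in range(64)]` places a 1000 Hz cosine exactly on bin 8, so `pick_peaks(frame, 8000)` returns a single peak of approximately (0.5, 1000.0, 0.3). Sinusoids that fall between bins would need refinement (for instance, interpolation around the peak), which this sketch omits.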

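The abstract's "birth" and "death" of sine waves can be illustrated with a frame-to-frame matcher: a peak in the current frame continues a track from the previous frame if the two frequencies are close enough; otherwise the old track dies and the new peak gives birth to a new track. The sketch below uses a greedy closest-pair rule with a hypothetical matching interval `delta`. It is a simplified stand-in and should not be read as the paper's exact matching procedure.

```python
def match_tracks(prev_freqs, curr_freqs, delta):
    """Greedy frame-to-frame frequency matching (simplified sketch).

    Returns (continued, deaths, births):
      continued: sorted (i_prev, j_curr) index pairs whose frequencies
                 differ by at most delta,
      deaths:    indices into prev_freqs left unmatched (track ends),
      births:    indices into curr_freqs left unmatched (track starts).
    """
    candidates = sorted(
        (abs(fp - fc), i, j)
        for i, fp in enumerate(prev_freqs)
        for j, fc in enumerate(curr_freqs)
        if abs(fp - fc) <= delta
    )
    used_prev, used_curr, continued = set(), set(), []
    for _, i, j in candidates:  # the closest remaining pair is claimed first
        if i not in used_prev and j not in used_curr:
            used_prev.add(i)
            used_curr.add(j)
            continued.append((i, j))
    deaths = [i for i in range(len(prev_freqs)) if i not in used_prev]
    births = [j for j in range(len(curr_freqs)) if j not in used_curr]
    return sorted(continued), deaths, births
```

With `delta = 20.0` Hz, previous-frame peaks at 100, 200, and 400 Hz and current-frame peaks at 105, 210, and 800 Hz give the continuations (0, 0) and (1, 1), a death for the 400 Hz track, and a birth for the 800 Hz peak. Claiming the closest pairs first keeps two old tracks from competing for the same new peak.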

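The cubic phase step in the abstract can be made concrete. On one frame of length T, write the phase as θ(t) = θ0 + ω0·t + α·t² + β·t³. Matching the measured phase and frequency at both frame boundaries, with the end phase unwrapped as θ1 + 2πM, fixes α and β for every integer M. Minimising ∫₀ᵀ θ''(t)² dt over M then gives M = round{[(θ0 + ω0·T − θ1) + (ω1 − ω0)·T/2] / 2π}. This closed form is derived here rather than quoted from the truncated text above, and the code is an illustrative sketch under these assumptions: frequencies are in radians per sample, T is in samples, amplitudes are interpolated linearly, and `cubic_phase_coeffs` and `synthesize_frame` are hypothetical names.

```python
import math


def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Fit theta(t) = theta0 + omega0*t + alpha*t**2 + beta*t**3 so that
    theta(T) = theta1 + 2*pi*M and theta'(T) = omega1.

    M is the integer nearest the value that minimises the integral of
    theta''(t)**2 over [0, T] (the "maximally smooth" unwrapping).
    Returns (alpha, beta, M).
    """
    two_pi = 2.0 * math.pi
    m = round(((theta0 + omega0 * T - theta1) + 0.5 * T * (omega1 - omega0)) / two_pi)
    d = theta1 + two_pi * m - theta0 - omega0 * T  # unwrapped phase mismatch
    e = omega1 - omega0                             # frequency change over the frame
    alpha = 3.0 * d / T**2 - e / T
    beta = -2.0 * d / T**3 + e / T**2
    return alpha, beta, m


def synthesize_frame(tracks, T):
    """Sum amplitude-modulated sine waves over samples n = 0 .. T-1.

    Each track is (a0, w0, p0, a1, w1, p1): amplitude, frequency
    (rad/sample) and phase (rad) at the start and end of the frame.
    Linear amplitude interpolation is an assumption of this sketch.
    """
    out = [0.0] * T
    for a0, w0, p0, a1, w1, p1 in tracks:
        alpha, beta, _ = cubic_phase_coeffs(p0, w0, p1, w1, T)
        for n in range(T):
            amp = a0 + (a1 - a0) * n / T
            phase = p0 + w0 * n + alpha * n**2 + beta * n**3
            out[n] += amp * math.cos(phase)
    return out
```

For a steady sinusoid whose end phase is consistent with its frequency, the chosen M makes the mismatch `d` vanish, so α = β = 0 and the cubic reduces to the ordinary linear phase θ0 + ω0·n; the cubic terms only bend the phase when the frequency or phase measurements change across the frame.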

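The introduction excerpt contrasts two ways of obtaining sine-wave parameters: in Hedelin's system each phase is the integral of the associated instantaneous frequency, while Almeida and Silva use a pitch estimate to establish a harmonic set of sine waves. The two helpers below sketch those ideas in discrete time. They are illustrations only, not either group's implementation; the rectangle-rule integration, the strict Nyquist cutoff, and the names `integrate_phase` and `harmonic_frequencies` are assumptions of this sketch.

```python
import math


def harmonic_frequencies(f0_hz, fs_hz):
    """Harmonic set k * f0 (k = 1, 2, ...) strictly below the Nyquist rate."""
    if f0_hz <= 0:
        raise ValueError("pitch estimate must be positive")
    nyquist = fs_hz / 2.0
    count = int(nyquist // f0_hz)
    return [k * f0_hz for k in range(1, count + 1) if k * f0_hz < nyquist]


def integrate_phase(inst_freq_hz, fs_hz, phase0=0.0):
    """Phase track as the running (rectangle-rule) integral of an
    instantaneous-frequency track sampled at fs_hz."""
    phases, phi = [], phase0
    for f in inst_freq_hz:
        phases.append(phi)
        phi += 2.0 * math.pi * f / fs_hz  # advance by one sample's worth of phase
    return phases
```

With a 100 Hz pitch estimate at an 8 kHz sampling rate, for instance, the harmonic set is 100, 200, ..., 3900 Hz (39 sine waves). A constant 1000 Hz instantaneous frequency advances the integrated phase by π/4 per sample.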
References:
[1] B. S. Atal and J. R. Remde, “A new model of LPC excitation for producing natural-sounding speech at low bit rates,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Paris, France, 1982, p. 614.
[2] H. Van Trees, Detection, Estimation and Modulation Theory, Part I. New York: Wiley, 1968, ch. 3.
[3] P. Hedelin, “A tone-oriented voice-excited vocoder,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Atlanta, GA, 1981, p. 205.
[4] L. B. Almeida and F. M. Silva, “Variable-frequency synthesis: An improved harmonic coding scheme,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, San Diego, CA, 1984, p. 27.5.1.
[5] R. J. McAulay and T. F. Quatieri, “Magnitude-only reconstruction using a sinusoidal speech model,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, San Diego, CA, 1984, p. 27.6.1.
[6] J. L. Flanagan, “Parametric coding of speech spectra,” J. Acoust. Soc. Amer., vol. 68, p. 412, 1980.
[7] J. L. Flanagan and S. W. Christensen, “Computer studies on parametric coding of speech spectra,” J. Acoust. Soc. Amer., vol. 68, p. 420, 1980.
[8] T. F. Quatieri and R. J. McAulay, “Speech transformations based on a sinusoidal representation,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, 1985, p. 489.
[9] R. J. McAulay and T. F. Quatieri, “Mid-rate coding based on a sinusoidal representation of speech,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, 1985, p. 945.

Thomas F. Quatieri (S’73-M’79) was born in Somerville, MA, on January 31, 1952. He received the B.S. degree (summa cum laude) from Tufts University, Medford, MA, in 1973 and the S.M., E.E., and Sc.D. degrees from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1975, 1977, and 1979, respectively. From 1973 to 1975 he was a Teaching Assistant and from 1975 to 1979 a Research Assistant in the area of digital signal processing, both within the Department of Electrical Engineering and Computer Science of M.I.T.
His research for the Master’s degree involved the design of two-dimensional digital filters, and his Sc.D. research involved phase estimation with application to speech analysis/synthesis. He is presently a Research Staff Member at the M.I.T. Lincoln Laboratory, where he is working on problems in digital signal processing with applications to speech communications and image processing. Dr. Quatieri is the recipient of the 1982 Paper Award of the IEEE Acoustics, Speech, and Signal Processing Society for the best paper by an author under 30 years of age. He is a member of the IEEE Digital Signal Processing Technical Committee and has served on the steering committee for the 1984 Digital Signal Processing Workshop. He is also a member of Tau Beta Pi, Eta Kappa Nu, and Sigma Xi.

You May Also Find These Documents Helpful

  • Good Essays

    Nt1310 Unit 9 Lab Report

    • 3131 Words
    • 13 Pages

    Speech morphing can be achieved by transforming the signal’s representation from the acoustic waveform obtained by sampling the analog signal, with which many people are familiar, to another representation. To prepare the signal for the transformation, it is split into a number of 'frames' (sections of the waveform). The transformation is then applied to each frame of the signal. This provides another way of viewing the signal information. The new representation (said to be in the frequency domain) describes the average energy present at each frequency band.…

  • Powerful Essays

    netwk 320 week 7 i lab

    • 4646 Words
    • 19 Pages

    A codec is a device capable of performing encoding and decoding on a digital signal. Each codec provides a different level of speech quality. The reason for this is that codecs use different types of compression techniques in order to require less bandwidth. The greater the compression, the less bandwidth is required. However, this ultimately comes at the cost of sound quality, as high-compression/low-bandwidth algorithms will not have the same voice quality as low-compression/high-bandwidth algorithms.…

  • Powerful Essays

    Altera Quartus Experiment

    • 19294 Words
    • 78 Pages

    their digital waveforms were selected and the simulation result was obtained after drawing the input…

  • Good Essays

    Unit 1

    • 928 Words
    • 4 Pages

    of the line. Information is then transferred from digital information, turning it into tones that…

  • Good Essays

    Child obesity Speech

    • 615 Words
    • 2 Pages

    This paper was prepared for COM 120: Principles of Speech Communication, Module 3 Homework assignment Part I, taught by Dr. Cynthia Arellano-lavariere.…

  • Good Essays

    Text to Speech

    • 781 Words
    • 4 Pages

    At present, most speech synthesis systems use raw text as their input, which is understandable from a human point of view but problematic for machines, since the process of converting text to speech is very complex. In this paper we discuss the need for a specific SSML tag for each “mention” (1st occurrence, 2nd occurrence) of a proper noun in the text or paragraph. We argue that when a proper noun appears in the text for the first time, it is spoken more prominently than on its second, third, or subsequent occurrences. We highlight the need for incorporating a specific tag in SSML to handle this mention case. The SSML format is a compromise between human and machine needs. SSML is often embedded in Voice-XML scripts to drive interactive telephony systems. However, it may also be used alone, such as for creating audio books. The advantage that SSML brings is that the designers of such language generation systems need only understand the basic SSML language and do not need specialist speech synthesis knowledge.
    Introduction: Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. SSML directs all Text Analysis steps, providing a standard way to control aspects of speech such as pronunciation, acronym expansion, volume, pitch, rate, range, duration, pause, emphasis, etc., across different synthesis-capable platforms. The intended use of SSML is to improve the quality of synthesized content. Different markup elements impact different stages of the synthesis process. The markup may be produced either automatically, for instance via XSLT or CSS3 from an XHTML document, or by human authoring. Markup may be present within a complete SSML document or as part of a fragment embedded in another language, although no interactions with other languages are specified as…

  • Better Essays

    Pitch Correction Paper

    • 1541 Words
    • 7 Pages

    Bibliography: Daley, D. (2003, October 12). Vocal Fixes. Retrieved December 1, 2011, from Sound on Sound: http://www.soundonsound.com/sos/oct03/articles/vocalfixes.htm…

  • Satisfactory Essays

    Cited: Schmitz, Andy. "A Primer on Communication Studies." 10.3 Vocal Delivery. N.p., 29 Dec 2012. Web. 4 Nov 2013.…

    • 319 Words
    • 2 Pages
  • Satisfactory Essays

    Affimative Speech

    • 481 Words
    • 2 Pages

    A BILL TO PROVIDE COMPLETE FUNDING AND ESTABLISH STEM CELL RESEARCH CENTERS IN THE UNITED STATES…

  • Satisfactory Essays

    Through most of its history, two-way radio has meant analog voice — the representation of sound waves as either amplitude modulated (AM) or frequency modulated (FM) radio waves. In fact, this is one of the last areas of professional communications to be touched by digital technology. But that’s changing, very quickly, for very good reasons.…

    • 455 Words
    • 2 Pages
  • Good Essays

    Data Communications

    • 1027 Words
    • 5 Pages

    Given the narrow (usable) audio bandwidth of a telephone transmission facility, a nominal SNR of 56 dB (400,000), and a…

  • Good Essays

    White, G. D. & Louie, G. J. (2005). The Audio Dictionary (3rd ed.) Seattle: University of Washington Press.…

    • 821 Words
    • 4 Pages
  • Better Essays

    N. S. Jayant, "Digital coding of speech waveforms: PCM, DPCM, and DM quantizers," Proc. IEEE, vol. 62, no. 5, pp. 611-632, May 1974.…

    • 1331 Words
    • 6 Pages
  • Better Essays

    Based on our observations and analysis of various performance parameters, we conclude which of the methods is most suitable for speech enhancement. The code is implemented with a graphical user interface in MATLAB.…

    • 3824 Words
    • 16 Pages
  • Good Essays

    What Is Midi ?

    • 522 Words
    • 3 Pages

    • Basic Ideas of compression (see next Chapter) used as integral part of audio format --…
