Sample records for speech

  1. Speech Problems

    MedlinePLUS

    ... a person's ability to speak clearly. Some Common Speech Disorders Stuttering is a problem that interferes with ... form speech sounds into words. Continue What Causes Speech Problems? Normal speech might seem effortless, but it's ...

  2. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing impaired-persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.

  3. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  4. Speech Technologies for Language Learning.

    ERIC Educational Resources Information Center

    Goodwin-Jones, Bob

    2000-01-01

    Provides a description of the following speech technologies for language learning: recorded speech (from analog to digital); speech recognition; speech synthesis; multilingual speech to speech; speech on the Web. A resource list is also provided. (Author/VWL)

  5. Speech production knowledge in automatic speech recognition 

    E-print Network

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-01-01

    in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone...

  6. Does Freedom of Speech Include Hate Speech?

    Microsoft Academic Search

    Caleb Yong

    I take it that liberal justice recognises special protections against the restriction of speech and expression; this is what I call the Free Speech Principle. I ask if this Principle includes speech acts which might broadly be termed ‘hate speech’, where ‘includes’ is sensitive to the distinction between coverage and protection, and between speech that is regulable and speech that

  7. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  8. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic fricatives.

  9. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
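A hedged illustration of the zero-crossing idea the record above uses for the second and third formant pulse trains: the crossing rate of a band-limited signal approximates its dominant frequency. This is a minimal digital sketch, not the patented analog circuit; the 500 Hz tone and 8 kHz rate are illustrative values.

```python
import numpy as np

def zero_crossing_rate(signal, sample_rate):
    """Estimate the dominant frequency from zero crossings:
    each full cycle of a sinusoid contributes two sign changes."""
    signs = np.sign(signal)
    crossings = np.sum(signs[:-1] != signs[1:])
    duration = len(signal) / sample_rate
    return crossings / (2.0 * duration)  # crossings/sec divided by 2 ~ Hz

# A 500 Hz tone (roughly a first-formant frequency) sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 500 * t)
estimate = zero_crossing_rate(tone, sr)
```

For real speech, the signal would first be passed through a formant (bandpass) filter, as the abstract describes, so that the crossings track a single formant.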

  10. Keynote Speeches.

    ERIC Educational Resources Information Center

    2000

    This document contains six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities of Hong…

  11. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  12. SPEECH ENHANCEMENT FOR NOISE-ROBUST SPEECH RECOGNITION

    E-print Network

    Shapiro, Benjamin

    SPEECH ENHANCEMENT FOR NOISE-ROBUST SPEECH RECOGNITION. Vikramjit Mitra and Carol Y. Espy- ... Noise-robust automated transcription ... speech recognition accuracy for speech corrupted by speech-shaped noise ... Conclusion: the proposed scheme increases CLD ...

  13. SPEECH PARAMETERIZATION FOR AUTOMATIC SPEECH RECOGNITION IN NOISY CONDITIONS

    E-print Network

    SPEECH PARAMETERIZATION FOR AUTOMATIC SPEECH RECOGNITION IN NOISY CONDITIONS. Bojana Gaji ... of automatic speech recognition systems (ASR) against additive background noise, by finding speech parameters ... 1. INTRODUCTION: State-of-the-art automatic speech recognition (ASR) systems are capable ...

  14. Speech and Language Disorders

    MedlinePLUS

    ... and Disabilities This information in Spanish ( en español ) Speech and language disorders More information on speech and ... for you. Return to top More information on Speech and language disorders Explore other publications and websites ...

  15. Speech impairment (adult)

    MedlinePLUS

    Language impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... Common speech and language disorders include: APHASIA Aphasia is loss of the ability to understand or express spoken or ...

  16. Speech Recognition in Machines

    E-print Network

    Liebling, Michael

    Over the past several decades ... (speech recognition systems) ... human speech. We concentrate on speech recognition systems in this section. Speech recognition by machine refers to the capability of a machine to convert human speech to a textual ...

  17. Speech probability distribution

    Microsoft Academic Search

    Saeed Gazor; Wei Zhang

    2003-01-01

    It is demonstrated that the distribution of speech samples is well described by Laplacian distribution (LD). The widely known speech distributions, i.e., LD, Gaussian distribution (GD), generalized GD, and gamma distribution, are tested as four hypotheses, and it is proved that speech samples during voice activity intervals are Laplacian random variables. A decorrelation transformation is then applied to speech samples
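The hypothesis test the record above describes can be sketched as a likelihood comparison: fit both a Laplacian and a Gaussian to the samples and see which explains them better. The synthetic Laplacian draws below stand in for real active-speech samples; this is an illustration of the statistical idea, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for voice-activity speech samples: draw from a Laplacian source.
samples = rng.laplace(loc=0.0, scale=1.0, size=50000)

def gaussian_loglik(x):
    """Average log-likelihood under a Gaussian fit (ML mean and std)."""
    mu, sigma = x.mean(), x.std()
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu) ** 2 / (2 * sigma**2))

def laplace_loglik(x):
    """Average log-likelihood under a Laplacian fit."""
    mu = np.median(x)               # ML location for the Laplacian
    b = np.mean(np.abs(x - mu))     # ML scale
    return np.mean(-np.log(2 * b) - np.abs(x - mu) / b)

ll_gauss = gaussian_loglik(samples)
ll_laplace = laplace_loglik(samples)
```

When the data really are Laplacian, the Laplacian fit attains the higher average log-likelihood, which is the direction of the result the paper reports for speech samples.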

  18. Delayed Speech or Language Development

    MedlinePLUS

    ... if your child is right on schedule. Normal Speech & Language Development It's important to discuss early speech ... versus little, for example). Continue The Difference Between Speech and Language Speech and language are often confused, ...

  19. Going to a Speech Therapist

    MedlinePLUS

    ... the Body Works Main Page Going to a Speech Therapist KidsHealth > Kids > People, Places & Things That Help > ... therapists (also called speech-language pathologists ). What Do Speech Therapists Help With? Speech therapists help people of ...

  20. ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION

    E-print Network

    Wang, DeLiang "Leon"

    ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION. Arun Narayanan ... speech recognition to enhance noisy speech. Typically, a single prior model is trained by pooling ... to reconstruct noisy speech. Significant improvements are obtained on the Aurora-4 robust speech recognition task ...

  1. Speech and Communication Disorders

    MedlinePLUS

    ... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...

  2. Context dependent speech recognition 

    E-print Network

    Andersson, Sebastian

    2006-01-01

    Poor speech recognition is a problem when developing spoken dialogue systems, but several studies have shown that speech recognition can be improved by post-processing of recognition output that uses the dialogue context, ...

  3. Trainable videorealistic speech animation

    E-print Network

    Ezzat, Tony F. (Tony Farid)

    2002-01-01

    I describe how to create with machine learning techniques a generative, videorealistic, speech animation module. A human subject is first recorded using a videocamera as he/she utters a pre-determined speech corpus. After ...

  4. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  5. Speech and Language Delay

    MedlinePLUS

    MENU Return to Web version Speech and Language Delay Overview How do I know if my child has speech delay? Every child develops at his or her ... of the same age, the problem may be speech delay. Your doctor may think your child has ...

  6. A Speech Fundamentals Odyssey.

    ERIC Educational Resources Information Center

    Drake, H. L.

    Presenting an historical overview of the development of speech courses at Millersville University (Pennsylvania), this paper analyzes the past, present, and future status of speech communication within the context of the general education curriculum. The historical perspective treats the origins and definition of speech as a college course…

  7. Speech Production Results: Speech Feature Acquisition.

    ERIC Educational Resources Information Center

    Tobey, Emily; And Others

    1994-01-01

    This paper describes changes in speech production of deaf children with either cochlear implants (CI), tactile aids, or hearing aids. More accurate production of speech sounds was found for the CI group. The CI group was more accurate on less-visible place features, complex manner features such as glides and fricatives, and some voicing features.…

  8. Speech Intelligibility After Glossectomy and Speech Rehabilitation

    Microsoft Academic Search

    Cristina L. B. Furia; Luiz P. Kowalski; Maria R. D. O. Latorre; Elisabete C. Angelis; Nivia M. S. Martins; Ana P. B. Barros; Karina C. B. Ribeiro

    2001-01-01

    Background: Oral tumor resections cause articulation deficiencies, depending on the site, extent of resection, type of reconstruction, and tongue stump mobility. Objectives: To evaluate the speech intelligibility of pa- tients undergoing total, subtotal, or partial glossec- tomy, before and after speech therapy. Patients and Methods: Twenty-seven patients (24 men and 3 women), aged 34 to 77 years (mean age, 56.5

  9. Machine Translation from Speech

    NASA Astrophysics Data System (ADS)

    Schwartz, Richard; Olive, Joseph; McCary, John; Christianson, Caitlin

    This chapter describes approaches for translation from speech. Translation from speech presents two new issues. First, of course, we must recognize the speech in the source language. Although speech recognition has improved considerably over the last three decades, it is still far from being a solved problem. In the best of conditions, when the speech is high quality and carefully enunciated, on common topics (such as speech read by a trained news broadcaster), the word error rate is typically on the order of 5%. Humans can typically transcribe such speech with less than 1% disagreement between annotators, so even this best number is still far worse than human performance. However, the task gets much harder when anything changes from this ideal condition. Conditions that cause a higher error rate include an unusual topic, speakers who are not reading (so that their speech is more spontaneous), speakers with an accent or dialect, and any acoustic degradation, such as noise or reverberation. In these cases, the word error rate can increase significantly, to 20%, 30%, or higher. Accordingly, most of this chapter discusses techniques for improving speech recognition accuracy, while one section discusses techniques for integrating speech recognition with translation.
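The word error rates quoted above are conventionally computed as the word-level Levenshtein distance between a hypothesis and a reference transcript, divided by the reference length. A minimal sketch of that standard computation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all remaining ref words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(ref)
```

One substituted word in a twenty-word reference gives the 5% figure mentioned for broadcast speech.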

  10. EFFECT OF SPEECH CODERS ON SPEECH RECOGNITION PERFORMANCE

    E-print Network

    EFFECT OF SPEECH CODERS ON SPEECH RECOGNITION PERFORMANCE. B.T. Lilly and K.K. Paliwal, School ... as input to a recognition system. In this paper, the results of a study to examine the effects ... speech/s to 40 kbits/s are used with two different speech recognition systems: 1) isolated word recognition and 2 ...

  11. Speech Compression by Polynomial Approximation

    Microsoft Academic Search

    Sorin Dusan; James L. Flanagan; Amod Karve; Mridul Balaraman

    2007-01-01

    Methods for speech compression aim at reducing the transmission bit rate while preserving the quality and intelligibility of speech. These objectives are antipodal in nature since higher compression presupposes preserving less information about the original speech signal. This paper presents a method for compressing speech based on polynomial approximations of the trajectories in time of various speech features (i.e., spectrum,
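The trajectory-fitting idea described above can be sketched with an ordinary least-squares polynomial fit: transmitting a handful of coefficients instead of every frame value is the source of the compression. The feature track below is hypothetical (a smooth synthetic curve), not data from the paper, and the degree is an illustrative choice.

```python
import numpy as np

def compress_trajectory(values, degree):
    """Fit a low-order polynomial to a per-frame feature trajectory;
    sending degree+1 coefficients instead of every frame cuts the bit rate."""
    t = np.linspace(0.0, 1.0, len(values))
    return np.polyfit(t, values, degree)

def decompress_trajectory(coeffs, num_frames):
    """Resample the polynomial back to per-frame feature values."""
    t = np.linspace(0.0, 1.0, num_frames)
    return np.polyval(coeffs, t)

# Hypothetical 100-frame feature track (e.g., one spectral coefficient).
frames = np.sin(np.linspace(0, 2, 100))
coeffs = compress_trajectory(frames, degree=3)      # 100 values -> 4 numbers
restored = decompress_trajectory(coeffs, len(frames))
```

The trade-off the abstract calls "antipodal" is visible here: a lower degree means fewer coefficients but a larger reconstruction error.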

  12. Hearing or speech impairment - resources

    MedlinePLUS

    Resources - hearing or speech impairment ... The following organizations are good resources for information on hearing impairment or speech impairment: American Speech-Language-Hearing Association - www.asha.org National Dissemination Center for Children ...

  13. Speech input and output

    NASA Astrophysics Data System (ADS)

    Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

    1981-12-01

    Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

  14. Speech Sound Disorders: Articulation and Phonological Processes

    MedlinePLUS

    Speech Sound Disorders: Articulation and Phonological Processes What are speech sound disorders ? Can adults have speech sound ... with individuals with speech sound disorders ? What are speech sound disorders? Most children make some mistakes as ...

  15. Speech-Language Therapy (For Parents)

    MedlinePLUS

    ... most kids with speech and/or language disorders. Speech Disorders and Language Disorders A speech disorder refers ... in a socially appropriate way. Continue Specialists in Speech-Language Therapy Speech-language pathologists (SLPs), often informally ...

  16. The Discipline of Speech.

    ERIC Educational Resources Information Center

    Reid, Loren

    1967-01-01

    In spite of the diversity of subjects subsumed under the generic term speech, all areas of this discipline are based on oral communication with its essential elements--voice, action, thought, and language. Speech may be viewed as a community of persons with a common tradition participating in a common dialog, described in part by the memberships…

  17. STUDENTS AND FREE SPEECH

    NSDL National Science Digital Library

    dramsden

    2013-04-22

    Free speech is a constitutional right, correct? What about in school? The US Constitution protects everyone, young or old, big or small. As Horton said, "A person is a person, no matter how small." Yet does that mean people can say whatever they want, whenever they want? Does the right to free speech give ...

  18. Illustrated Speech Anatomy.

    ERIC Educational Resources Information Center

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  19. Improving Alaryngeal Speech Intelligibility.

    ERIC Educational Resources Information Center

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the production…

  20. Chief Seattle's Speech Revisited

    ERIC Educational Resources Information Center

    Krupat, Arnold

    2011-01-01

    Indian orators have been saying good-bye for more than three hundred years. John Eliot's "Dying Speeches of Several Indians" (1685), as David Murray notes, inaugurates a long textual history in which "Indians... are most useful dying," or, as in a number of speeches, bidding the world farewell as they embrace an undesired but apparently inevitable…

  1. ROBUST SPEECH RECOGNITION USING SINGULAR VALUE DECOMPOSITION BASED SPEECH ENHANCEMENT

    E-print Network

    ROBUST SPEECH RECOGNITION USING SINGULAR VALUE DECOMPOSITION BASED SPEECH ENHANCEMENT. B. T. Lilly ... K.Paliwal@me.gu.edu.au, Brisbane, QLD 4111, Australia. ABSTRACT: Speech recognition systems ... as a preprocessor for recognising speech in the presence of noise. It was found to improve the recognition ...

  2. THE EFFECT OF SPEECH AND AUDIO COMPRESSION ON SPEECH RECOGNITION

    E-print Network

    Boyer, Edmond

    THE EFFECT OF SPEECH AND AUDIO COMPRESSION ON SPEECH RECOGNITION PERFORMANCE. L. Besacier, C. ... on the performance of our continuous speech recognition engine. GSM full rate, G711, G723.1 and MPEG coders are investigated. It is shown that MPEG transcoding degrades the speech recognition performance for low bitrates ...

  3. Robust speech recognition by integrating speech separation and hypothesis testing

    E-print Network

    Wang, DeLiang "Leon"

    Robust speech recognition by integrating speech separation and hypothesis testing. Soundararajan ... estimation and recognition accuracy. First, an n-best lattice consistent with a speech separation mask ... significant improvement in recognition performance compared to that using speech separation alone. © 2009

  4. On speech recognition during anaesthesia

    E-print Network

    On speech recognition during anaesthesia. Alexandre Alapetite. PhD thesis, Roskilde University, November 2007 (revision 2007-10-30).

  5. Speech Acts and Conversational Interaction.

    ERIC Educational Resources Information Center

    Geis, Michael L.

    This book unites speech act theory and conversation analysis to advance a theory of conversational competence, called the Dynamic Speech Act Theory (DSAT). In contrast to traditional speech act theory that focuses almost exclusively on intuitive assessments of isolated, constructed examples, this theory is predicated on the assumption that speech

  6. Mandarin emotion recognition in speech

    Microsoft Academic Search

    Tsang-Long Pao; Yu-Te Chen

    2003-01-01

    Humans interact with others in several ways, such as speech, gesture, eye contact etc. Among them, speech is the most effective way of communication through which people can readily exchange information without the need for any other tool. Emotions color the speech, and can make the meaning more complex and tell about how it is said. A Mandarin speech based

  7. Aging and Speech Understanding

    PubMed Central

    2015-01-01

    As people age, structural as well as neural degeneration occurs throughout the auditory system. Many older adults experience difficulty in understanding speech especially in adverse listening conditions although they can hear speech sounds. According to a report of the Committee on Hearing and Bioacoustics and Biomechanics of the National Research Council, peripheral, central-auditory, and cognitive systems have long been considered major factors affecting the understanding of speech. The present study aims to review 1) age-related changes in the peripheral, central-auditory, and cognitive systems, 2) the resulting decline in the understanding of speech, and 3) the clinical implication for audiologic rehabilitation of older adults. Once the factors affecting the understanding of speech in older adults are identified and the characteristics of age-related speech understanding difficulties are examined, clinical management could be developed for prevention and treatment. Future research about problems related to the understanding of speech in older adults will help to improve the quality of life in the elderly. PMID:26185785

  8. Robust Speech Recognition Under Noisy Ambient

    E-print Network

    Chapter 6: Robust Speech Recognition Under Noisy Ambient Conditions. Kuldip K. Paliwal, School ... Contents include 6.2 Speech Recognition Overview and 6.4 Robust Speech Recognition Techniques.

  9. Portable Speech Synthesizer

    NASA Technical Reports Server (NTRS)

    Leibfritz, Gilbert H.; Larson, Howard K.

    1987-01-01

    Compact speech synthesizer is a useful traveling companion for the speech-handicapped. User simply enters statement on board, and synthesizer converts statement into spoken words. Battery-powered and housed in briefcase, easily carried on trips. Unit used on telephones and in face-to-face communication. Synthesizer consists of microcomputer with memory-expansion module, speech-synthesizer circuit, batteries, recharger, dc-to-dc converter, and telephone amplifier. Components, commercially available, fit neatly in 17- by 13- by 5-in. briefcase. Weighs about 20 lb (9 kg) and operates and recharges from ac receptacle.

  10. Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter

    Microsoft Academic Search

    Masataka Goto; Katunobu Itou; Koji Kitayama; Tetsunori Kobayashi

    2004-01-01

    This paper describes music information retrieval (MIR) systems featuring automatic speech recognition. Al- though various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musi- cal pieces have not been studied. We propose two differ- ent speech-recognition interfaces for MIR, speech com- pletion and speech spotter, and describe two MIR-based hands-free jukebox systems that enable a user to

  11. Adapted packet speech interpolation

    NASA Astrophysics Data System (ADS)

    Lacovara, R. C.; Vaman, D. R.

    1991-04-01

    Adapted packet speech interpolation (APSI) is presented as an evolution of digital speech interpolation (DSI) techniques. The inherent overload penalties of DSI are mitigated by the use of an overload strategy which distributes the penalties uniformly across all active speech sources. A novel use of linear delta modulation (LDM) allows the system to re-encode the input sources at various rates depending upon the total offered load to the system. The subjective performance of hardware is discussed. Two models of silence and talk-spurt behavior (called activity) of speech are presented: an analytic model for single speakers obtained by the application of renewal theory, and a simulation model obtained from the analytic model.
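Linear delta modulation, which the record above uses for rate-adaptive re-encoding, transmits a single bit per sample indicating whether a staircase approximation should step up or down. A minimal sketch of the coding idea (the step size and input values are illustrative, not parameters from the paper):

```python
def ldm_encode(samples, step):
    """Linear delta modulation: one bit per sample saying whether the
    staircase approximation steps up (1) or down (0)."""
    bits, approx = [], 0.0
    for s in samples:
        bit = 1 if s >= approx else 0
        bits.append(bit)
        approx += step if bit else -step
    return bits

def ldm_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    out, approx = [], 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out
```

Because the rate is one bit per sample, re-encoding at a different sampling rate (as APSI does under load) directly changes the bit rate each source consumes.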

  12. Anxiety and ritualized speech

    ERIC Educational Resources Information Center

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  13. Improving statistical speech recognition 

    E-print Network

    Renals, Steve; Morgan, Nelson; Cohen, Michael; Franco, Horacio; Bourlard, Herve

    A summary of the theory of the hybrid connectionist HMM (hidden Markov model) continuous speech recognition system is presented. Experimental results indicating that the connectionist methods can significantly improve the performance of a context...

  14. Speech Quality Assessment

    Microsoft Academic Search

    Philipos C. Loizou

    \\u000a This chapter provides an overview of the various methods and techniques used for assessment of speech quality. A summary is\\u000a given of some of the most commonly used listening tests designed to obtain reliable ratings of the quality of processed speech\\u000a from human listeners. Considerations for conducting successful subjective listening tests are given along with cautions that\\u000a need to be

  15. [Dysphagia and speech disorders].

    PubMed

    Pruszewicz, A; Woźnica, B; Obrebowski, A; Karlik, M

    1999-01-01

    Swallowing disorders in oral and pharyngeal phase after surgery of mouth, pharynx or larynx are very often interrelated with speech and voice disorders. The results of diagnostic methods of dysphagia and voice/speech disorders based on own material of patients after total laryngectomy, partial tongue resection and cleft palate surgery were presented. Attention was also paid to other etiological factors of swallowing disorders observed in phoniatric practice. PMID:10391042

  16. Speech processing using maximum likelihood continuity mapping

    DOEpatents

    Hogden, John E. (Santa Fe, NM)

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  17. Speech processing using maximum likelihood continuity mapping

    SciTech Connect

    Hogden, J.E.

    2000-04-18

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  18. Cochlear implant speech recognition with speech maskers

    E-print Network

    Litovsky, Ruth

    Cochlear implant speech recognition with speech maskers. Ginger S. Stickney and Fan-Gang Zeng; accepted 16 May 2004. Speech recognition performance was measured in normal-hearing and cochlear-implant ... processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli ...

  19. Audio-Visual Speech Modeling for Continuous Speech Recognition

    Microsoft Academic Search

    Stéphane Dupont; Juergen Luettin

    2000-01-01

    This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: a visual module; an acoustic module; and a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This task is
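A minimal sketch of the late-fusion idea behind such a sensor fusion module: per-class scores from the acoustic and visual streams are combined, with more weight on the visual stream as acoustic noise grows. The class names, scores, and fixed weight below are hypothetical; the paper's actual fusion method is not specified in this snippet.

```python
def fuse_scores(acoustic_loglik, visual_loglik, visual_weight):
    """Late fusion: weighted sum of per-class log-likelihoods from the
    acoustic and visual streams (weights sum to 1)."""
    return {c: (1 - visual_weight) * acoustic_loglik[c]
               + visual_weight * visual_loglik[c]
            for c in acoustic_loglik}

acoustic = {"ba": -2.0, "da": -1.5}   # noisy audio slightly favours "da"
visual = {"ba": -0.5, "da": -3.0}     # lip shape clearly indicates "ba"
fused = fuse_scores(acoustic, visual, visual_weight=0.7)
best = max(fused, key=fused.get)
```

In a noisy environment the visual evidence overrides the unreliable acoustic score, which is exactly the robustness gain the abstract describes.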

  20. Open Domain Speech Recognition & Translation:Lectures and Speeches

    Microsoft Academic Search

    C. Fugen; M. Kolss; D. Bernreuther; M. Paulik; S. Stuker; S. Vogel; A. Waibel

    2006-01-01

    For years speech translation has focused on the recognition and translation of discourses in limited domains, such as hotel reservations or scheduling tasks. Only recently research projects have been started to tackle the problem of open domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper we present the on-going work at our laboratory

  1. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    ERIC Educational Resources Information Center

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  2. A virtual vocabulary speech recognizer

    E-print Network

    Pathe, Peter D

    1983-01-01

    A system for the automatic recognition of human speech is described. A commercially available speech recognizer sees its recognition vocabulary increased through the use of virtual memory management techniques. Central to ...

  3. Why Go to Speech Therapy?

    MedlinePLUS

    Why Go To Speech Therapy? ... types of therapy work best when you can go on an intensive schedule (i.e., every day ...

  4. SPEECH-LANGUAGE-HEARING CLINIC

    E-print Network

    Veiga, Pedro Manuel Barbosa

    The OSU-Tulsa Speech-Language-Hearing Clinic, staffed by the program's faculty, offers assessment and therapy services for a variety of speech, language and hearing disorders: articulation and phonology; voice; hearing loss; receptive and expressive language; resonance; aphasia; and reading.

  5. "Zero Tolerance" for Free Speech.

    ERIC Educational Resources Information Center

    Hils, Lynda

    2001-01-01

    Argues that school policies of "zero tolerance" of threatening speech may violate a student's First Amendment right to freedom of expression if speech is less than a "true threat." Suggests a two-step analysis to determine if student speech is a "true threat." (PKP)

  6. Signed Soliloquy: Visible Private Speech

    ERIC Educational Resources Information Center

    Zimmermann, Kathrin; Brugger, Peter

    2013-01-01

    Talking to oneself can be silent (inner speech) or vocalized for others to hear (private speech, or soliloquy). We investigated these two types of self-communication in 28 deaf signers and 28 hearing adults. With a questionnaire specifically developed for this study, we established the visible analog of vocalized private speech in deaf signers.…

  7. A Guide for Speech Therapy.

    ERIC Educational Resources Information Center

    Stowell, L. James, Ed.; And Others

    The handbook is designed as a guide to the school speech therapy programs within the Cooperative Educational Service Agency 5 in Wisconsin. A general philosophy of speech therapy is presented, the professional responsibilities of the speech clinician outlined, and professional associations described. The responsibilities of the administration to…

  8. Speech spectrogram expert

    SciTech Connect

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (SPEX) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  9. Media Criticism Group Speech

    ERIC Educational Resources Information Center

    Ramsey, E. Michele

    2004-01-01

    Objective: To integrate speaking practice with rhetorical theory. Type of speech: Persuasive. Point value: 100 points (i.e., 30 points based on peer evaluations, 30 points based on individual performance, 40 points based on the group presentation), which is 25% of course grade. Requirements: (a) References: 7-10; (b) Length: 20-30 minutes; (c)…

  10. Disfluencies in Learner Speech.

    ERIC Educational Resources Information Center

    Temple, Liz

    1992-01-01

    Disfluent phenomena such as pauses, hesitations, and repairs are investigated in 42 short samples of spontaneous speech of native French speakers and learners of French. It is found that native speakers attend to the construction of the referent, whereas learners are more concerned with syntactic construction. (Contains 14 references.) (Author/LB)

  11. Speech Communication Systems

    Microsoft Academic Search

    WINSTON E. KOCK

    1962-01-01

    Telephone, radio broadcasting, public-address and bandwidth conserving systems are discussed with particular attention being given to the latter two. The relation of certain properties of hearing (e.g., the Haas effect) to public address system design is reviewed along with several bandwidth conserving techniques including speech interpolation systems and vocoders.

  12. Statistical Parametric Speech Synthesis

    E-print Network

    Cortes, Corinna

    Unit-selection synthesis concatenates actual instances of speech from a database, scoring candidate segments by target and concatenation costs. Statistical parametric synthesis instead performs text analysis to extract linguistic features x and acoustic features y from training data, and trains an acoustic model λ̂ = arg max_λ p(y | x, λ) that generates acoustic features from text at synthesis time.
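
    The training criterion λ̂ = arg max_λ p(y | x, λ) can be illustrated in miniature: under a linear-Gaussian acoustic model with scalar features, maximum-likelihood training reduces to least squares. A toy sketch with invented data (real systems use HMMs or neural networks over high-dimensional features):

```python
# Toy statistical parametric synthesis: model acoustic feature y as a
# linear-Gaussian function of linguistic feature x. Maximizing
# p(y | x, lambda) over lambda = (a, b) is then ordinary least squares.
# Features and data are invented for illustration.

def train(xs, ys):
    """MLE of slope/intercept for y ~ Normal(a*x + b, sigma^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

def synthesize(x, model):
    """Generate the most likely acoustic feature for a new linguistic feature."""
    a, b = model
    return a * x + b

# "Linguistic" feature: e.g. a phone-identity code; "acoustic": e.g. F0.
xs, ys = [0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
model = train(xs, ys)
print(synthesize(4, model))  # 9.0
```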

  13. Microprocessor for speech recognition

    SciTech Connect

    Ishizuka, H.; Watari, M.; Sakoe, H.; Chiba, S.; Iwata, T.; Matsuki, T.; Kawakami, Y.

    1983-01-01

    A new single-chip microprocessor for speech recognition has been developed utilizing a multi-processor architecture and pipelined structure. Using a DP-matching algorithm, the processor recognizes up to 340 isolated words or 40 connected words in real time. 6 references.
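
    The DP-matching named here is dynamic time warping (DTW), the classic dynamic-programming template match used by early isolated-word recognizers. A minimal sketch with one-dimensional stand-in features; the vocabulary, templates, and local distance are invented for illustration, not the chip's actual design:

```python
# Dynamic time warping (DP-matching): align a test utterance against each
# stored word template and pick the cheapest alignment.

def dtw_distance(ref, test):
    """Minimum cumulative distance aligning two feature sequences."""
    n, m = len(ref), len(test)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - test[j - 1])   # local distance (1-D features)
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

def recognize(test, templates):
    """Pick the vocabulary word whose template warps to `test` most cheaply."""
    return min(templates, key=lambda w: dtw_distance(templates[w], test))

templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 4, 2, 1, 1]}
print(recognize([1, 2, 5, 4, 3, 1], templates))  # prints "yes"
```

Warping lets the same word be spoken faster or slower than the template without a large distance penalty, which is why DTW suited isolated-word hardware of this era.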

  14. Perceptual Learning in Speech

    ERIC Educational Resources Information Center

    Norris, Dennis; McQueen, James M.; Cutler, Anne

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g.,…

  15. Black History Speech

    ERIC Educational Resources Information Center

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  16. GLOBAL FREEDOM OF SPEECH

    Microsoft Academic Search

    Lars Binderup

    2007-01-01

    It has been suggested that the multicultural nature of modern liberal states (in particular the formation of immigration minorities from other cultures due to the process of globalisation) provides reasons - from a liberal egalitarian perspective - for recognising a civic or democratic norm, as opposed to a legal norm, that curbs exercises of the right to free speech that

  17. Figures of Speech

    ERIC Educational Resources Information Center

    Dutton, Yanina; Meyer, Sue

    2007-01-01

    In this article, the authors report that almost one in three adults in the UK have experience of learning a language as an adult, but only four percent are currently doing so--one percent less than in 1999, equivalent to a drop of half a million adults learning languages. Figures of Speech, NIACE's UK-wide survey of language learning, also found a…

  18. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  19. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    ERIC Educational Resources Information Center

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  20. TEACHER'S GUIDE TO HIGH SCHOOL SPEECH.

    ERIC Educational Resources Information Center

    JENKINSON, EDWARD B., ED.

    THIS GUIDE TO HIGH SCHOOL SPEECH FOCUSES ON SPEECH AS ORAL COMPOSITION, STRESSING THE IMPORTANCE OF CLEAR THINKING AND COMMUNICATION. THE PROPOSED 1-SEMESTER BASIC COURSE IN SPEECH ATTEMPTS TO IMPROVE THE STUDENT'S ABILITY TO COMPOSE AND DELIVER SPEECHES, TO THINK AND LISTEN CRITICALLY, AND TO UNDERSTAND THE SOCIAL FUNCTION OF SPEECH. IN ADDITION…

  1. NOISE ADAPTIVE SPEECH RECOGNITION WITH ACOUSTIC MODELS TRAINED FROM NOISY SPEECH EVALUATED ON AURORA-2 DATABASE

    E-print Network

    This paper extends noise adaptive speech recognition, developed for recognizing noisy speech in non-stationary noise, to the situation in which the acoustic models themselves are trained from noisy speech, with evaluation on the Aurora-2 database. We justify it by showing that noise adaptive speech recognition

  2. Neurophysiology of speech differences in childhood apraxia of speech.

    PubMed

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes. PMID:25090016

  3. Neurophysiology of Speech Differences in Childhood Apraxia of Speech

    PubMed Central

    Preston, Jonathan L.; Molfese, Peter J.; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes. PMID:25090016

  4. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
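
    HRTF filtering of this kind amounts to convolving the mono speech signal with a measured head-related impulse response (HRIR) for each ear. A minimal sketch; the two-tap filters below are invented placeholders, since real HRIRs are direction-dependent and hundreds of taps long:

```python
# Binaural rendering sketch: convolve a mono signal with per-ear impulse
# responses to synthesize a virtual source over headphones.

def convolve(signal, ir):
    """Direct-form FIR convolution (len(signal) + len(ir) - 1 samples out)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) channels for headphone playback."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Source to the listener's left: louder and earlier at the left ear.
left, right = render_binaural([1.0, 0.5, -0.25],
                              hrir_left=[0.9, 0.1],
                              hrir_right=[0.0, 0.4])
```

The interaural level and time differences encoded in the two filters are what the listener interprets as direction; nonindividualized HRTFs use someone else's measured filters, which is why the study above finds elevated and front-back-confused judgments.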

  5. SpeechBot

    NSDL National Science Digital Library

    This new experimental search engine from Compaq indexes over 2,500 hours of content from 20 popular American radio shows. Using its speech recognition software, Compaq creates "a time-aligned 'transcript' of the program and build[s] an index of the words spoken during the program." Users can then search the index by keyword or advanced search. Search returns include the text of the clip, a link to a longer transcript, the relevant audio clip in RealPlayer format, the entire program in RealPlayer format, and a link to the radio show's Website. The index is updated daily. Please note that, while SpeechBot worked fine on Windows/NT machines, the Scout Project was unable to access the audio clips using Macs.

  6. Intelligibility enhancement of synthetic speech in noise 

    E-print Network

    Valentini Botinha?o, Ca?ssia

    2013-11-28

    Speech technology can facilitate human-machine interaction and create new communication interfaces. Text-To-Speech (TTS) systems provide speech output for dialogue, notification and reading applications as well as personalized ...

  7. Large Vocabulary, Multilingual Speech Recognition: Session Overview

    E-print Network

    Lori Lamel, Yoshinori Sagisaka. … developed for a given language provide crucial input to speech recognition technology world-wide. However, … knowledge on speaker-independent, large vocabulary, continuous speech recognition technology among

  8. Speech Recognition: How Do We Teach It?

    ERIC Educational Resources Information Center

    Barksdale, Karl

    2002-01-01

    States that growing use of speech recognition software has made voice writing an essential computer skill. Describes how to present the topic, develop basic speech recognition skills, and teach speech recognition outlining, writing, proofreading, and editing. (Contains 14 references.) (SK)

  9. Crowdsourcing Correction of Speech Recognition Captioning Errors

    E-print Network

    Southampton, University of

    M. Wald, University of Southampton. … crowdsourcing correction of speech recognition captioning errors to provide a sustainable method of making … using speech recognition technologies, but this results in many recognition errors requiring manual

  10. Ubiquitous speech communication interface

    Microsoft Academic Search

    B. H. Juang

    2001-01-01

    The Holy Grail of telecommunication is to bring people thousands of miles apart, anytime, anywhere, together to communicate as if they were having a face-to-face conversation in a ubiquitous telepresence scenario. One key component necessary to reach this Holy Grail is the technology that supports hands-free speech communication. Hands-free telecommunication (both telephony and teleconferencing) refers to a communication mode in which

  11. Evaluation of NASA speech encoder

    NASA Technical Reports Server (NTRS)

    1976-01-01

    Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals. Results were evaluated by a phonetician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; speech recording, sampling, and storage; and processing results.

  12. Speech Recognition Using Neural Networks

    Microsoft Academic Search

    Stephen V. Kosonocky

    The field of artificial neural networks has grown rapidly in recent years. This has been accompanied by an insurgence of work in speech recognition. Most speech recognition research has centered on stochastic models, in particular the use of hidden Markov models (HMMs) [9][28][29][30][45][47]. Alternate techniques have focused on applying neural networks to classify speech signals [6][11][48]. The inspiration for using

  13. Statistical models for noise-robust speech recognition

    E-print Network

    van Dalen, Rogier Christiaan

    2011-11-08

    A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model...

  14. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous-material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  15. CAN AUTOMATIC SPEECH RECOGNITION LEARN MORE FROM HUMAN SPEECH PERCEPTION?

    Microsoft Academic Search

    Sorin DUSAN; Lawrence R. RABINER

    Although a great deal of progress has been made during the last two decades in automatic speech recognition (ASR), the performance of these ASR systems, as measured by word recognition and concept understanding error rates, is still much worse than that achieved by humans, even for carefully read and articulated speech in quiet conditions. This performance gap (between machines

  16. A speech locked loop for cochlear implants and speech prostheses

    E-print Network

    Wee, Keng Hoong

    We have previously described a feedback loop that combines an auditory processor with a low-power analog integrated-circuit vocal tract to create a speech-locked loop. Here, we describe how the speech-locked loop can help ...

  17. Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity

    ERIC Educational Resources Information Center

    Opt, Susan

    2012-01-01

    In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

  18. Speech-in-Speech Recognition: A Training Study

    ERIC Educational Resources Information Center

    Van Engen, Kristin J.

    2012-01-01

    This study aims to identify aspects of speech-in-noise recognition that are susceptible to training, focusing on whether listeners can learn to adapt to target talkers ("tune in") and learn to better cope with various maskers ("tune out") after short-term training. Listeners received training on English sentence recognition in speech-shaped noise…

  19. Speech separation using speaker-adapted eigenvoice speech models

    E-print Network

    Ellis, Dan

    Ron J. Weiss, Daniel P. W. Ellis. … speaker-adapted eigenvoice speech models allow for significantly improved separation performance over that obtained using unadapted source models. Keywords: source separation; model adaptation; eigenvoice.

  20. Creating Speech-Synchronized Animation

    Microsoft Academic Search

    Scott A. King; Richard E. Parent

    2005-01-01

    We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial

  1. Maria Montessori on speech education

    Microsoft Academic Search

    David A. Stern

    1973-01-01

    Maria Montessori's theory of education is examined with emphasis upon its implications for the speech-communication development of children. Detailed in the article are: (1) Montessori's theory of speech and language acquisition, (2) her pedagogical procedure for teaching spoken vocabulary, and (3) the educational environment she suggests which supports children's free interaction and facilitates their development into willing, confident communicators.

  2. Values Clarification and Speech Communication.

    ERIC Educational Resources Information Center

    Gurry, Joanne

    In the speech communication classroom, values clarification activities can be used as motivational techniques and as methods for teaching interpersonal communication skills. Learning to use communication skills can be a values-clarifying process in itself and can occur in speech areas viewed as primarily cognitive: argumentation, persuasion,…

  3. Speech Communication and Multimodal Interfaces

    Microsoft Academic Search

    Björn Schuller; Markus Ablaßmeier; Ronald Müller; Stefan Reifinger; Tony Poitschke; Gerhard Rigoll

    Within the area of advanced man-machine interaction, speech communication has played a major role for several decades. The idea of replacing conventional input devices such as buttons and keyboards by voice control, and thus increasing the comfort and the input speed considerably, seems so attractive that even the quite slow progress of speech technology during

  4. Deictic Reference in Children's Speech.

    ERIC Educational Resources Information Center

    Keller-Cohen, Deborah

    The purpose of this paper is to examine the status of deictic reference in the speech of 19 three-year-old Black children. The deictic verbs of motion are examined with reference to other aspects of the deictic system. The data for this study are approximately eight hours of spontaneous speech collected in a pre-school classroom. The hypothesis to…

  5. Perceptual Aspects of Cluttered Speech

    ERIC Educational Resources Information Center

    St. Louis, Kenneth O.; Myers, Florence L.; Faragasso, Kristine; Townsend, Paula S.; Gallaher, Amanda J.

    2004-01-01

    The purpose of this descriptive investigation was to explore perceptual judgments of speech naturalness, compared to judgments of articulation, language, disfluency, and speaking rate, in the speech of two youths who differed in cluttering severity. Two groups of listeners, 48 from New York and 48 from West Virginia, judged 93 speaking samples on…

  6. Speech Restoration: An Interactive Process

    ERIC Educational Resources Information Center

    Grataloup, Claire; Hoen, Michael; Veuillet, Evelyne; Collet, Lionel; Pellegrino, Francois; Meunier, Fanny

    2009-01-01

    Purpose: This study investigates the ability to understand degraded speech signals and explores the correlation between this capacity and the functional characteristics of the peripheral auditory system. Method: The authors evaluated the capability of 50 normal-hearing native French speakers to restore time-reversed speech. The task required them…

  7. Modeling human activities as speech

    Microsoft Academic Search

    Chia-Chih Chen; J. K. Aggarwal

    2011-01-01

    Human activity recognition and speech recognition appear to be two loosely related research areas. However, on a careful thought, there are several analogies between activity and speech signals with regard to the way they are generated, propagated, and perceived. In this paper, we propose a novel action representation, the action spectrogram, which is inspired by a common spectrographic representation of

  8. Audiovisual Speech Recalibration in Children

    ERIC Educational Resources Information Center

    van Linden, Sabine; Vroomen, Jean

    2008-01-01

    In order to examine whether children adjust their phonetic speech categories, children of two age groups, five-year-olds and eight-year-olds, were exposed to a video of a face saying /aba/ or /ada/ accompanied by an auditory ambiguous speech sound halfway between /b/ and /d/. The effect of exposure to these audiovisual stimuli was measured on…

  9. A Newly Devised Speech Accumulator

    Microsoft Academic Search

    Seiichi Ryu; Sohtaro Komiyama; Shuichiro Kannae; Hiroshi Watanabe

    1983-01-01

    Voice therapy is often most effective for treating patients with vocal cord polyp, polypoid degeneration and singer’s nodule. However, little is known about the total speaking times in 1 day, the ratio of speech per hour and the sound level during speech, in individual patients. If these parameters can be readily detected, it could be clarified as to how speaking

  10. Creating speech-synchronized animation.

    PubMed

    King, Scott A; Parent, Richard E

    2005-01-01

    We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial geometry to provide the necessary geometric complexity needed for creating lip shapes and high-quality renderings. Our facial model also includes a highly deformable tongue model that can represent the shapes the tongue undergoes during speech. We add teeth, gums, and upper palate geometry to complete the inner mouth. To decrease the processing time, we hierarchically deform the facial surface. We also present a method to animate the facial model over time to create animated speech using a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract by describing visemes as curves instead of keyframes. We show the utility of the techniques described in this paper by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized. The system then automatically creates accurate real-time animated speech from the input text. It is capable of cheaply producing tremendous amounts of animated speech with very low resource requirements. PMID:15868833
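
    The dominance-function blending described above can be sketched as follows; the exponential dominance shape and all constants are illustrative stand-ins, not the authors' actual model:

```python
# Coarticulation by dominance functions (in the spirit of the model above):
# each viseme exerts an influence that peaks at its center time and decays
# on either side, and a lip parameter at time t is the dominance-weighted
# average of the visemes' target values.
import math

def dominance(t, center, magnitude=1.0, rate=8.0):
    """Exponentially decaying influence of a viseme centered at `center`."""
    return magnitude * math.exp(-rate * abs(t - center))

def blend(t, visemes):
    """visemes: list of (center_time, target_value) for one lip parameter."""
    weights = [dominance(t, c) for c, _ in visemes]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, visemes)) / total

# Lip-opening targets for three visemes. Midway between two visemes the
# parameter is pulled toward both neighbors instead of jumping, which is
# the coarticulation effect the dominance model is meant to capture.
track = [(0.0, 0.2), (0.3, 0.9), (0.6, 0.1)]
value_at_peak = blend(0.3, track)    # dominated by the middle viseme
value_between = blend(0.45, track)   # a smooth blend of the neighbors
```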

  11. Detecting Emotions in Mandarin Speech

    Microsoft Academic Search

    Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh; Wen-Yuan Liao

    2005-01-01

    The importance of automatically recognizing emotions in human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. In this paper, a Mandarin speech based emotion classification method is presented. Five primary human emotions, including anger, boredom, happiness, neutral and sadness, are investigated. Combining different feature streams to obtain a more accurate result is a

  12. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    ERIC Educational Resources Information Center

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  13. AM-DEMODULATION OF SPEECH SPECTRA AND ITS APPLICATION TO NOISE ROBUST SPEECH RECOGNITION

    E-print Network

    Alwan, Abeer

    The AM-demodulation of speech spectra and its application to automatic speech recognition (ASR) is studied. Speech production can be regarded as … or pitch. For example, the VTTF is often used in feature extraction for automatic speech recognition (ASR

  14. An open source speech synthesis module for a visual-speech recognition system

    E-print Network

    Paris-Sud XI, Université de

    S. Manitsaris, B. … technology that permits speech communication without vocalization. The visual-speech recognition engine … the opportunity to speak with his/her original voice. The visual-speech recognition engine of the SSI outputs

  15. SUBTRACTION OF ADDITIVE NOISE FROM CORRUPTED SPEECH FOR ROBUST SPEECH RECOGNITION

    E-print Network

    J. Chen, K. K. … the performance of speech recognition systems. For many speech recognition applications, the most important source of acoustical distortion is additive noise. Much research effort in robust speech recognition has been
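
    The subtraction technique named in the title is classically implemented as spectral subtraction: estimate the noise magnitude spectrum (ideally averaged over known non-speech frames), subtract it from each noisy frame's magnitude, and resynthesize using the noisy phase. A minimal one-frame sketch with invented signals; frame length and the flooring constant are illustrative:

```python
# Spectral subtraction sketch: subtract an estimated noise magnitude
# spectrum from the noisy magnitude, keep the noisy phase, and floor
# results that would go negative.
import numpy as np

def spectral_subtract(noisy_frame, noise_mag, floor=0.01):
    """Enhance one frame given an estimated noise magnitude spectrum."""
    spec = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
speech = np.sin(2 * np.pi * 16 * t / n)   # stand-in for one voiced frame
noise = 0.3 * rng.standard_normal(n)
noise_mag = np.abs(np.fft.rfft(noise))    # ideally averaged over silence
enhanced = spectral_subtract(speech + noise, noise_mag)
```

The spectral floor avoids negative magnitudes but leaves the characteristic "musical noise" of isolated surviving bins, which is what much of the robust-ASR literature after this paper tries to suppress.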

  16. Undergraduate Student SPEECH-LANGUAGE PATHOLOGY

    E-print Network

    Long, Nicholas

    Undergraduate Student Handbook, Speech-Language Pathology Program, Stephen F. Austin State University. Contents cover the SFASU Speech-Language Pathology and Audiology Program and its history, … and the American Speech-Language-Hearing Association.

  17. ANNUAL SPEECH PATHOLOGY HONOURS RESEARCH MINICONFERENCE 2012

    E-print Network

    Every year the Speech Pathology … All interested are welcome. This invitation extends to Speech Pathology … Contact …Yuen@curtin.edu.au, telephone +61 8 9266 7984, or visit psych.curtin.edu.au. School of Psychology & Speech Pathology, Monday 15th

  18. Infant-Directed Speech Facilitates Word Segmentation

    ERIC Educational Resources Information Center

    Thiessen, Erik D.; Hill, Emily A.; Saffran, Jenny R.

    2005-01-01

    There are reasons to believe that infant-directed (ID) speech may make language acquisition easier for infants. However, the effects of ID speech on infants' learning remain poorly understood. The experiments reported here assess whether ID speech facilitates word segmentation from fluent speech. One group of infants heard a set of nonsense…

  19. Analysis of False Starts in Spontaneous Speech.

    ERIC Educational Resources Information Center

    O'Shaughnessy, Douglas

    A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

  20. Speech Patterns and Racial Wage Inequality

    ERIC Educational Resources Information Center

    Grogger, Jeffrey

    2011-01-01

    Speech patterns differ substantially between whites and many African Americans. I collect and analyze speech data to understand the role that speech may play in explaining racial wage differences. Among blacks, speech patterns are highly correlated with measures of skill such as schooling and AFQT scores. They are also highly correlated with the…

  1. POLYPHASE SPEECH RECOGNITION Hui Lin, Jeff Bilmes

    E-print Network

    Bilmes, Jeff

    Hui Lin, Jeff Bilmes {hlin,bilmes}@ee.washington.edu, Department of Electrical Engineering, University of Washington. Describes an architecture for speech recognition that consists of multiple semi-synchronized recognizers operating on a polyphase decomposition, motivated by a problem in many speech recognition systems, i.e., that speech modulation energy is most important below…

  2. Emerging Technologies Speech Tools and Technologies

    ERIC Educational Resources Information Center

    Godwin-Jones, Robert

    2009-01-01

    Using computers to recognize and analyze human speech goes back at least to the 1970s. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990s, advances in the scientific understanding of speech as well as significant enhancements in software and…

  3. ROBUST SPEECH RECOGNITION K.K. Paliwal

    E-print Network

    K.K. Paliwal, School of Microelectronic Engineering, Griffith University. The aim of robust speech recognition is to overcome the mismatch problem so as to result in only moderate degradation of an automatic speech recognition system. We describe sources of speech variability that cause mismatch between…

  4. Speech Perception Within an Auditory Cognitive Science

    E-print Network

    Holt, Lori L.

    Lori L. Holt and Andrew J. Lotto. Although speech begins with auditory processing, investigation of speech perception has progressed mostly independently of the study of general auditory processing. Recent work bridges the study of general auditory processing and speech perception, showing that the latter is constrained…

  5. Reactions of College Students to Speech Disorders.

    ERIC Educational Resources Information Center

    McKinnon, Shauna L.; And Others

    1986-01-01

    Reactions of 33 college students to audiotaped speech samples of simulated moderate speech disorders (stuttering, hypernasality, and lateral lisping), as well as normal speech, were measured. The students reacted to the speech disorders with a tendency toward increased social distance, in addition to judgments of lower evaluation, lower…

  6. Phonetic Recalibration Only Occurs in Speech Mode

    ERIC Educational Resources Information Center

    Vroomen, Jean; Baart, Martijn

    2009-01-01

    Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

  7. Multifractal nature of unvoiced speech signals

    SciTech Connect

    Adeyemi, O.A. [Department of Electrical Engineering, University of Rhode Island, Kingston, Rhode Island 02881 (United States); Hartt, K. [Department of Physics, University of Rhode Island, Kingston, Rhode Island 02881 (United States); Boudreaux-Bartels, G.F. [Department of Electrical Engineering, University of Rhode Island, Kingston, Rhode Island 02881 (United States)

    1996-06-01

    A refinement is made in the nonlinear dynamic modeling of speech signals. Previous research successfully characterized speech signals as chaotic. Here, we analyze fricative speech signals using multifractal measures to determine various fractal regimes present in their chaotic attractors. Results support the hypothesis that speech signals have multifractal measures. © 1996 American Institute of Physics.

  8. Tools for Research and Education in Speech Science

    Microsoft Academic Search

    Ronald A. Cole

    1999-01-01

    The Center for Spoken Language Understanding (CSLU) provides free language resources to researchers and educators in all areas of speech and hearing science. These resources are of great potential value to speech scientists for analyzing speech, for diagnosing and treating speech and language problems, for researching and evaluating language technologies, and for training students in the theory and practice of speech science. This article describes language resources…

  9. WEB-BASED EDUCATION: A SPEECH RECOGNITION AND SYNTHESIS TOOL

    E-print Network

    Miles, Will

    by Laura Schindler. Advisor: Dr. Hala… Contents include sections 2.2 Speech Recognition and 4.4 Speech Recognition.

  10. Alignment of speech and co-speech gesture in a constraint-based grammar 

    E-print Network

    Saint-Amand, Katya; Amand, Katya Saint; Alahverdzhieva, Katya

    2013-07-02

    This thesis concerns the form-meaning mapping of multimodal communicative actions consisting of speech signals and improvised co-speech gestures, produced spontaneously with the hand. The interaction between speech and ...

  11. Bayesian Discriminative Adaptation for Speech Recognition

    E-print Network

    de Gispert, Adrià

    C. K. Raut, Kai Yu and Mark Gales, Cambridge University Engineering, 12 April 2007. Overview: adaptation and adaptive training; speech recognition in varying acoustic conditions…

  12. Silog: Speech Input Logon

    NASA Astrophysics Data System (ADS)

    Grau, Sergio; Allen, Tony; Sherkat, Nasser

    Silog is a biometric authentication system that extends the conventional PC logon process using voice verification. Users enter their ID and password using a conventional Windows logon procedure, but then the biometric authentication stage makes a Voice over IP (VoIP) call to a VoiceXML (VXML) server. User interaction with this speech-enabled component then allows the user's voice characteristics to be extracted as part of a simple user/system spoken dialogue. If the captured voice characteristics match those of a previously registered voice profile, then network access is granted. If no match is possible, then a potential unauthorised system access has been detected and the logon process is aborted.

  13. Speech recovery device

    DOEpatents

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  14. Speech processing: An evolving technology

    SciTech Connect

    Crochiere, R.E.; Flanagan, J.L.

    1986-09-01

    As we enter the information age, speech processing is emerging as an important technology for making machines easier and more convenient for humans to use. It is both an old and a new technology - dating back to the invention of the telephone and forward, at least in aspirations, to the capabilities of HAL in 2001. Explosive advances in microelectronics now make it possible to implement economical real-time hardware for sophisticated speech processing - processing that formerly could be demonstrated only in simulations on main-frame computers. As a result, fundamentally new product concepts - as well as new features and functions in existing products - are becoming possible and are being explored in the marketplace. As the introductory piece to this issue, the authors draw a brief perspective on the evolving field of speech processing and assess the technology in the three constituent sectors: speech coding, synthesis, and recognition.

  15. Acute stress reduces speech fluency.

    PubMed

    Buchanan, Tony W; Laures-Gore, Jacqueline S; Duff, Melissa C

    2014-03-01

    People often report word-finding difficulties and other language disturbances when put in a stressful situation. There is, however, scant empirical evidence to support the claim that stress affects speech productivity. To address this issue, we measured speech and language variables during a stressful Trier Social Stress Test (TSST) as well as during a less stressful "placebo" TSST (Het et al., 2009). Compared to the non-stressful speech, participants showed higher word productivity during the TSST. By contrast, participants paused more during the stressful TSST, an effect that was especially pronounced in participants who produced a larger cortisol and heart rate response to the stressor. Findings support anecdotal evidence of stress-impaired speech production abilities. PMID:24555989

  16. Speech Generation in Mobile Phones

    Microsoft Academic Search

    Géza Németh; Géza Kiss; Csaba Zainkó; Gábor Olaszy; Bálint Tóth

    Mobile phones have become indispensable companions for many people. They are being used in all spaces of life, including the car. The security risk of this situation has motivated severe regulation of use on the one hand and, on the other, increased attention to built-in speech recognition. Far less attention has been paid, however, to the possible advantages of automatic speech generation…

  17. Speech and Language Disorders in the School Setting

    MedlinePLUS

    Frequently Asked Questions: Speech and Language Disorders in the School Setting. What types of speech and language disorders affect school-age children? Do speech-language disorders affect learning? How may a speech-…

  18. Discriminative pronunciation modeling for dialectal speech recognition Maider Lehr1

    E-print Network

    Cortes, Corinna

    Maider Lehr, Kyle Gorman. Index terms: speech recognition, dialectal speech recognition, pronunciation modeling, discriminative training. Introduction: Speech recognition technology is increasingly ubiquitous in everyday life. Automatic speech recognition…

  19. Neural pathways for visual speech perception.

    PubMed

    Bernstein, Lynne E; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  20. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  1. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  2. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  3. Speech enhancement using a soft-decision noise suppression filter

    Microsoft Academic Search

    R. McAulay; M. Malpass

    1980-01-01

    One way of enhancing speech in an additive acoustic noise environment is to perform a spectral decomposition of a frame of noisy speech and to attenuate a particular spectral line depending on how much the measured speech-plus-noise power exceeds an estimate of the background noise. Using a two-state model for the speech event (speech absent or speech present)…
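
The attenuation rule this abstract describes can be sketched in a few lines. This is a minimal, hard-decision variant only: the paper's actual soft-decision filter additionally weights each gain by the probability that speech is present in the frame, and the `floor` parameter here is illustrative.

```python
import math

def spectral_gain(noisy_power, noise_power, floor=0.1):
    """Per-bin attenuation for one frame: the more the measured
    speech-plus-noise power exceeds the background-noise estimate,
    the less the spectral line is attenuated."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        # Estimated clean-speech fraction of the bin's power (clamped at 0).
        snr = max(p_y - p_n, 0.0) / p_y if p_y > 0.0 else 0.0
        # Amplitude-domain gain, never below the spectral floor.
        gains.append(max(math.sqrt(snr), floor))
    return gains

# One frame with three spectral lines and a flat noise estimate of 1.0:
# the strong bin is passed nearly intact, the noise-only bin is floored.
print(spectral_gain([9.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

Multiplying each bin of the frame's spectrum by its gain and resynthesizing yields the enhanced frame.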

  4. Production and perception of clear speech

    NASA Astrophysics Data System (ADS)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, "clear speech" can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  5. Concept-based speech-to-speech translation using maximum entropy models for statistical natural concept generation

    Microsoft Academic Search

    Liang Gu; Yuqing Gao; Fu-hua Liu; Michael Picheny

    2006-01-01

    The IBM Multilingual Automatic Speech-To-Speech TranslatOR (MASTOR) system is a research prototype developed for the Defense Advanced Research Projects Agency (DARPA) Babylon/CAST speech-to-speech machine translation program. The system consists of cascaded components of large-vocabulary conversational spontaneous speech recognition, statistical machine translation, and concatenative text-to-speech synthesis. To achieve highly accurate and robust conversational spoken language translation, a unique concept-based speech-to-speech translation…

  6. Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines

    Microsoft Academic Search

    Thorsten Wesker; Bernd T. Meyer; Kirsten Wagener; Jörn Anemüller; Alfred Mertins; Birger Kollmeier

    2005-01-01

    This paper introduces the new OLdenburg LOgatome speech corpus (OLLO) and outlines design considerations during its creation. OLLO is distinct from previous ASR corpora as it specifically targets (1) the fair comparison between human and machine speech recognition performance, and (2) the realistic representation of intrinsic variabilities in speech that are significant for automatic speech recognition (ASR) systems.

  7. The Contribution of Sensitivity to Speech Rhythm and Non-Speech Rhythm to Early Reading Development

    ERIC Educational Resources Information Center

    Holliman, Andrew J.; Wood, Clare; Sheehy, Kieron

    2010-01-01

    Both sensitivity to speech rhythm and non-speech rhythm have been associated with successful phonological awareness and reading development in separate studies. However, the extent to which speech rhythm, non-speech rhythm and literacy skills are interrelated has not been examined. As a result, five- to seven-year-old English-speaking children…

  8. GMM Mapping Of Visual Features of Cued Speech From Speech Spectral Features

    E-print Network

    Paris-Sud XI, Université de

    A method based on GMM modeling to map acoustic speech spectral features (the predictor) to visual features of Cued Speech (the target), such as hand positions. Cued Speech largely improves speech perception for deaf people [2], relating to the identification of American…

  9. Speech anxiety affects how people prepare speeches: A protocol analysis of the preparation processes of speakers

    Microsoft Academic Search

    John A. Daly; Anita L. Vangelisti; David J. Weber

    1995-01-01

    Why does public speaking anxiety lead people to present speeches of judged lower quality? Prior research suggests a number of variables that might detrimentally affect the performance of highly anxious speakers when they present speeches. But does speech anxiety affect only presentation behavior, or does it also affect the ways in which people prepare their speeches? Measures of public speaking

  10. A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition

    E-print Network

    Whelan, Paul F.

    This paper presents the development of a novel visual speech recognition (VSR) system based on a new visual speech representation and HMM classification, noting that existing representations are problematic when applied to continuous visual speech recognition. To circumvent…

  11. A High-Dimensional Subband Speech Representation and SVM Framework for Robust Speech Recognition

    E-print Network

    Sollich, Peter

    A high-dimensional subband speech representation and SVM framework for robust speech recognition that outperforms the individual front-ends across the full range of noise levels. Index terms: speech recognition, robustness, subbands, support vector machines. Introduction: Automatic speech recognition (ASR) systems suffer…

  12. THIRD-ORDER MOMENTS OF FILTERED SPEECH SIGNALS FOR ROBUST SPEECH RECOGNITION

    E-print Network

    Johnson, Michael T.

    Kevin M. Indrebo. Third-order moments of filtered speech signals are introduced and studied for robust speech recognition. These features have the potential to capture nonlinear… Introduction: Spectral-based acoustic features have been the standard in speech recognition for many years, even…

  13. What can Visual Speech Synthesis tell Visual Speech Recognition? Michael M. Cohen and Dominic W. Massaro

    E-print Network

    Massaro, Dominic

    Michael M. Cohen and Dominic W. Massaro. We consider the problem of speech recognition given visual and auditory information, and discuss… and, third, the use of these production models to help guide automatic speech recognition. Finally, we…

  14. Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition

    Microsoft Academic Search

    Kentaro Ishizuka; Noboru Miyazaki

    2004-01-01

    This paper proposes a feature extraction method that represents both the periodicity and aperiodicity of speech for robust speech recognition. The development of this feature extraction method was motivated by findings in speech perception research. With this method, the speech sound is filtered by Gammatone filter banks, and then the output of each filter is comb filtered. Individual comb filters…
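
The comb-filtering step named in this abstract can be illustrated in isolation. This is a minimal sketch only: the Gammatone filterbank stage and the paper's actual feature construction are omitted, and the lag and test signal are illustrative.

```python
def comb_filter(x, lag):
    """Feed-forward comb filter y[n] = x[n] + x[n - lag]: it reinforces
    components that repeat every `lag` samples and cancels components
    at half that period."""
    return [x[n] + (x[n - lag] if n >= lag else 0.0) for n in range(len(x))]

def periodicity_score(x, lag):
    """Energy gain of the comb filter over the steady-state region.
    A perfectly periodic input doubles every sample, quadrupling energy;
    an input anti-periodic at `lag` is cancelled to zero."""
    y = comb_filter(x, lag)
    e_x = sum(s * s for s in x[lag:])
    e_y = sum(s * s for s in y[lag:])
    return e_y / e_x

periodic = [1.0, -1.0] * 8          # period of 2 samples
print(periodicity_score(periodic, 2), periodicity_score(periodic, 1))
```

Applied per subband, a ratio like this separates periodic (voiced-like) energy from aperiodic energy, which is the contrast the proposed features encode.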

  15. Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

    ERIC Educational Resources Information Center

    Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

    2014-01-01

    Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…

  16. Determining the threshold for usable speech within co-channel speech with the SPHINX automated speech recognition system

    NASA Astrophysics Data System (ADS)

    Hicks, William T.; Yantorno, Robert E.

    2004-10-01

    Much research has been and is continuing to be done in the area of separating the original utterances of two speakers from co-channel speech. This is very important in the area of automated speech recognition (ASR), where the current state of technology is not nearly as accurate as human listeners when the speech is co-channel. It is desired to determine what types of speech (voiced, unvoiced, and silence) and at what target to interference ratio (TIR) two speakers can speak at the same time and not reduce speech intelligibility of the target speaker (referred to as usable speech). Knowing which segments of co-channel speech are usable in ASR can be used to improve the reconstruction of single speaker speech. Tests were performed using the SPHINX ASR software and the TIDIGITS database. It was found that interfering voiced speech with a TIR of 6 dB or greater (on a per frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech. It was further found that interfering unvoiced speech with a TIR of 18 dB or greater (on a per frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech.
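
The per-frame usability test this abstract reports can be sketched as follows. This is an illustrative reading of the stated thresholds only; classifying the interferer as voiced or unvoiced, and the SPHINX evaluation itself, are outside the sketch, and the signals below are made up.

```python
import math

def frame_tir_db(target_frame, interferer_frame):
    """Target-to-interference ratio (dB) for one frame of samples."""
    p_t = sum(s * s for s in target_frame)
    p_i = sum(s * s for s in interferer_frame)
    return 10.0 * math.log10(p_t / p_i)

def usable(tir_db, interferer_voiced):
    """Per-frame thresholds reported in the study: voiced interference
    needs TIR >= 6 dB, unvoiced interference needs TIR >= 18 dB."""
    return tir_db >= (6.0 if interferer_voiced else 18.0)

# Toy frame: target 5x the interferer amplitude, so TIR is about 14 dB -
# usable against voiced interference, not against unvoiced.
tir = frame_tir_db([0.5, -0.5, 0.5, -0.5], [0.1, -0.1, 0.1, -0.1])
print(round(tir, 1), usable(tir, True), usable(tir, False))
```

Frames flagged usable in this way are the ones the study suggests can be fed to ASR, or used to guide reconstruction of the target speaker's speech.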

  17. The Effect of Speech Rate on Stuttering Frequency, Phonated Intervals, Speech Effort, and Speech Naturalness during Chorus Reading

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Ingham, Roger J.

    2013-01-01

    Purpose: This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also…

  18. A causal test of the motor theory of speech perception: A case of impaired speech production and spared speech perception

    PubMed Central

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E.; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z.

    2015-01-01

    In the last decade, the debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. However, the exact role of the motor system in auditory speech processing remains elusive. Here we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. The patient’s spontaneous speech was marked by frequent phonological/articulatory errors, and those errors were caused, at least in part, by motor-level impairments with speech production. We found that the patient showed a normal phonemic categorical boundary when discriminating two nonwords that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the nonword stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labeling impairment. These data suggest that the identification (i.e. labeling) of nonword speech sounds may involve the speech motor system, but that the perception of speech sounds (i.e., discrimination) does not require the motor system. This means that motor processes are not causally involved in perception of the speech signal, and suggest that the motor system may be used when other cues (e.g., meaning, context) are not available. PMID:25951749

  19. Speech synthesis by phonological structure matching. 

    E-print Network

    Taylor, Paul; Black, Alan W

    1999-01-01

    This paper presents a new technique for speech synthesis by unit selection. The technique works by specifying the synthesis target and the speech database as phonological trees, and using a selection algorithm which ...

  20. President Kennedy's Speech at Rice University

    NASA Technical Reports Server (NTRS)

    1988-01-01

    This video tape presents unedited film footage of President John F. Kennedy's speech at Rice University, Houston, Texas, September 12, 1962. The speech expresses the commitment of the United States to landing an astronaut on the Moon.

  1. American Speech-Language-Hearing Association

    MedlinePLUS

    American Speech-Language-Hearing Association (ASHA): Making effective communication, a human right, accessible and achievable for all. Sections for audiologists, speech-language pathologists, students, and faculty…

  2. Speech for People with Tracheostomies or Ventilators

    MedlinePLUS

    Speech for People With Tracheostomies or Ventilators [en Español]. What is a tracheostomy? What happens when a… choking. What impact does a tracheostomy have on speech? People who have a tracheostomy cannot speak in…

  3. Speech Recognition Using Augmented Conditional Random Fields 

    E-print Network

    Hifny, Yasser; Renals, Steve

    2009-01-01

    Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal...

  4. Speech Recognition: Its Place in Business Education.

    ERIC Educational Resources Information Center

    Szul, Linda F.; Bouder, Michele

    2003-01-01

    Suggests uses of speech recognition devices in the classroom for students with disabilities. Compares speech recognition software packages and provides guidelines for selection and teaching. (Contains 14 references.) (SK)

  5. Speech Recognition by Machine, A Review

    E-print Network

    Anusuya, M A

    2010-01-01

    This paper presents a brief survey of automatic speech recognition and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development, the accuracy of automatic speech recognition remains one of the important research challenges (e.g., variations of context, speakers, and environment). The design of a speech recognition system requires careful attention to the following issues: definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, databases, and performance evaluation. The problems existing in ASR, and the various techniques developed by researchers to solve them, are presented in chronological order. Hence the authors hope that this work will be a contribution to the area of speech recog…

  6. LSP quantization in wideband speech coders

    Microsoft Academic Search

    M. Ferhaoui; S. Van Gerven

    1999-01-01

    This paper deals with multi-stage vector quantization of line spectrum pair (LSP) parameters in wideband speech coders and discusses commonly used spectral distortion measures and their relation to the perceptual quality of the speech coding
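    For context, the spectral distortion (SD) measure commonly reported in this literature is the RMS difference between two log power spectra, with an average SD of roughly 1 dB often cited as a transparency target for LSP quantization. A generic numpy sketch (the spectra below are illustrative, not the paper's implementation):

```python
import numpy as np

def log_spectral_distortion(p_ref, p_quant):
    """RMS log spectral distortion (dB) between two power spectra
    sampled on the same frequency grid."""
    diff_db = 10.0 * np.log10(p_ref) - 10.0 * np.log10(p_quant)
    return np.sqrt(np.mean(diff_db ** 2))

# Two toy spectra: the "quantized" one deviates slightly from the reference.
freqs = np.linspace(0, np.pi, 128)
p_ref = 1.0 / (1.1 - np.cos(freqs))            # smooth reference envelope
p_quant = p_ref * 10 ** (0.1 * np.sin(freqs))  # ~1 dB ripple error
print(f"SD = {log_spectral_distortion(p_ref, p_quant):.2f} dB")
```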

  7. Structural Representation of Speech for Phonetic Classification 

    E-print Network

    Gutkin, Alexander; King, Simon

    This paper explores the issues involved in using symbolic metric algorithms for automatic speech recognition(ASR), via a structural representation of speech. This representation is based on a set of phonological distinctive ...

  8. Multimodal speech recognition with ultrasonic sensors

    E-print Network

    Zhu, Bo, M. Eng. Massachusetts Institute of Technology

    2008-01-01

    Ultrasonic sensing of articulator movement is an area of multimodal speech recognition that has not been researched extensively. The widely-researched audio-visual speech recognition (AVSR), which relies upon video data, ...

  9. European speech databases for telephone applications

    Microsoft Academic Search

    H. Hoge; H. S. Tropf; R. Winski; H. van den Heuvel; R. Haeb-Umbach; K. Choukri

    1997-01-01

    The SpeechDat project aims to produce speech databases for all official languages of the European Union and some major dialectal variants and minority languages resulting in 28 speech databases. They will be recorded over fixed and mobile telephone networks. This will provide a realistic basis for training and assessment of both isolated and continuous-speech utterances, employing whole-word or subword approaches,

  10. Free Speech, Equal Opportunity, and Justice

    Microsoft Academic Search

    Alistair M. Macleod

    After distinguishing questions about (a) the reasons for the value we attach to various forms of freedom of speech, (b) the grounds of the moral right to freedom of speech, and (c) the role of the state as a guarantor of freedom of speech, I argue (1) that the reasons for valuing freedom of speech have force only in certain

  11. Coevolution of human speech and trade

    Microsoft Academic Search

    Richard D. Horan; Erwin H. Bulte; Jason F. Shogren

    2008-01-01

    We propose a paleoeconomic coevolutionary explanation for the origin of speech in modern humans. The coevolutionary process, in which trade facilitates speech and speech facilitates trade, gives rise to multiple stable trajectories. While a ‘trade-speech’ equilibrium is not an inevitable outcome for modern humans, we find it is a relatively likely scenario given our species evolved in Africa under climatic

  12. The use of automatic speech recognition showing the influence of nasality on speech intelligibility

    Microsoft Academic Search

    S. Mayr; K. Burkhardt; M. Schuster; K. Rogler; A. Maier; H. Iro

    2010-01-01

    Altered nasality influences speech intelligibility. Automatic speech recognition (ASR) has proved suitable for quantifying speech intelligibility in patients with different degrees of nasal emissions. We investigated the influence of hyponasality on the results of speech recognition before and after nasal surgery using ASR. Speech recordings, nasal peak inspiratory flow and self-perception measurements were carried out in 20 German-speaking patients (8

  13. Trends and Directions in Community College Speech.

    ERIC Educational Resources Information Center

    Berko, Roy M.

    In this paper the author discusses current practices in speech education courses taught on the junior college level. He examines specific problems that face speech educators who teach in public two-year colleges. The author attributes weakness of speech programs to: (1) inadequate facilities; (2) insufficient full-time staff; (3) oversized…

  14. DEVELOPMENT AND DISORDERS OF SPEECH IN CHILDHOOD.

    ERIC Educational Resources Information Center

    KARLIN, ISAAC W.; AND OTHERS

    THE GROWTH, DEVELOPMENT, AND ABNORMALITIES OF SPEECH IN CHILDHOOD ARE DESCRIBED IN THIS TEXT DESIGNED FOR PEDIATRICIANS, PSYCHOLOGISTS, EDUCATORS, MEDICAL STUDENTS, THERAPISTS, PATHOLOGISTS, AND PARENTS. THE NORMAL DEVELOPMENT OF SPEECH AND LANGUAGE IS DISCUSSED, INCLUDING THEORIES ON THE ORIGIN OF SPEECH IN MAN AND FACTORS INFLUENCING THE NORMAL…

  15. Speech-Language-Hearing Department of Communication

    E-print Network

    New Hampshire, University of

    Clinical faculty at the UNH Speech-Language-Hearing Center. She considers herself a generalist; her areas of expertise include early intervention. She brings years of experience as a practicing clinician to her role as clinical faculty at the UNH Speech-Language-Hearing Center.

  16. Graduate Student SPEECH-LANGUAGE PATHOLOGY

    E-print Network

    Long, Nicholas

    Graduate Student Handbook, Speech-Language Pathology Program, Stephen F. Austin State University. Dear prospective student: Welcome to the program website for Speech-Language Pathology and Audiology at Stephen F. Austin State University. The field of Speech-Language Pathology and Audiology is concerned

  17. Comparative experiments on large vocabulary speech recognition

    Microsoft Academic Search

    Richard Schwartz; Tasos Anastasakos; Francis Kubala; John Makhoul; Long Nguyen; George Zavaliagkos

    1993-01-01

    This paper describes several key experiments in large vocabulary speech recognition. We demonstrate that, counter to our intuitions, given a fixed amount of training speech, the number of training speakers has little effect on the accuracy. We show how much speech is needed for speaker-independent (SI) recognition in order to achieve the same performance as speaker-dependent (SD) recognition. We demonstrate

  18. Interventions for Speech Sound Disorders in Children

    ERIC Educational Resources Information Center

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  19. Listener Effort for Highly Intelligible Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Nagle, Kathy F.; Eadie, Tanya L.

    2012-01-01

    The purpose of this study was to determine (a) whether inexperienced listeners can reliably judge listener effort and (b) whether listener effort provides unique information beyond speech intelligibility or acceptability in tracheoesophageal speech. Twenty inexperienced listeners made judgments of speech acceptability and amount of effort…

  20. Microsoft Windows highly intelligent speech recognizer: Whisper

    Microsoft Academic Search

    Xuedong Huang; Alex Acero; Fil Alleva; Mei-Yuh Hwang; Li Jiang; Milind Mahajan

    1995-01-01

    Since January 1993, the authors have been working to refine and extend Sphinx-II technologies in order to develop practical speech recognition at Microsoft. The result of that work has been the Whisper (Windows Highly Intelligent Speech Recognizer). Whisper represents significantly improved recognition efficiency, usability, and accuracy, when compared with the Sphinx-II system. In addition Whisper offers speech input capabilities for

  1. Communicating by Language: The Speech Process.

    ERIC Educational Resources Information Center

    House, Arthur S., Ed.

    This document reports on a conference focused on speech problems. The main objective of these discussions was to facilitate a deeper understanding of human communication through interaction of conference participants with colleagues in other disciplines. Topics discussed included speech production, feedback, speech perception, and development of…

  2. Speech Recognition Experiments Silicon Auditory Models

    E-print Network

    Lazzaro, John

    Speech Recognition Experiments with Silicon Auditory Models, John Lazzaro and John Wawrzynek. Reports the performance of this speech recognition system on a speaker-independent 13-word recognition task. Current engineering applications of auditory models under study include speech recognition

  3. Speech recognition using noise-adaptive prototypes

    Microsoft Academic Search

    ARTHUR NADAS; DAVID NAHAMOO; MICHAEL A. PICHENY

    1989-01-01

    A probabilistic mixture model is described for a frame (the short-term spectrum) of speech to be used in speech recognition. Each component of the mixture is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system. Since the ambient noise during recognition can differ from that present in the training data, the

  4. Algorithmic Aspects in Speech Recognition: An Introduction

    E-print Network

    Buchsbaum, Adam

    Algorithmic Aspects in Speech Recognition: An Introduction, Adam L. Buchsbaum, AT&T Labs, Florham Park. This paper presents the field of speech recognition and describes some of its major open problems

  5. GRAPHICAL MODELS AND AUTOMATIC SPEECH RECOGNITION

    E-print Network

    Bilmes, Jeff

    GRAPHICAL MODELS AND AUTOMATIC SPEECH RECOGNITION, Jeffrey A. Bilmes. Abstract: Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition; any model used as part of a speech recognition system can be described by a graph, and this includes Gaussian dis...

  6. Robust Speech Recognition Using Articulatory Information

    E-print Network

    Kirchhoff, Katrin

    Robust Speech Recognition Using Articulatory Information. Dissertation, Technische Fakultät der Universität ... Describes an acoustic modeling component in a speech recognition system. The second focus point of this thesis is evaluated on different speech recognition tasks. The first of these is an American English corpus of telephone

  7. Regularizing Linear Discriminant Analysis for Speech Recognition

    E-print Network

    Erdogan, Hakan

    Regularizing Linear Discriminant Analysis for Speech Recognition, Hakan Erdogan. A key component in a pattern recognition system is the feature extractor. Feature extraction is an important step for speech recognition since the time-domain speech signal is highly variable, thus complex linear and nonlinear
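    As a generic illustration of the technique named in the title: Fisher LDA finds the projection that best separates classes, and a small ridge term on the within-class scatter is one simple way to regularize it. A sketch with synthetic data (the thesis's actual regularization scheme may differ; the `reg` value is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic "phone classes" of 2-D acoustic features.
x0 = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
x1 = rng.normal([2.0, 1.0], 0.5, size=(100, 2))

def fisher_lda_direction(a, b, reg=1e-3):
    """Fisher discriminant direction, with a ridge-regularized
    within-class scatter matrix (reg is a tunable assumption)."""
    sw = np.cov(a.T) + np.cov(b.T) + reg * np.eye(a.shape[1])
    w = np.linalg.solve(sw, a.mean(axis=0) - b.mean(axis=0))
    return w / np.linalg.norm(w)

w = fisher_lda_direction(x0, x1)
# Projections of the two classes separate along w.
separation = (x0 @ w).mean() - (x1 @ w).mean()
```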

  8. GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS

    E-print Network

    Allauzen, Cyril

    GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS, Cyril Allauzen and Mehryar Mohri. Weighted automata and transducers provide a common representation for the components of a speech recognition system. In previous work, we described an optimization algorithm, determinization. However, not all weighted automata and transducers used in large-vocabulary speech recognition

  9. Speech recognition with amplitude and frequency modulations

    E-print Network

    Allen, Jont

    Speech recognition with amplitude and frequency modulations Fan-Gang Zeng* , Kaibao Nie*, Ginger S, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived number of spectral bands may be sufficient for speech recognition in quiet, FM significantly en- hances

  10. GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS

    E-print Network

    Mohri, Mehryar

    GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS, Cyril Allauzen and Mehryar Mohri. Weighted automata and transducers provide a common representation for the components of a speech recognition system. In previous work, we described an optimization algorithm, determinization. However, not all weighted automata and transducers used in large-vocabulary speech recognition

  11. HIDDEN-ARTICULATOR MARKOV MODELS FOR SPEECH RECOGNITION

    E-print Network

    Bilmes, Jeff

    HIDDEN-ARTICULATOR MARKOV MODELS FOR SPEECH RECOGNITION, Matt Richardson, Jeff Bilmes and Chris ... In speech recognition using Hidden Markov Models (HMMs), each state represents an acoustic portion; articulatory information can assist speech recognition. We demonstrate this by showing that our mapping of articulatory configurations

  12. Off-Campus, Harmful Online Student Speech.

    ERIC Educational Resources Information Center

    Willard, Nancy

    2003-01-01

    Discusses issues related to off-campus, harmful student speech on the Internet, exploring the characteristics of this harmful speech and reviewing recent court cases in which schools have perceived such speech to be harmful or defamatory. Discusses prevention strategies in the context of the social and behavioral factors involved in offensive…

  13. Audiovisual Asynchrony Detection in Human Speech

    ERIC Educational Resources Information Center

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  14. Information Bottleneck and HMM for Speech Recognition

    E-print Network

    Friedman, Nir

    Information Bottleneck and HMM for Speech Recognition, thesis submitted for the degree of Master. The speaker speaks differently if he is happy or mad, frustrated or disappointed. Men and women ...

  15. An LSP based speech quality measure

    Microsoft Academic Search

    H. J. Coetzee

    1989-01-01

    An objective measure is introduced for predicting the subjective quality of a distorted speech signal. The unique feature of the measure is that it explicitly uses a parametric model of speech perception (rather than a signal fidelity measure) in the design of an objective measure for speech quality. Specifically, the known sensitivity of human auditory perception to perturbations in the

  16. Pitch and Duration Modification for Speech Watermarking

    E-print Network

    Sharma, Gaurav

    Pitch and Duration Modification for Speech Watermarking, Mehmet Celik, Gaurav Sharma and A. Murat ... (Koc University, Istanbul, Turkey). Abstract: We propose a speech watermarking algorithm based on pitch and duration modification; the variability of these speech features allows watermarking modifications to be imperceptible to the human

  17. Intelligibility of Speech Produced during Simultaneous Communication

    ERIC Educational Resources Information Center

    Whitehead, Robert L.; Schiavetti, Nicholas; MacKenzie, Douglas J.; Metz, Dale Evan

    2004-01-01

    This study investigated the overall intelligibility of speech produced during simultaneous communication (SC). Four hearing, experienced sign language users were recorded under SC and speech alone (SA) conditions speaking Boothroyd's (1985) forced-choice phonetic contrast material designed for measurement of speech intelligibility. Twelve…

  18. Speech Act Pluralism So far we have

    E-print Network

    Pylyshyn, Zenon

    To investigate the nature of speech acts, we start by outlining, in general terms, how we think speech acts should be studied. Intuitions and nontheoretic assumptions about speech act content can, of course, be part of what such an investigation aims to uncover.

  19. Speech Perception in Individuals with Auditory Neuropathy

    ERIC Educational Resources Information Center

    Zeng, Fan-Gang; Liu, Sheng

    2006-01-01

    Purpose: Speech perception in participants with auditory neuropathy (AN) was systematically studied to answer the following 2 questions: Does noise present a particular problem for people with AN? Can clear speech and cochlear implants alleviate this problem? Method: The researchers evaluated the advantage in intelligibility of clear speech over…

  20. Norwegian Speech Recognition for Telephone Applications

    E-print Network

    Amdal, Ingunn

    Norwegian Speech Recognition for Telephone Applications, Harald Ljøen, Ingunn Amdal, Finn Tore ... over telephone lines. The most important single factor for successful speech recognition seems to be the telephone network. General public services in the telephone network present high challenges to speech

  1. Speech and Hearing Science, Anatomy and Physiology.

    ERIC Educational Resources Information Center

    Zemlin, Willard R.

    Written for those interested in speech pathology and audiology, the text presents the anatomical, physiological, and neurological bases for speech and hearing. Anatomical nomenclature used in the speech and hearing sciences is introduced and the breathing mechanism is defined and discussed in terms of the respiratory passage, the framework and…

  2. Coevolutionary Investments in Human Speech and Trade

    Microsoft Academic Search

    Erwin H. Bulte; Richard D. Horan; Jason F. Shogren

    2006-01-01

    We propose a novel explanation for the emergence of language in modern humans, and the lack thereof in other hominids. A coevolutionary process, where trade facilitates speech and speech facilitates trade, driven by expectations and potentially influenced by geography, gives rise to multiple stable development trajectories. While the trade-speech equilibrium is not an inevitable outcome for modern humans, we do

  3. Auditory models for speech analysis

    NASA Astrophysics Data System (ADS)

    Maybury, Mark T.

    This paper reviews the psychophysical basis for auditory models and discusses their application to automatic speech recognition. First an overview of the human auditory system is presented, followed by a review of current knowledge gleaned from neurological and psychoacoustic experimentation. Next, a general framework describes established peripheral auditory models which are based on well-understood properties of the peripheral auditory system. This is followed by a discussion of current enhancements to those models to include nonlinearities and synchrony information as well as other higher auditory functions. Finally, the initial performance of auditory models in the task of speech recognition is examined and additional applications are mentioned.

  4. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…

  5. NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database

    Microsoft Academic Search

    C. Jankowski; A. Kalyanswamy; S. Basson; J. Spitz

    1990-01-01

    The creation of the network TIMIT (NTIMIT) database, which is the result of transmitting the TIMIT database over the telephone network, is described. A brief description of the TIMIT database is given, including characteristics useful for speech analysis and recognition. The hardware and software required for the transmission of the database is described. The geographic distribution of the TIMIT utterances

  6. Pulse Vector-Excitation Speech Encoder

    NASA Technical Reports Server (NTRS)

    Davidson, Grant; Gersho, Allen

    1989-01-01

    Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces high quality of reconstructed speech, but with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually-based error criterion to synthesize natural-sounding speech.

  7. Adaptive Redundant Speech Transmission over Wireless Multimedia Sensor Networks Based on Estimation of Perceived Speech Quality

    PubMed Central

    Kang, Jin Ah; Kim, Hong Kook

    2011-01-01

    An adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs. According to the decision, the proposed ARST approach controls the RSD transmission, then it optimizes the bitrate of speech coding to encode the current speech data (CSD) and RSD bitstream in order to maintain the speech quality under packet loss conditions. The effectiveness of the proposed ARST approach is then demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec and ITU-T Recommendation P.563 as a scalable speech codec and the PSQ estimation, respectively. It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs. PMID:22164086
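    The decision-then-bitrate-split logic the abstract describes can be sketched as follows. The thresholds and the 60/40 budget split are invented for illustration (only the AMR-NB mode list is real); the paper's actual control rules are not reproduced here:

```python
# Real AMR-NB bitrate modes, in kbit/s.
AMR_NB_MODES = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

def decide_redundancy(psq_mos, plr, mos_floor=3.5, plr_ceiling=0.05):
    """Send redundant speech data (RSD) when estimated perceived
    quality drops or packet losses rise (illustrative thresholds)."""
    return psq_mos < mos_floor or plr > plr_ceiling

def pick_bitrates(total_kbps, redundant):
    """Split the bit budget between current (CSD) and redundant (RSD)
    bitstreams, choosing the highest AMR-NB modes that fit each share."""
    shares = (0.6, 0.4) if redundant else (1.0, 0.0)
    modes = []
    for share in shares:
        fitting = [m for m in AMR_NB_MODES if m <= total_kbps * share]
        modes.append(max(fitting) if fitting else None)
    return modes

# With degraded quality and high loss, redundancy kicks in and the
# CSD/RSD streams drop to lower codec modes.
print(pick_bitrates(12.2, decide_redundancy(psq_mos=3.1, plr=0.08)))
```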

  8. Speech Perception Dominic W. Massaro

    E-print Network

    Massaro, Dominic

    This mental organ, responsible for the human language faculty and our language competence, matures; this is congruent with the more prominent belief that language and how it is acquired is special. Speech is Special: a highly influential proposal by Noam Chomsky envisioned language ability as dependent

  9. BYBLOS Speech Recognition Benchmark Results

    Microsoft Academic Search

    Francis Kubala; Steve Austin; Chris Barry; John Makhoul; P. Placeway; Richard M. Schwartz

    1991-01-01

    This paper presents speech recognition test results from the BBN BYBLOS system on the Feb 91 DARPA benchmarks in both the Resource Management (RM) and the Air Travel Information System (ATIS) domains. In the RM test, we report on speaker-independent (SI) recognition performance for the standard training condition using 109 speakers and for our recently proposed SI model made from

  10. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system which is robust, fast, easy to use, and needs no additional hardware besides common VR equipment.

  11. Government Speech 2.0

    Microsoft Academic Search

    Helen Norton; Danielle Keats Citron

    2010-01-01

    New expressive technologies continue to transform the ways in which members of the public speak to one another. Not surprisingly, emerging technologies have changed the ways in which government speaks as well. Despite substantial shifts in how the government and other parties actually communicate, however, the Supreme Court to date has developed its government speech doctrine – which recognizes “government

  12. “Eigenlips” for robust speech recognition

    Microsoft Academic Search

    Christoph Bregler; Yochai Konig

    1994-01-01

    We improve the performance of a hybrid connectionist speech recognition system by incorporating visual information about the corresponding lip movements. Specifically, we investigate the benefits of adding visual features in the presence of additive noise and crosstalk (cocktail party effect). Our study extends our previous experiments by using a new visual front end, and an alternative architecture for combining the

  13. Speech Processing Background November 1998

    E-print Network

    Roweis, Sam

    Radio Rex toy dog: the first device I am aware of to use speech recognition. A description from David and Selfridge: "It consisted of a celluloid dog with an iron base held within its house by an electromagnet, interrupting the current and releasing the dog. The energy around 500 Hz contained in the vowel of the word Rex"

  15. Perception of the speech code

    Microsoft Academic Search

    A. M. Liberman; F. S. Cooper; D. P. Shankweiler; M. Studdert-Kennedy

    1967-01-01

    Man could not perceive speech well if each phoneme were cued by a unit sound. In fact, many phonemes are encoded so that a single acoustic cue carries information in parallel about successive phonemic segments. This reduces the rate at which discrete sounds must be perceived, but at the price of a complex relation between cue and phoneme: cues vary

  16. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ...relay service (VRS), Internet Protocol Relay (IP Relay...speech disabilities can access the telephone system...of STS that utilizes Internet- based transmissions...calls on a mobile or Internet-enabled device, by...interconnected VoIP service to access a state STS relay...

  17. Speech and Language Developmental Milestones

    MedlinePLUS

    ... ways in which the brain is influenced by health conditions or life experiences—and how it can be used to develop learning strategies that encourage healthy language and speech development in early childhood. A recent workshop convened by the NIDCD drew ...

  18. Changing Speech Styles: Strategies in Read Speech and Casual and Careful Spontaneous Speech.

    ERIC Educational Resources Information Center

    Eskenazi, Maxine

    A study examined segmental and suprasegmental elements which contribute to an impression of one speaking style as opposed to another. A corpus containing three styles of speech, casual, careful, and read, for the same linguistic content was gathered. Thirteen speakers from Paris, France (aged 24-35) were given a scenario to be acted out over the…

  19. Noise adaptive speech recognition based on sequential noise parameter estimation

    E-print Network

    Noise adaptive speech recognition based on sequential noise parameter estimation, Kaisheng Yao. In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech, and the models can be trained from noisy speech. The approach can be applied to perform continuous speech recognition

  20. Women's and Men's Ratings of Their Own and Ideal Speech.

    ERIC Educational Resources Information Center

    Kramer, Cheris

    1978-01-01

    A study comparing women's and men's ratings of their own and ideal speech showed that a greater number of speech characteristics of males differed from the speech characteristics of the ideal speaker. Women are advised to consider the desirable characteristics associated with female speech before altering their speech by such means as…

  1. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  2. The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

    ERIC Educational Resources Information Center

    Adank, Patti

    2012-01-01

    The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…

  3. On the use of prosody in a speech-to-speech translator. 

    E-print Network

    Strom, Volker; Elsner, Anja; Hess, Wolfgang; Kasper, Walter; Klein, Alexandra; Kreiger, Hans Ulrich; Spilker, Jorg; Weber, Hans; Görz, Gunther

    1997-01-01

    In this paper a speech-to-speech translator from German to English is presented. Beside the traditional processing steps it takes advantage of acoustically detected prosodic phrase boundaries and focus. The prosodic phrase ...

  4. Orofacial muscle activity during inner speech and auditory verbal hallucinations: implications for a speech control model

    E-print Network

    Boyer, Edmond

    Orofacial muscle activity during inner speech and auditory verbal hallucinations: implications for a speech control model. Subtle electromyographic (EMG) activity has been detected in the speech musculature during verbal mental imagery. Studies indicated orofacial muscular activity during AVHs; however, the findings were

  5. Primary Progressive Aphasia and Apraxia of Speech

    PubMed Central

    Jung, Youngsin; Duffy, Joseph R.; Josephs, Keith A.

    2014-01-01

    Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: non-fluent/agrammatic, semantic, and logopenic variants of primary progressive aphasia. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. Here, we review clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech. The distinctions among these disorders will be crucial since accurate diagnosis will be important from a prognostic and therapeutic standpoint. PMID:24234355

  6. Speechalator: Two-Way Speech-to-Speech Translation in Your Hand

    Microsoft Academic Search

    Alex Waibel; Ahmed Badran; Alan W. Black; Robert E. Frederking; Donna Gates; Alon Lavie; Lori S. Levin; Kevin Lenzo; Laura Mayfield Tomokiyo; Juergen Reichert; Tanja Schultz; Dorcas Wallace; Monika Woszczyna; Jing Zhang

    2003-01-01

    This demonstration involves two-way automatic speech-to-speech translation on a consumer off-the-shelf PDA. This work was done as part of the DARPA-funded Babylon project, investigating better speech-to-speech translation systems for communication in the field. The development of the Speechalator software-based translation system required addressing a number of hard issues, including a new language for the team (Egyptian Arabic), close integration on

  7. Developing high performance asr in the IBM multilingual speech-to-speech translation system

    Microsoft Academic Search

    Xiaodong Cui; Liang Gu; Bing Xiang; Wei Zhang; Yuqing Gao

    2008-01-01

    This paper presents our recent development of the real-time speech recognition component in the IBM English/Iraqi Arabic speech-to-speech translation system for the DARPA Transtac project. We describe the details of the acoustic and language modeling that lead to high recognition accuracy and noise robustness and give the performance of the system on the evaluation sets of spontaneous conversational speech. We

  8. Apraxia of speech: an overview.

    PubMed

    Ogar, Jennifer; Slama, Hilary; Dronkers, Nina; Amici, Serena; Gorno-Tempini, Maria Luisa

    2005-12-01

    Apraxia of speech (AOS) is a motor speech disorder that can occur in the absence of aphasia or dysarthria. AOS has been the subject of some controversy since the disorder was first named and described by Darley and his Mayo Clinic colleagues in the 1960s. A recent revival of interest in AOS is due in part to the fact that it is often the first symptom of neurodegenerative diseases, such as primary progressive aphasia and corticobasal degeneration. This article will provide a brief review of terminology associated with AOS, its clinical hallmarks and neuroanatomical correlates. Current models of motor programming will also be addressed as they relate to AOS and finally, typical treatment strategies used in rehabilitating the articulation and prosody deficits associated with AOS will be summarized. PMID:16393756

  9. Headphone localization of speech stimuli

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1991-01-01

    Recently, three dimensional acoustic display systems have been developed that synthesize virtual sound sources over headphones based on filtering by Head-Related Transfer Functions (HRTFs), the direction-dependent spectral changes caused primarily by the outer ears. Here, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects 'pulled' their judgements toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgements; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. These results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by non-individualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  10. Language processing for speech understanding

    NASA Astrophysics Data System (ADS)

    Woods, W. A.

    1983-07-01

    This report considers language understanding techniques and control strategies that can be applied to provide higher-level support to aid in the understanding of spoken utterances. The discussion is illustrated with concepts and examples from the BBN speech understanding system, HWIM (Hear What I Mean). The HWIM system was conceived as an assistant to a travel budget manager, a system that would store information about planned and taken trips, travel budgets and their planning. The system was able to respond to commands and answer questions spoken into a microphone, and was able to synthesize spoken responses as output. HWIM was a prototype system used to drive speech understanding research. It used a phonetic-based approach, with no speaker training, a large vocabulary, and a relatively unconstraining English grammar. Discussed here is the control structure of the HWIM and the parsing algorithm used to parse sentences from the middle-out, using an ATN grammar.

  11. Adding Speech, Language, and Hearing Benefits to Your Policy

    MedlinePLUS

    Adding Speech, Language, and Hearing Benefits to Your Policy Introduction Why add speech, language, and hearing benefits? What are the costs? ... insurers, and labor unions-with information on adding speech, language, and hearing benefits to your health insurance ...

  12. Robust speech recognition from binary masks

    E-print Network

    Wang, DeLiang "Leon"

    Arun Narayanan. … binary masks may provide sufficient information for human speech recognition; this letter proposes a fundamentally different approach to robust automatic speech recognition. Specifically, recognition is performed
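    The mask-based approach this record names can be illustrated with an "ideal" binary mask: keep time-frequency cells where speech energy exceeds noise energy and zero the rest. The spectrogram sizes, the 0 dB threshold, and the synthetic data below are illustrative assumptions, not details from the letter.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """1 where the local speech-to-noise ratio exceeds lc_db, else 0."""
    eps = 1e-12  # avoid log of zero
    snr_db = 20 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (snr_db > lc_db).astype(float)

rng = np.random.default_rng(0)
S = rng.random((64, 100))   # stand-in speech magnitude spectrogram
N = rng.random((64, 100))   # stand-in noise magnitude spectrogram
mix = S + N                 # crude magnitude-domain mixture

mask = ideal_binary_mask(S, N)
masked = mix * mask
# keeping only speech-dominated cells moves the mixture toward the speech
```

Zeroing a cell where noise dominates removes more noise energy than speech energy by construction, which is why the masked mixture is closer to the clean speech than the raw mixture.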

  13. Multichannel Speech Recognition using Distributed Microphone Signal Fusion Strategies

    E-print Network

    Johnson, Michael T.

    Marek B. … or squared distance, before passing the enhanced single-channel signal into the speech recognition system. … contained in the signals, speech recognition systems can achieve higher recognition accuracies.

  14. COMPUTATIONAL AUDITORY SCENE ANALYSIS EXPLOITING SPEECH-RECOGNITION KNOWLEDGE

    E-print Network

    Ellis, Dan

    … of high-level knowledge of real-world signal structure exploited by listeners. Speech recognition, while … approaches will require more radical adaptation of current speech recognition approaches.

  15. Applications of broad class knowledge for noise robust speech recognition

    E-print Network

    Sainath, Tara N

    2009-01-01

    This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. ...

  16. HIGH-DIMENSIONAL LINEAR REPRESENTATIONS FOR ROBUST SPEECH RECOGNITION

    E-print Network

    Sollich, Peter

    Matthew Ager, Zoran Cvetković. Index terms: acoustic waveforms, phoneme, classification, robust, speech recognition. Many studies have shown that automatic speech recognition (ASR) systems still lack performance when compared to human

  17. SUBSPACE KERNEL DISCRIMINANT ANALYSIS FOR SPEECH RECOGNITION Hakan Erdogan

    E-print Network

    Erdogan, Hakan

    … vectors. For speech recognition, N is usually prohibitively high, increasing computational requirements. … version of KDA that enables its application to speech recognition, thus conveniently enabling nonlinear

  18. LPC parameters substitution for speech information hiding

    Microsoft Academic Search

    Zhi-jun WU; Wei GAO; Wei YANG

    2009-01-01

    Information hiding techniques adopted in secret communication should meet the requirements of high hiding capacity, real-time and high robustness. For the purpose of real-time speech secure communication, an information hiding algorithm based on the analysis-by-synthesis (ABS) speech coding scheme is presented in this article. This algorithm substitutes secret speech data bits for linear predictive coefficients (LPCs) in linear predictive coding

  19. SPEECH INVERSE FILTERING BY SIMULATED ANNEALING ALGORITHM

    Microsoft Academic Search

    Chang-Shiann Wu; Yu-Fu Hsieh

    The purpose of this study is to develop one solution to the speech inverse filtering problem. A new efficient articulatory speech analysis scheme, identifying the articulatory parameters from the acoustic speech waveforms, was introduced. The algorithm is known as simulated annealing, which is constrained to avoid non-unique solutions and local minima problems. The constraints are determined by the articulatory-to-acoustic transformation

  20. Phonetically sensitive discriminants for improved speech recognition

    Microsoft Academic Search

    G. R. Doddington

    1989-01-01

    A phonetically sensitive transformation of speech features has yielded significant improvement in speech-recognition performance. This (linear) transformation of the speech feature vector is designed to discriminate against out-of-class confusion data and is a function of phonetic state. Evaluation of the technique on the TI/NBS connected digit database demonstrates word (sentence) error rates of 0.5% (1.5%) for unknown-length strings and 0.2%

  1. Spotlight on Speech Codes 2007: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2007

    2007-01-01

    Last year, the Foundation for Individual Rights in Education (FIRE) conducted its first-ever comprehensive study of restrictions on speech at America's colleges and universities, "Spotlight on Speech Codes 2006: The State of Free Speech on our Nation's Campuses." In light of the essentiality of free expression to a truly liberal education, its…

  2. Single and multichannel enhancement of distant speech using characteristics of speech production

    Microsoft Academic Search

    B. Yegnanarayana; S. Guruprasad; S. R. Mahadeva Prasanna; Suryakanth V Gangashetty

    2011-01-01

    Speech collected at a distance from a speaker is degraded due to background noise, reverberation and other audio/speech signals. Normally enhancement of the degraded signal focuses on determining the characteristics of degradation and auditory perception. There are many characteristics of speech production which contribute to robustness against degradation caused by distance between the speaker and the pickup microphone. One such

  3. Vocoders and Speech Perception: Uses of Computer-Based Speech Analysis-Synthesis in Stimulus Generation.

    ERIC Educational Resources Information Center

    Tierney, Joseph; Mack, Molly

    1987-01-01

    Stimuli used in research on the perception of the speech signal have often been obtained from simple filtering and distortion of the speech waveform, sometimes accompanied by noise. However, for more complex stimulus generation, the parameters of speech can be manipulated, after analysis and before synthesis, using various types of algorithms to…

  4. Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

    ERIC Educational Resources Information Center

    Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

    2009-01-01

    Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…

  5. Speaking of Speech with the Disciplines: Collaborative Discussions about Collaborative Speech

    ERIC Educational Resources Information Center

    Compton, Josh

    2010-01-01

    As Lecturer of Speech in the Institute for Writing and Rhetoric at Dartmouth College, I have joined an ongoing conversation about speech that spans disciplines. This article takes a step back from looking at communication across the curriculum as a program and instead looks at one of the earliest stages of the process--conversations about speech

  6. Speech Analysis and Synthesis by Linear Prediction of the Speech Wave

    Microsoft Academic Search

    B. S. Atal; SUZANNE L. HANAUER

    1971-01-01

    We describe a procedure for efficient encoding of the speech wave by representing it in terms of time-varying parameters related to the transfer function of the vocal tract and the characteristics of the excitation. The speech wave, sampled at 10 kHz, is analyzed by predicting the present speech sample as a linear combination of the 12 previous samples. The 12
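    The core of the procedure this record describes, predicting the present speech sample as a linear combination of the 12 previous samples, can be sketched as a least-squares fit. The test signal and solver below are illustrative assumptions; the paper's full analysis-synthesis system (excitation modeling, frame-by-frame parameter updates) is not reproduced.

```python
import numpy as np

def lpc_coefficients(x, order=12):
    """Fit a_1..a_p so that x[n] is approximated by sum_k a_k * x[n-k]."""
    # Each row of A holds the `order` previous samples, most recent first.
    A = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

# A damped resonance, loosely mimicking a single vocal-tract formant,
# sampled at 10 kHz as in the paper.
fs = 10_000
t = np.arange(1024) / fs
x = np.exp(-40 * t) * np.sin(2 * np.pi * 800 * t)

a = lpc_coefficients(x, order=12)
pred = np.array([a @ x[n - 12:n][::-1] for n in range(12, len(x))])
residual = x[12:] - pred
# for a near-autoregressive signal like this, the prediction residual
# carries only a small fraction of the signal energy
```

In a real coder the residual (prediction error) and the coefficients, not the waveform itself, are what get transmitted.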

  7. Stability and Composition of Functional Synergies for Speech Movements in Children with Developmental Speech Disorders

    ERIC Educational Resources Information Center

    Terband, H.; Maassen, B.; van Lieshout, P.; Nijland, L.

    2011-01-01

    The aim of this study was to investigate the consistency and composition of functional synergies for speech movements in children with developmental speech disorders. Kinematic data were collected on the reiterated productions of syllables spa(/spa[image omitted]/) and paas(/pa[image omitted]s/) by 10 6- to 9-year-olds with developmental speech

  8. Speed and Accuracy of Rapid Speech Output by Adolescents with Residual Speech Sound Errors Including Rhotics

    ERIC Educational Resources Information Center

    Preston, Jonathan L.; Edwards, Mary Louise

    2009-01-01

    Children with residual speech sound errors are often underserved clinically, yet there has been a lack of recent research elucidating the specific deficits in this population. Adolescents aged 10-14 with residual speech sound errors (RE) that included rhotics were compared to normally speaking peers on tasks assessing speed and accuracy of speech

  9. Spotlight on Speech Codes 2012: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2012

    2012-01-01

    The U.S. Supreme Court has called America's colleges and universities "vital centers for the Nation's intellectual life," but the reality today is that many of these institutions severely restrict free speech and open debate. Speech codes--policies prohibiting student and faculty speech that would, outside the bounds of campus, be protected by the…

  10. Construction of a Rated Speech Corpus of L2 Learners' Spontaneous Speech

    ERIC Educational Resources Information Center

    Yoon, Su-Youn; Pierce, Lisa; Huensch, Amanda; Juul, Eric; Perkins, Samantha; Sproat, Richard; Hasegawa-Johnson, Mark

    2009-01-01

    This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners of English. Spontaneous speech was collected from 28 L2 speakers representing six language backgrounds and five different proficiency levels. Speech was elicited using formats similar to that of the TOEFL iBT and the Speaking…

  11. Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition

    E-print Network

    Virtanen, Tuomas

    Jort F. Gemmeke, Tuomas Virtanen, Antti Hurmalainen. … as by using automatic speech recognition. Experiments on the PASCAL CHiME challenge corpus, which contains

  12. Speech Content Integrity Verification Integrated with ITU G.723.1 Speech Coding

    Microsoft Academic Search

    Chung-ping Wu; C. C. Jay Kuo

    2001-01-01

    A speech content integrity verification scheme integrated with ITU G.723.1 speech coding to minimize the total computational cost is proposed in this research. Speech features relevant to the semantic meaning are extracted, encrypted and attached as the header information. This scheme is not only much faster than cryptographic bitstream integrity algorithms, but also more compatible with a

  13. A Self-Transcribing Speech Corpus: Collecting Continuous Speech with an Online Educational Game

    Microsoft Academic Search

    Alexander Gruenstein; Ian McGraw; Andrew Sutherland

    We describe a novel approach to collecting orthographically transcribed continuous speech data through the use of an online educational game called Voice Scatter, in which players study flashcards by using speech to match terms with their definitions. We analyze a corpus of 30,938 utterances, totaling 27.63 hours of speech, collected during the first 22 days that Voice Scatter was

  14. Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition

    Microsoft Academic Search

    Roy D. Patterson; Thomas C. Walters; Jessica Monaghan; Christian Feldbauer; Toshio Irino

    2010-01-01

    The syllables of speech contain information about the vocal tract length (VTL) of the speaker as well as the phonetic message. Ideally, the pre-processor used for automatic speech recognition (ASR) should segregate the phonetic message from the VTL information. This paper describes a method to calculate VTL-invariant auditory feature vectors from speech, using a method in which the message and

  15. Coupling particle filters with automatic speech recognition for speech feature enhancement

    Microsoft Academic Search

    Friedrich Faubel; Matthias Wölfel

    2006-01-01

    This paper addresses robust speech feature extraction in combination with statistical speech feature enhancement and couples the particle filter to the speech recognition hypotheses. To extract noise robust features the Fourier transformation is replaced by the warped and scaled minimum variance distortionless response spectral envelope. To enhance the features, particle filtering has been used. Further, we show that

  16. Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech in the JUPITER Domain

    E-print Network

    … an approach of using lexical stress models to improve the speech recognition performance on spontaneous … with lexical stress on a large corpus of spontaneous utterances, and identified the most informative features

  17. Private and Inner Speech and the Regulation of Social Speech Communication

    ERIC Educational Resources Information Center

    San Martin Martinez, Conchi; Boada i Calbet, Humbert; Feigenbaum, Peter

    2011-01-01

    To further investigate the possible regulatory role of private and inner speech in the context of referential social speech communications, a set of clear and systematically applied measures is needed. This study addresses this need by introducing a rigorous method for identifying private speech and certain sharply defined instances of inaudible…

  18. NON-NEGATIVE MATRIX FACTORIZATION BASED COMPENSATION OF MUSIC FOR AUTOMATIC SPEECH RECOGNITION

    E-print Network

    Virtanen, Tuomas

    … automatic recognition of mixtures of speech and music. We represent magnitude spectra of noisy speech … Index terms: robustness, automatic speech recognition, non-negative matrix factorization, speech enhancement.
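    For context on the factorization named in this title: non-negative matrix factorization approximates a non-negative magnitude spectrogram V as a product W·H of non-negative factors. Below is a minimal sketch using Euclidean-cost multiplicative updates (Lee-Seung style); the paper's actual cost function, dictionaries, and music-compensation scheme are not given in this snippet, so the rank, iteration count, and data are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank=8, iters=200, eps=1e-9, seed=0):
    """Euclidean-cost NMF via multiplicative updates: V is approximated by W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # basis spectra, one column per component
    H = rng.random((rank, T)) + eps   # per-frame activations
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H

rng = np.random.default_rng(1)
V = rng.random((32, 40))              # stand-in for a magnitude spectrogram
W, H = nmf(V)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
# multiplicative updates keep W and H non-negative and shrink the error
```

In separation-style applications, some columns of W would model speech and others the interfering music, with each source reconstructed from its own components.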

  19. Perceiving non-native speech.

    PubMed

    Bürki-Cohen, J; Miller, J L; Eimas, P D

    2001-06-01

    In a series of experiments using monosyllabic words produced by a native and a non-native speaker of English, native English speakers monitored the word-initial consonants of the words to decide which of two consonants was present on each trial. In some of the experiments, a secondary task of a linguistic nature, deciding whether the target-bearing word was a noun or verb, was also required. When the words were presented in silence, the native and non-native stimuli were processed in a like manner. Specifically, when the secondary task was not required, phonemic decisions tended to be made on the basis of prelexical information, whereas when the secondary task was required, they tended to be made on the basis of postlexical information (see Eimas, Marcovitz Hornstein, & Payton, 1990). However, when the listening conditions were degraded by presenting the words at a lower level and in noise, the two types of stimuli yielded different patterns. Native speech was processed as before, whereas for non-native speech phonemic decisions now tended to be made on the basis of postlexical information both when a secondary task was required and when it was not. The contrasting results for native and non-native speech are discussed in terms of models of phoneme processing. PMID:11575902

  20. Adaptation to spectrally-rotated speech.

    PubMed

    Green, Tim; Rosen, Stuart; Faulkner, Andrew; Paterson, Ruth

    2013-08-01

    Much recent interest surrounds listeners' abilities to adapt to various transformations that distort speech. An extreme example is spectral rotation, in which the spectrum of low-pass filtered speech is inverted around a center frequency (2 kHz here). Spectral shape and its dynamics are completely altered, rendering speech virtually unintelligible initially. However, intonation, rhythm, and contrasts in periodicity and aperiodicity are largely unaffected. Four normal-hearing adults underwent 6 h of training with spectrally-rotated speech using Continuous Discourse Tracking. They and an untrained control group completed pre- and post-training speech perception tests, for which talkers differed from the training talker. Significantly improved recognition of spectrally-rotated sentences was observed for trained, but not untrained, participants. However, there were no significant improvements in the identification of medial vowels in /bVd/ syllables or intervocalic consonants. Additional tests were performed with speech materials manipulated so as to isolate the contribution of various speech features. These showed that preserving intonational contrasts did not contribute to the comprehension of spectrally-rotated speech after training, and suggested that improvements involved adaptation to altered spectral shape and dynamics, rather than just learning to focus on speech features relatively unaffected by the transformation. PMID:23927133
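    The transformation this abstract describes can be sketched directly: for a signal band-limited to half the sampling band, reversing the rFFT bins maps each frequency f to fs/2 - f, an inversion around fs/4 (2 kHz at the 8 kHz sample rate assumed here; the pure-tone input is an illustrative stand-in for speech).

```python
import numpy as np

def spectrally_rotate(x):
    """Rotate the spectrum of a real signal around fs/4 by reversing rFFT bins."""
    X = np.fft.rfft(x)
    return np.fft.irfft(X[::-1], n=len(x))

fs = 8000
t = np.arange(fs) / fs                 # exactly 1 second, 1 Hz bin spacing
tone = np.sin(2 * np.pi * 500 * t)     # a 500 Hz component
rotated = spectrally_rotate(tone)

# the energy should now sit at 4000 - 500 = 3500 Hz
peak_hz = int(np.argmax(np.abs(np.fft.rfft(rotated))))
```

Applied to speech, this mapping swaps low-frequency and high-frequency structure while leaving temporal-envelope cues (rhythm, periodicity) largely intact, which is what makes adaptation to it interesting.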

  1. Can Automatic Speech Recognition Learn More from Human Speech Perception? (Sorin Dusan and Lawrence R. Rabiner, in Trends in Speech Technology, C. Burileanu (Ed.), Romanian Academic Publisher)

    E-print Network

    Allen, Jont

    … progress has been made during the last two decades in automatic speech recognition (ASR), but performance appears to have reached a plateau in the past few years. New techniques

  2. The Functional Anatomy of Speech Processing: From Auditory Cortex to Speech Recognition and Speech Production

    Microsoft Academic Search

    Gregory Hickok

    \\u000a Lesion-based research has been successful in providing a broad outline of the neuroanatomy of speech\\/language processes (Dronkers\\u000a et al. 2000; Hillis 2007), and continues to play a crucial role in the development of functional anatomic models of cognitive\\u000a processes (Fellows et al. 2005). However, lesion studies lack the spatial resolution to assess more detailed functional anatomical\\u000a hypotheses. Functional imaging methods

  3. Speech perception as an active cognitive process

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID:24672438

  4. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390

  5. The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

    PubMed

    Rimmele, Johanna M; Zion Golumbic, Elana; Schröger, Erich; Poeppel, David

    2015-07-01

    Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically if it reflects enhanced processing of the fine structure of attended speech. To investigate this question we compared attentional effects on speech-tracking of natural versus vocoded speech which preserves the temporal envelope but removes the fine structure of speech. Pairs of natural and vocoded speech stimuli were presented concurrently and participants attended to one stimulus and performed a detection task while ignoring the other stimulus. We recorded magnetoencephalography (MEG) and compared attentional effects on the speech-tracking response in auditory cortex. Speech-tracking of natural, but not vocoded, speech was enhanced by attention, whereas neural tracking of ignored speech was similar for natural and vocoded speech. These findings suggest that the more precise speech-tracking of attended natural speech is related to processing its fine structure, possibly reflecting the application of higher-order linguistic processes. In contrast, when speech is unattended its fine structure is not processed to the same degree and thus elicits less precise speech-tracking more similar to vocoded speech. PMID:25650107

  6. Aphasia and Speech Organization in Children

    Microsoft Academic Search

    Randy L. Carter; Miles K. Hohenegger; Paul Satz

    1982-01-01

    A long-standing controversy concerns whether lateralized cerebral specialization for speech and language is present at the time of language origins (developmental invariance) or whether it gradually develops from initial bilaterality (developmental progression). This controversy is complicated by conflicting reports of the incidence of childhood aphasia. The discrepancies are largely due to one early study. When methods for estimating speech organization

  7. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints (auditory, linguistic, and otherwise) of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper and Hypo (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a ``hyperarticulated'' speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change respective to different listener-constrained conditions. This paper is part of a preliminary investigation concerned with comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  8. Toward the Automatic Generation of Cued Speech

    Microsoft Academic Search

    Maroula S. Bratakos; Paul Duchnowski; Louis D. Braida

    1998-01-01

    Although Manual Cued Speech (MCS) can greatly facilitate both education and communication for the deaf, its use is limited to situations in which the talker, or a transliterator, is able to produce cues for the cue receiver. The availability of automatically produced cues would substantially relax this restriction. However, it is unclear whether current automatic speech recognition (ASR) technology would

  9. Structural representation of speech for phonetic classification

    Microsoft Academic Search

    Alexander Gutkin; Simon King

    2004-01-01

    This paper explores the issues involved in using symbolic metric algorithms for automatic speech recognition (ASR), via a structural representation of speech. This representation is based on a set of phonological distinctive features which is a linguistically well-motivated alternative to the

  11. TOWARDS AUTOMATIC SPEECH RECOGNITION IN ADVERSE ENVIRONMENTS

    Microsoft Academic Search

    D. Dimitriadis; N. Katsamanis; P. Maragos; G. Papandreou; V. Pitsikalis

    Some of our research efforts towards building Automatic Speech Recognition (ASR) systems designed to work in real-world conditions are presented. The methods we propose exhibit improved performance in noisy environments and offer robustness against speaker variability. Advanced nonlinear signal processing techniques, modulation- and chaotic-based, are utilized for auditory feature extraction. The auditory features are complemented with visual speech cues

  12. SPEECH LEVELS IN VARIOUS NOISE ENVIRONMENTS

    EPA Science Inventory

    The goal of this study was to determine average speech levels used by people when conversing in different levels of background noise. The non-laboratory environments where speech was recorded were: high school classrooms, homes, hospitals, department stores, trains and commercial...

  13. Pitch-Learning Algorithm For Speech Encoders

    NASA Technical Reports Server (NTRS)

    Bhaskar, B. R. Udaya

    1988-01-01

    Adaptive algorithm detects and corrects errors in sequence of estimates of pitch period of speech. Algorithm operates in conjunction with techniques used to estimate pitch period. Used in such parametric and hybrid speech coders as linear predictive coders and adaptive predictive coders.
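
    The abstract does not specify the correction rule; as an illustrative sketch (not the paper's algorithm), a running-median check is one simple way to detect and correct gross errors, such as period doubling or halving, in a pitch-period track:

```python
import numpy as np
from scipy.ndimage import median_filter

def correct_pitch_track(periods, size=5, tol=0.2):
    """Replace pitch-period estimates that deviate from the local
    running median by more than `tol` (gross errors such as period
    doubling or halving) with that median."""
    periods = np.asarray(periods, dtype=float)
    med = median_filter(periods, size=size, mode="nearest")
    bad = np.abs(periods - med) > tol * med
    corrected = periods.copy()
    corrected[bad] = med[bad]
    return corrected

track = [80, 81, 160, 82, 83, 82, 41, 81, 80]  # one doubled, one halved value
corrected = correct_pitch_track(track)
```

    Here the doubled (160) and halved (41) estimates are pulled back to their local medians, while the smoothly varying neighbours are left untouched.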

  14. Scaffolded-Language Intervention: Speech Production Outcomes

    ERIC Educational Resources Information Center

    Bellon-Harn, Monica L.; Credeur-Pampolina, Maggie E.; LeBoeuf, Lexie

    2013-01-01

    This study investigated the effects of a scaffolded-language intervention using cloze procedures, semantically contingent expansions, contrastive word pairs, and direct models on speech abilities in two preschoolers with speech and language impairment speaking African American English. Effects of the lexical and phonological characteristics (i.e.,…

  15. The Need for a Speech Corpus

    ERIC Educational Resources Information Center

    Campbell, Dermot F.; McDonnell, Ciaran; Meinardi, Marti; Richardson, Bunny

    2007-01-01

    This paper outlines the ongoing construction of a speech corpus for use by applied linguists and advanced EFL/ESL students. In the first part, sections 1-4, the need for improvements in the teaching of listening skills and pronunciation practice for EFL/ESL students is noted. It is argued that the use of authentic native-to-native speech is…

  16. Speech recognition by machine: A review

    Microsoft Academic Search

    D. R. Reddy

    1976-01-01

    This paper provides a review of recent developments in speech recognition research. The concept of sources of knowledge is introduced and the use of knowledge to generate and verify hypotheses is discussed. The difficulties that arise in the construction of different types of speech recognition systems are discussed and the structure and performance of several such systems are presented. Aspects

  17. Preschoolers Benefit from Visually Salient Speech Cues

    ERIC Educational Resources Information Center

    Lalonde, Kaylah; Holt, Rachael Frush

    2015-01-01

    Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. It also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…

  18. Pronunciation Modeling for Large Vocabulary Speech Recognition

    ERIC Educational Resources Information Center

    Kantor, Arthur

    2010-01-01

    The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the…

  19. Speech Recognition Thresholds for Multilingual Populations.

    ERIC Educational Resources Information Center

    Ramkissoon, Ishara

    2001-01-01

    This article traces the development of speech audiometry in the United States and reports on the current status, focusing on the needs of a multilingual population in terms of measuring speech recognition threshold (SRT). It also discusses sociolinguistic considerations, alternative SRT stimuli for second language learners, and research on using…

  20. How do humans process and recognize speech?

    Microsoft Academic Search

    Jont B. Allen

    1994-01-01

    Until the performance of automatic speech recognition (ASR) hardware surpasses human performance in accuracy and robustness, we stand to gain by understanding the basic principles behind human speech recognition (HSR). This problem was studied exhaustively at Bell Labs between the years of 1918 and 1950 by Harvey Fletcher and his colleagues. The motivation for these studies was to quantify the

  1. Speech Recognition with Primarily Temporal Cues

    Microsoft Academic Search

    Robert V. Shannon; Fan-Gang Zeng; Vivek Kamath; John Wygonski; Michael Ekelid

    1995-01-01

    Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants,
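
    The band-envelope manipulation described in this abstract can be sketched in a few lines; this is an illustrative reconstruction under assumed band edges and filter choices, not the authors' processing chain:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges=(100, 800, 1500, 2500, 4000)):
    """Crude noise vocoder in the spirit of Shannon et al. (1995):
    keep each band's temporal envelope, discard its spectral fine
    structure by modulating band-limited noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))          # temporal envelope (AM) of the band
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * noise                   # envelope-modulated band noise
    return out

fs = 16000
t = np.arange(fs) / fs
speechlike = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
y = noise_vocode(speechlike, fs)
```

    The output preserves each band's slow amplitude fluctuations (the temporal cues the study found sufficient for recognition) while the carrier within each band is noise.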

  2. Speech recognition in noisy environments: A survey

    Microsoft Academic Search

    Yifan Gong

    1995-01-01

    The performance levels of most current speech recognizers degrade significantly when environmental noise occurs during use. Such performance degradation is mainly caused by mismatches in training and operating environments. During recent years much effort has been directed to reducing this mismatch. This paper surveys research results in the area of digital techniques for single microphone noisy speech recognition classified in

  3. How Should a Speech Recognizer Work?

    ERIC Educational Resources Information Center

    Scharenborg, Odette; Norris, Dennis; ten Bosch, Louis; McQueen, James M.

    2005-01-01

    Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of communication follows largely from the fact that…

  4. Effects of Syllable Frequency in Speech Production

    ERIC Educational Resources Information Center

    Cholin, Joana; Levelt, Willem J. M.; Schiller, Niels O.

    2006-01-01

    In the speech production model proposed by [Levelt, W. J. M., Roelofs, A., Meyer, A. S. (1999). A theory of lexical access in speech production. "Behavioral and Brain Sciences," 22, pp. 1-75.], syllables play a crucial role at the interface of phonological and phonetic encoding. At this interface, abstract phonological syllables are translated…

  5. Communicative Competence, Speech Acts and Discourse Analysis.

    ERIC Educational Resources Information Center

    McCoy, Terry; And Others

    Three papers intended as preliminary studies to bilingual professional curriculum development are included. "Speech Acts and Discourse Analysis," by Terry McCoy, represents an introduction to discourse analysis as a tool for the language teacher. The notion of a typology of speech acts is set forth, and models of discourse analysis by…

  6. Speech masking and cancelling and voice obscuration

    SciTech Connect

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal that diminishes the user's vocal acoustic output intensity and/or distorts the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to or contacting a user's neck or head skin tissue for sensing speech production information.

  7. RASTA-PLP speech analysis technique

    Microsoft Academic Search

    Hynek Hermansky; Nelson Morgan; Aruna Bayya; Phil Kohn

    1992-01-01

    Most speech parameter estimation techniques are easily influenced by the frequency response of the communication channel. The authors have developed a technique that is more robust to such steady-state spectral factors in speech. The approach is conceptually simple and computationally efficient. The new method is described, and experimental results are presented that show significant advantages for the proposed method
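
    The core of the RASTA idea, band-pass filtering each log-spectral trajectory over time so that slowly varying (channel) components are removed, can be sketched with the commonly quoted filter coefficients (the exact pole value varies between publications, so treat these as an assumption):

```python
import numpy as np
from scipy.signal import lfilter

# RASTA-style band-pass filter applied along time to each log critical-band
# trajectory: the FIR numerator differentiates, the pole integrates slowly,
# and the DC gain is exactly zero, so a constant channel offset (a
# steady-state spectral factor) is filtered out.
b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
a = np.array([1.0, -0.98])

frames = 1000
trajectory = np.full(frames, 3.0)   # a fixed channel gain in the log domain
filtered = lfilter(b, a, trajectory)
# After the transient decays, the constant component is gone.
```

    A fixed linear channel adds a constant to every log-spectral channel; because the filter's DC gain is zero, that constant is removed while speech-rate modulations pass through.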

  8. General-Purpose Monitoring during Speech Production

    ERIC Educational Resources Information Center

    Ries, Stephanie; Janssen, Niels; Dufau, Stephane; Alario, F.-Xavier; Burle, Boris

    2011-01-01

    The concept of "monitoring" refers to our ability to control our actions on-line. Monitoring involved in speech production is often described in psycholinguistic models as an inherent part of the language system. We probed the specificity of speech monitoring in two psycholinguistic experiments where electroencephalographic activities were…

  9. Electrocardiographic anxiety profiles improve speech anxiety.

    PubMed

    Kim, Pyoung Won; Kim, Seung Ae; Jung, Keun-Hwa

    2012-12-01

    The present study set out to determine the effect of electrocardiographic (ECG) feedback on performance in speech anxiety. Forty-six high school students participated in a speech performance educational program. They were randomly divided into two groups, an experimental group with ECG feedback (N = 21) and a control group (N = 25). Feedback was given with video recording in the control group, whereas in the experimental group an additional ECG feedback was provided. Speech performance was evaluated by the Korean Broadcasting System (KBS) speech ability test, which assesses 10 different speaking categories. ECG was recorded during rest and speech, together with a video recording of the speech performance. Changes in R-R intervals were used to reflect anxiety profiles. Three trials were performed over the 3-week program. Results showed that subjects with ECG feedback achieved a significant improvement in speech performance and anxiety states compared with those in the control group. These findings suggest that visualization of the anxiety-profile feedback with ECG can be a better cognitive therapeutic strategy in speech anxiety. PMID:22714138

  10. The Lombard Effect on Alaryngeal Speech.

    ERIC Educational Resources Information Center

    Zeine, Lina; Brandt, John F.

    1988-01-01

    The study investigated the Lombard effect (evoking increased speech intensity by applying masking noise to ears of talker) on the speech of esophageal talkers, artificial larynx users, and normal speakers. The noise condition produced the highest intensity increase in the esophageal speakers. (Author/DB)

  11. Pulmonic Ingressive Speech in Shetland English

    ERIC Educational Resources Information Center

    Sundkvist, Peter

    2012-01-01

    This paper presents a study of pulmonic ingressive speech, a severely understudied phenomenon within varieties of English. While ingressive speech has been reported for several parts of the British Isles, New England, and eastern Canada, thus far Newfoundland appears to be the only locality where researchers have managed to provide substantial…

  12. The motor theory of speech perception revised

    Microsoft Academic Search

    ALVIN M. LIBERMAN; IGNATIUS G. MATTINGLY

    1985-01-01

    A motor theory of speech perception, initially proposed to account for results of early experiments with synthetic speech, is now extensively revised to accommodate recent findings, and to relate the assumptions of the theory to those that might be made about other perceptual modes. According to the revised theory, phonetic information is perceived in a biologically distinct system, a

  13. Speech recognition use in healthcare applications

    Microsoft Academic Search

    Scott Durling; Jo Lumsden

    2008-01-01

    Speech recognition technology is regarded as a key enabler for increasing the usability of applications deployed on mobile devices -- devices which are becoming increasingly prevalent in modern hospital-based healthcare. Although the use of speech recognition is not new to the hospital-based healthcare domain, its use with mobile devices has thus far been limited. This paper presents the results of

  14. Enhancement and bandwidth compression of noisy speech

    Microsoft Academic Search

    J. S. Lim; A. V. Oppenheim

    1979-01-01

    Over the past several years there has been considerable attention focused on the problem of enhancement and bandwidth compression of speech degraded by additive background noise. This interest is motivated by several factors including a broad set of important applications, the apparent lack of robustness in current speech-compression systems and the development of several potentially promising and practical solutions. One

  15. Speech-Language Program Review External Report.

    ERIC Educational Resources Information Center

    Nussbaum, Jo

    A study evaluated the Speech-Language Program in school district #68, Nanaimo, British Columbia, Canada. An external evaluator visited the district and spent 4 consecutive days observing speech-language pathologists (SLPs); interviewing teachers, parents, and administrators; and examining records. Results indicated an extremely positive response to…

  16. The motor theory of speech perception revised

    Microsoft Academic Search

    ALVIN M. LIBERMAN; IGNATIUS G. MATTINGLY

    1986-01-01

    A motor theory of speech perception, initially proposed to account for results of early experiments with synthetic speech, is now extensively revised to accommodate recent findings, and to relate the assumptions of the theory to those that might be made about other perceptual modes. According to the revised theory, phonetic information is perceived in a biologically distinct system, a

  17. Hypnosis and the Reduction of Speech Anxiety.

    ERIC Educational Resources Information Center

    Barker, Larry L.; And Others

    The purposes of this paper are (1) to review the background and nature of hypnosis, (2) to synthesize research on hypnosis related to speech communication, and (3) to delineate and compare two potential techniques for reducing speech anxiety--hypnosis and systematic desensitization. Hypnosis has been defined as a mental state characterised by…

  18. Learning the Hidden Structure of Speech.

    ERIC Educational Resources Information Center

    Elman, Jeffery Locke; Zipser, David

    The back-propagation neural network learning procedure was applied to the analysis and recognition of speech. Because this learning procedure requires only examples of input-output pairs, it is not necessary to provide it with any initial description of speech features. Rather, the network develops its own set of representational features…

  19. Biosignal Processing Applications for Speech Processing

    Microsoft Academic Search

    Stefan Pantazi

    Speech is a biosignal that is amenable to general biosignal processing methodologies such as frequency domain processing. This is supported today by the availability of inexpensive digital multimedia hardware and by developments in the theoretical aspects of signal processing. However, sound processing must also be regarded through the prism of the psychoacoustic reality of the human hearing system. Speech

  20. Speech recognition by machines and humans

    Microsoft Academic Search

    Richard P. Lippmann

    1997-01-01

    This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are

  1. Modulation Features for Speech and Music Classification

    Microsoft Academic Search

    Omer Mohsin Mubarak; Eliathamby Ambikairajah; Julien Epps; Teddy Surya Gunawan

    2006-01-01

    Many approaches to accurately classifying speech and music have been investigated over the years. This paper presents modulation features for effective speech and music classification. A Gammatone filter bank is used as the front-end for this classification system, where amplitude modulation (AM) and frequency modulation (FM) features are extracted from the critical band outputs of the Gammatone filters. In addition,
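
    AM and FM features of a band-limited signal are commonly obtained from its analytic signal; a minimal sketch (illustrative only — the authors' front-end uses a Gammatone filter bank, which is omitted here) is:

```python
import numpy as np
from scipy.signal import hilbert

def am_fm(band, fs):
    """Amplitude envelope (AM) and instantaneous frequency (FM)
    of a band-limited signal via its analytic signal."""
    z = hilbert(band)                        # analytic signal
    am = np.abs(z)                           # envelope
    phase = np.unwrap(np.angle(z))
    fm = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
    return am, fm

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500 * t)           # stands in for one band output
am, fm = am_fm(tone, fs)
```

    For a pure 500 Hz tone the envelope is flat and the instantaneous frequency sits at 500 Hz; for a real filter-bank output both vary over time, and their statistics form the classification features.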

  2. Humanistic Speech Education to Create Leadership Models.

    ERIC Educational Resources Information Center

    Oka, Beverley Jeanne

    A theoretical framework based primarily on the humanistic psychology of Abraham Maslow is used in developing a humanistic approach to speech education. The holistic view of human learning and behavior, inherent in this approach, is seen to be compatible with a model of effective leadership. Specific applications of this approach to speech

  3. Freedom of Speech After Justice Brennan

    Microsoft Academic Search

    Marc Rohr

    2010-01-01

    This article will explore the positions taken in a number of the most important areas of the law of freedom of speech by each of the Justices presently on the Court, and will attempt to suggest the extent to which protection of speech has been, or likely will be, diminished in the post-Brennan era. Obviously, not every aspect of the

  4. Repeated Speech Errors: Evidence for Learning

    ERIC Educational Resources Information Center

    Humphreys, Karin R.; Menzies, Heather; Lake, Johanna K.

    2010-01-01

    Three experiments elicited phonological speech errors using the SLIP procedure to investigate whether there is a tendency for speech errors on specific words to reoccur, and whether this effect can be attributed to implicit learning of an incorrect mapping from lemma to phonology for that word. In Experiment 1, when speakers made a phonological…

  5. Speech segmentation is facilitated by visual cues

    Microsoft Academic Search

    Toni Cunillera; Estela Càmara; Matti Laine; Antoni Rodríguez-Fornells

    2010-01-01

    Evidence from infant studies indicates that language learning can be facilitated by multimodal cues. We extended this observation to adult language learning by studying the effects of simultaneous visual cues (nonassociated object images) on speech segmentation performance. Our results indicate that segmentation of new words from a continuous speech stream is facilitated by simultaneous visual input that is presented

  6. Anatomy and Physiology of the Speech Mechanism.

    ERIC Educational Resources Information Center

    Sheets, Boyd V.

    This monograph on the anatomical and physiological aspects of the speech mechanism stresses the importance of a general understanding of the process of verbal communication. Contents include "Positions of the Body,""Basic Concepts Linked with the Speech Mechanism,""The Nervous System,""The Respiratory System--Sound-Power Source,""The…

  7. Emotion recognition from Mandarin speech signals

    Microsoft Academic Search

    Tsang-Long Pao; Yu-Te Chen; Jun-Heng Yeh

    2004-01-01

    In this paper, a Mandarin speech based emotion classification method is presented. Five primary human emotions including anger, boredom, happiness, neutral and sadness are investigated. In emotion classification of speech signals, the conventional features are statistics of fundamental frequency, loudness, duration and voice quality. However, the recognition accuracy of systems employing these features degrades substantially when more than two valence

  8. The Effects of TV on Speech Education

    ERIC Educational Resources Information Center

    Gocen, Gokcen; Okur, Alpaslan

    2013-01-01

    Generally, the speaking aspect is not properly debated when discussing the positive and negative effects of television (TV), especially on children. So, to highlight this point, this study was first initiated by asking the question: "What are the effects of TV on speech?" and secondly, to transform the effects that TV has on speech in a…

  9. Bayesian learning of speech duration models

    Microsoft Academic Search

    Jen-tzung Chien; Chih-hsien Huang

    2003-01-01

    This paper presents the Bayesian speech duration modeling and learning for hidden Markov model (HMM) based speech recognition. We focus on the sequential learning of HMM state duration using quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson, and gamma distributions are investigated to characterize the duration

  10. Improving robustness of speech recognition systems

    NASA Astrophysics Data System (ADS)

    Mitra, Vikramjit

    2010-11-01

    Current Automatic Speech Recognition (ASR) systems fail to perform nearly as well as human speech recognition due to their lack of robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation occurs due to a combination of factors including speech style, speaking rate, etc.; a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone-units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof-of-concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs.
Two natural speech databases: X-ray microbeam and Aurora-2 were annotated, where the former was used to train a TV-estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observation: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, hence eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only can help to account for coarticulatory variations but can also significantly improve the noise robustness of ASR system.

  11. Integration of speech with natural language understanding.

    PubMed Central

    Moore, R C

    1995-01-01

    The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding. PMID:7479813

  12. Voice quality modelling for expressive speech synthesis.

    PubMed

    Monzo, Carlos; Iriondo, Ignasi; Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameter modelling, along with the well-known prosody features (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of the obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

  13. Emerging issues in speech therapy in Iran.

    PubMed

    Nilipour, Reza

    2002-01-01

    This report is a short review of the form and content of speech and language therapy services and the trend of their institutionalization in Iran. A summary of formal education in speech and language therapy in Iran as originated by establishing a 4-year BS rehabilitation program in the College of Rehabilitation Sciences in Tehran is given. Since then, speech and language rehabilitation programs have been expanding both in size and quality, resulting in about 1,000 speech therapists practicing in hospitals and rehabilitation centers throughout the country. The expansion of graduate programs at MS level in three different institutions and a prospective PhD program are also adding to the quality of these services. The content of the theory courses and clinical practice courses as well as research on specific speech and language disorders and cross-linguistic studies are briefly described. PMID:12037418

  14. The Functional Connectome of Speech Control

    PubMed Central

    Fuertinger, Stefan; Horwitz, Barry; Simonyan, Kristina

    2015-01-01

    In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research established the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy, from the resting state, to motor output of meaningless syllables, to complex production of real-life speech, and compared them with non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and their ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network.
Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively forged the formation of the functional speech connectome. In addition, the observed capacity of the primary sensorimotor cortex to exhibit operational heterogeneity challenged the established concept of unimodality of this region. PMID:26204475

  15. Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications

    ERIC Educational Resources Information Center

    Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara

    2009-01-01

    Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech

  16. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-12-31

    Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.
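
    As background for the HMM description promised in the abstract, the forward algorithm that computes a model's likelihood for an observation sequence can be sketched as follows (a toy two-state, two-symbol model; the probabilities are arbitrary illustrations, not values from the paper):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Likelihood P(obs | model) of a discrete-output HMM,
    computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate through transitions
    return alpha.sum()                   # marginalize over final states

pi = np.array([0.6, 0.4])                # initial state probabilities
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # state-transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # per-state symbol probabilities
obs = [0, 1, 0]
p = forward_likelihood(pi, A, B, obs)
```

    The recursion sums over all state paths in linear time, which is what makes data-driven parameter estimation (and hence the robustness the abstract credits to HMMs) tractable.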

  17. Speech Characteristics Associated with Three Genotypes of Ataxia

    ERIC Educational Resources Information Center

    Sidtis, John J.; Ahn, Ji Sook; Gomez, Christopher; Sidtis, Diana

    2011-01-01

    Purpose: Advances in neurobiology are providing new opportunities to investigate the neurological systems underlying motor speech control. This study explores the perceptual characteristics of the speech of three genotypes of spino-cerebellar ataxia (SCA) as manifest in four different speech tasks. Methods: Speech samples from 26 speakers with SCA…

  18. TOOLS FOR RESEARCH AND EDUCATION IN SPEECH SCIENCE

    Microsoft Academic Search

    Ronald A. Cole

    The Center for Spoken Language Understanding (CSLU) provides free language resources to researchers and educators in all areas of speech and hearing science. These resources are of great potential value to speech scientists for analyzing speech, for diagnosing and treating speech and language problems, for researching and evaluating language technologies, and for training students in the theory and practice of

  19. Bachelor of Science in Speech-Language Pathology and Audiology

    E-print Network

    O'Toole, Alice J.

Bachelor of Science in Speech-Language Pathology and Audiology. Audiologists provide services to persons with hearing loss and problems with balance. Speech-language pathology and audiology are professions consistently rated among…

  20. Neural Correlates of Bimodal Speech and Gesture Comprehension

    ERIC Educational Resources Information Center

    Kelly, Spencer D.; Kravitz, Corinne; Hopkins, Michael

    2004-01-01

    The present study examined the neural correlates of speech and hand gesture comprehension in a naturalistic context. Fifteen participants watched audiovisual segments of speech and gesture while event-related potentials (ERPs) were recorded to the speech. Gesture influenced the ERPs to the speech. Specifically, there was a right-lateralized N400…

  1. Visual and Auditory Input in Second-Language Speech Processing

    ERIC Educational Resources Information Center

    Hardison, Debra M.

    2010-01-01

    The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…

  2. Phonemic Characteristics of Apraxia of Speech Resulting from Subcortical Hemorrhage

    ERIC Educational Resources Information Center

    Peach, Richard K.; Tonkovich, John D.

    2004-01-01

    Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…

  3. Predictive Coding of Speech at Low Bit Rates

    Microsoft Academic Search

    B. Atal

    1982-01-01

Predictive coding is a promising approach for speech coding. In this paper, we review the recent work on adaptive predictive coding of speech signals, with particular emphasis on achieving high speech quality at low bit rates (less than 10 kbit/s). Efficient prediction of the redundant structure in speech signals is obviously important for proper functioning of a predictive coder. It
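The "efficient prediction of the redundant structure" the abstract refers to is classically done with short-term linear prediction. Below is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion; it illustrates the principle only and is not the paper's coder.

```python
import numpy as np

def lpc(x, p):
    """Autocorrelation-method LPC. Returns coefficients [1, a_1, ..., a_p]
    of A(z) = 1 + a_1 z^-1 + ... (so the predicted sample is
    -sum_k a_k * x[n-k]) plus the final prediction-error energy."""
    x = np.asarray(x, dtype=float)
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(p + 1)])
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k             # error energy shrinks each order
    return a, err

# A decaying exponential x[n] = 0.9**n is perfectly predicted by one tap:
coeffs, err = lpc([0.9 ** n for n in range(200)], 1)
print(coeffs)  # close to [1.0, -0.9]
```

With a_1 close to -0.9, the predictor reconstructs each sample as 0.9 times the previous one, leaving only the initial sample's energy as residual.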

  4. Statistical and model based approach to unvoiced speech detection

    Microsoft Academic Search

    Krithika Giridharan; Brett Y. Smolenski; Robert E. Yantorno

    2004-01-01

    The detection of unvoiced speech in the presence of additive background noise is complicated by the fact that unvoiced speech is very similar to white noise. The mechanism of production of unvoiced speech is known to be due to turbulent airflow in the constrictions of the vocal tract. Three approaches for detecting unvoiced speech from additive background noise have been

  5. Recognizing Sloppy Speech CMU-LTI-05-190

    E-print Network

    Eskenazi, Maxine

As speech recognition moves from labs into the real world, sloppy speech poses new challenges. We examine pronunciation modeling and introduce flexible tying to better model reductions in sloppy speech.

  6. Deep Learning in Speech Synthesis August 31st, 2013

    E-print Network

    Tomkins, Andrew

Deep Learning in Speech Synthesis. Heiga Zen, Google, August 31st, 2013. The talk covers background on deep learning, the motivation for deep learning-based approaches, and DNN-based statistical text-to-speech synthesis (TTS), which maps text (a discrete symbol sequence) to speech (a continuous time series).

  7. The ambassador's speech: A particularly Hellenistic genre of oratory

    Microsoft Academic Search

    Cecil W. Wooten

    1973-01-01

    The ambassador's speech assumed great importance during the Hellenistic period and became a distinct genre of deliberative oratory. Although there are no genuine ambassador's speeches extant, one can construct a model speech of this type by comparing ambassador's speeches in the Greek historians, especially Polybius.

  8. PAPER Speech Enhancement: New Approaches to Soft Decision

    Microsoft Academic Search

    Joon-Hyuk CHANG; Nam Soo KIM

In this paper, we propose new approaches to speech enhancement based on soft decision. In order to enhance the statistical reliability in estimating speech activity, we introduce the concept of a global speech absence probability (GSAP). First, we compute the conventional speech absence probability (SAP) and then modify it according to the newly proposed GSAP. The modification

  9. CASS: A PHONETICALLY TRANSCRIBED CORPUS OF MANDARIN SPONTANEOUS SPEECH 1

    E-print Network

    Byrne, William

LI Aijun, ZHENG Fang, et al. The Chinese Annotated Spontaneous Speech (CASS) corpus contains phonetically transcribed Mandarin spontaneous speech, collected to capture spontaneous speech and language effects. The speech in the CASS corpus was provided by the Broadcast Station…

  10. CONTROL OF VOICE QUALITY FOR EMOTIONAL SPEECH SYNTHESIS

    Microsoft Academic Search

    Carlo Drioli; Fabio Tesser; Graziano Tisato; Piero Cosi; Enrico Marchetto

Speech production in general, and emotional speech in particular, is characterized by a wide variety of phonation modalities. Voice quality, which is the term commonly used in the field, has an important role in the communication of emotions through speech, and nonmodal phonation modalities (soft, breathy, whispery, creaky, for example) are commonly found in emotional speech corpora. In this

  11. Review: The speech corpus and database of Japanese dialects

    Microsoft Academic Search

    Yasuko Nagano-madsen

The SCDJD is a recording of readings of words, phrases, sentences, and texts in Japanese dialects. The focus of the speech material is on prosody, in particular on accentual variations, and to a lesser extent on intonation. In addition to the dialectal materials, SCDJD contains speech of the minority language Ainu, traditional Japanese singing, school children's speech, and speech by the foreign

  12. Using semantic analysis to improve speech recognition performance

    E-print Network

    Erdogan, Hakan

Language modeling for speech recognition attempts to model the probability P(W) of observing a word sequence W in natural language. The purpose of language modeling is to bias a speech recognizer…

  13. Computational Differences between Whispered and Non-Whispered Speech

    ERIC Educational Resources Information Center

    Lim, Boon Pang

    2011-01-01

    Whispering is a common type of speech which is not often studied in speech technology. Perceptual and physiological studies show us that whispered speech is subtly different from phonated speech, and is surprisingly able to carry a tremendous amount of information. In this dissertation we consider the question: What makes whispering a good form of…

  14. Integrating Stress Information in Large Vocabulary Continuous Speech Recognition

    E-print Network

    Paris-Sud XI, Université de

Bogdan Ludusan et al. Among the cues investigated, stress seems suitable for speech recognition tasks due to its intrinsic characteristics, performing well even for foreign-accented speech. Index Terms: speech recognition, stress, rhythm.

  15. HOW EFFECTIVE IS UNSUPERVISED DATA COLLECTION FOR CHILDREN'S SPEECH RECOGNITION?

    E-print Network

    Mostow, Jack

G. Aist, P. Chan, et al. Children's speech poses a unique challenge to today's state-of-the-art automatic speech recognition systems. We studied how to use speech data transcribed by a speech recognition system and automatically filtered.

  16. Hidden Feature Models for Speech Recognition Using Dynamic Bayesian Networks

    E-print Network

    Noble, William Stafford

Karen Livescu, James … We investigate the use of hidden features, such as articulatory or other phonological features, for automatic speech recognition. The majority of current speech recognition research assumes a model of speech consisting of a stream

  17. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  18. A BLOCK COSINE TRANSFORM AND ITS APPLICATION IN SPEECH RECOGNITION

    E-print Network

Jingdong Chen, Kuldip K. Paliwal, et al. Noise-robust speech recognition has … in automatic speech recognition. This has led to sub-band based speech recognition, in which the full

  19. The SPHINX-II speech recognition system: an overview

    Microsoft Academic Search

    Xuedong Huang; Fileno Alleva; Hsiao-Wuen Hon; Mei-Yuh Hwang; Ronald Rosenfeld

    1993-01-01

In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition.

  20. An improved automatic lipreading system to enhance speech recognition

    Microsoft Academic Search

    Eric Petajan; Bradford Bischoff; David Bodoff; N. M. Brooke

    1988-01-01

    Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing

  1. AUDIO SOURCE SEPARATION WITH ONE SENSOR FOR ROBUST SPEECH RECOGNITION

    E-print Network

    Paris-Sud XI, Université de

L. Benaroya, F. Bimbot, G. … We consider the problem of noise compensation in speech signals for robust speech recognition. Several classical denoising … superimposed to the voice of the speaker(s). While automatic speech recognition is a rather mature technology

  2. Check List of Books and Equipment in Speech.

    ERIC Educational Resources Information Center

    Speech Communication Association, Annandale, VA.

    This list of books, equipment, and supplies in speech offers several hundred resources selected by individual advertisers. The resources are divided into such categories as fundamentals of speech; public address; communication; radio, television, and film; theatre; speech and hearing disorders; speech education; dictionaries and other references;…

  3. Tracking Change in Children with Severe and Persisting Speech Difficulties

    ERIC Educational Resources Information Center

    Newbold, Elisabeth Joy; Stackhouse, Joy; Wells, Bill

    2013-01-01

    Standardised tests of whole-word accuracy are popular in the speech pathology and developmental psychology literature as measures of children's speech performance. However, they may not be sensitive enough to measure changes in speech output in children with severe and persisting speech difficulties (SPSD). To identify the best ways of doing this,…

  4. Listening to talking faces: motor cortical activation during speech perception

    E-print Network

    Coulson, Seana

Jeremy I. Skipper et al. … audiovisual speech perception activated a network of brain regions that included cortical motor areas involved in … the speech perception process involves a network of multimodal brain regions associated with speech

  5. SPEECH ENHANCEMENT USING MULTI--PULSE EXCITED LINEAR PREDICTION SYSTEM

    E-print Network

K. K. Paliwal. The problem of enhancing speech corrupted by additive white noise, when only noisy speech is available, is of considerable … It is shown that for successful enhancement of speech the error-weighting filter should

  6. Speech Noise Estimation using Enhanced Minima Controlled Recursive Averaging

    Microsoft Academic Search

    Ningping Fan; Justinian Rosca; Radu Balan

    2007-01-01

Accurate noise power spectrum estimation in a noisy speech signal is a key challenge in speech enhancement. One state-of-the-art approach is minima controlled recursive averaging (MCRA). This paper presents an enhanced MCRA algorithm (EMCRA), which demonstrates less speech signal leakage and faster response time to follow abrupt changes in the noise power spectrum. Experiments using real speech and
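The MCRA family of estimators updates the noise spectrum by recursive averaging only in time-frequency bins judged speech-absent. The sketch below captures that idea in a deliberately simplified form; the continuous minimum tracker and the thresholds are illustrative choices, not the MCRA or EMCRA algorithms themselves.

```python
import numpy as np

def estimate_noise(power_frames, alpha=0.95, ratio_thresh=5.0):
    """power_frames: (n_frames, n_bins) short-time power spectra.
    The noise estimate is recursively averaged only where the frame
    power stays close to a tracked spectral minimum (speech absent)."""
    noise = power_frames[0].astype(float)
    spectral_min = noise.copy()
    for frame in power_frames[1:]:
        # simplified continuous minimum tracking
        spectral_min = np.minimum(0.998 * spectral_min + 0.002 * frame, frame)
        speech_present = frame > ratio_thresh * spectral_min
        # recursive averaging only in speech-absent bins
        noise = np.where(speech_present, noise,
                         alpha * noise + (1.0 - alpha) * frame)
    return noise

# Stationary noise of power 1.0 with a burst of loud "speech" frames:
frames = np.ones((60, 8))
frames[30:35] *= 100.0
print(estimate_noise(frames))  # stays near 1.0, not dragged up by the burst
```

The burst exceeds the minimum-based threshold, so those frames are skipped and the noise estimate is not biased upward, which is the behavior MCRA-style estimators are designed for.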

  7. On the use of dynamic spectral parameters in speech recognition

    Microsoft Academic Search

    S. M. Ahadi; H. Sheikhzadeh; R. L. Brennan; G. H. Freeman; E. Chau

    2003-01-01

    Spectral dynamics have attracted the attention of researchers in speech recognition for a long time. As part of the speech feature vector they are found to be useful and hence are almost part of any feature extraction algorithm for speech recognition. However, the usual cepstral dynamics do not directly reflect the dynamics of the speech spectrum, as they are extracted

  8. Speech summarization using weighted finite-state transducers

    Microsoft Academic Search

    Takaaki Hori; Chiori Hori; Yasuhiro Minami

    2003-01-01

This paper proposes an integrated framework to summarize spontaneous speech into written-style compact sentences. Most current speech recognition systems attempt to transcribe whole spoken words correctly. However, recognition results of spontaneous speech are usually difficult to understand, even if the recognition is perfect, because spontaneous speech includes redundant information, and its style is different to that of written

  9. Incorporating Women's Speeches as Models in the Basic Course.

    ERIC Educational Resources Information Center

    Jensen, Marvin D.

    Studies indicate that there is a general lack of availability and use of women's speeches in college speech curricula. By incorporating more women's speeches as models, instructors of the basic course in speech can present a more complete picture of American public speaking while also encouraging women in these classes to feel less muted in their…

  10. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    ERIC Educational Resources Information Center

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

  11. Intonation contour in synchronous speech

    NASA Astrophysics Data System (ADS)

    Wang, Bei; Cummins, Fred

    2003-10-01

Synchronous Speech (Syn-S), obtained by having pairs of speakers read a prepared text together, has been shown to result in interesting properties in the temporal domain, especially in the reduction of inter-speaker variability in suprasegmental timing [F. Cummins, ARLO 3, 7-11 (2002)]. Here we investigate the effect of synchronization among speakers on the intonation contour, with a view to informing models of intonation. Six pairs of speakers (all females) read a short text (176 words) both synchronously and solo. Results show that (1) the pitch accent height above a declining baseline is reduced in Syn-S, compared with solo speech, while the pitch accent location is consistent across speakers in both conditions; (2) in contrast to previous findings on duration matching, there is an asymmetry between speakers, with one speaker exerting a stronger influence on the observed intonation contour than the other; (3) agreement on the boundaries of intonational phrases is greater in Syn-S and intonation contours are well matched from the first syllable of the phrase and throughout.

  12. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were moderately to highly correlated, even the most highly related segmental and lexical measures agreed on only 76% of classifications (i.e., to CAS and PD). Finally, VOT analyses revealed that CAS utilized a distinct distribution pattern relative to PD and TD. Discussion frames the current findings within a profile of CAS and provides a validated list of criteria for the differential diagnosis of CAS and PD.

  13. Speech processing using conditional observable maximum likelihood continuity mapping

    DOEpatents

    Hogden, John; Nix, David

    2004-01-13

    A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.

  14. COMBINING CEPSTRAL NORMALIZATION AND COCHLEAR IMPLANT-LIKE SPEECH PROCESSING FOR MICROPHONE ARRAY-BASED SPEECH RECOGNITION

    E-print Network

    Garner, Philip N.

This work combines cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition. … Numbers corpus (MONC), are clean and not overlapping. Cochlear implant-like speech processing, which

  15. Model-based Noisy Speech Recognition with Environment Parameters Estimated by Noise Adaptive Speech Recognition with Prior

    E-print Network

We have proposed earlier a noise adaptive speech recognition approach … this method performs better than the previous methods. Speech recognition has to be carried

  16. State-based labelling for a sparse representation of speech and its application to robust speech recognition

    E-print Network

    Virtanen, Tuomas

…this labelling in noise-robust automatic speech recognition. Acoustic time-frequency segments of speech … the transcriptions. In the recognition phase, noisy speech is modeled by a sparse linear combination of noise … tested in the connected digit recognition task with noisy speech material from the Aurora-2 database

  17. Effects of reverberation on speech segregation

    NASA Astrophysics Data System (ADS)

    Culling, John F.; Toh, Chaz Yee; Hodder, Kathryn I.

    2002-05-01

Perceptual separation of speech from interfering noise using binaural cues and fundamental frequency (F0) differences is disrupted by reverberation [Plomp, Acoustica 34, 200-211; Culling et al., Speech Commun. 14, 71-96]. Culling et al. found that the effect of F0 differences on vowel identification was robust in reverberation unless combined with even subtle F0 modulation. In the current study, speech reception thresholds (SRTs) were measured against a single competing voice. Both voices were either monotonized or normally intonated. Each came from recordings of the same voice, but interfering sentences were feminized (F0 increased 80%, vocal-tract length reduced 20%). The voices were presented either from the same location or from different locations within the anechoic and reverberant virtual rooms of Culling et al. In anechoic conditions, SRTs were lower when the voices were spatially separated and/or intonated, indicating that intonated speech is more intelligible than monotonous speech. In reverberant conditions (T60=400 ms), SRTs were higher, with no differences between the conditions. A follow-up experiment introduced sentences with inverted F0 contours. While acceptable in quiet, these sentences gave higher SRTs in all conditions. It appears that reverberant conditions leave intonated speech intelligible, but make it inseparable, while monotonous speech remains separable but is unintelligible.

  18. Short-time Fourier analysis of sampled speech

    Microsoft Academic Search

    M. Portnoff

    1981-01-01

    The theoretical basis for the representation of a speech signal by its short-time Fourier transform is developed. A time-frequency representation for linear time-varying systems is applied to the speech-production model to formulate a quasi-stationary representation for the speech waveform. Short-time Fourier analysis of the resulting representation yields the relationship between the short-time Fourier transform of the speech and the speech-production
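A windowed-frame short-time Fourier analysis, the representation the abstract builds on, can be sketched as follows; the Hann window and hop size are arbitrary illustrative choices, not Portnoff's formulation.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform with a sliding Hann window.
    Returns an (n_frames, frame_len//2 + 1) complex array."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# A pure tone placed exactly on bin 16 of a 256-point frame:
n = np.arange(2048)
tone = np.sin(2 * np.pi * 16 * n / 256)
S = stft(tone)
print(np.argmax(np.abs(S), axis=1))  # every frame peaks at bin 16
```

For quasi-stationary speech, each row approximates the spectrum of the production model over one short analysis interval, which is exactly the view the abstract's representation formalizes.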

  19. The sensorimotor and social sides of the architecture of speech.

    PubMed

    Pezzulo, Giovanni; Barca, Laura; D'Ausilio, Alessando

    2014-12-01

    Speech is a complex skill to master. In addition to sophisticated phono-articulatory abilities, speech acquisition requires neuronal systems configured for vocal learning, with adaptable sensorimotor maps that couple heard speech sounds with motor programs for speech production; imitation and self-imitation mechanisms that can train the sensorimotor maps to reproduce heard speech sounds; and a "pedagogical" learning environment that supports tutor learning. PMID:25514959

  20. Applying wavelet analysis to speech segmentation and classification

    NASA Astrophysics Data System (ADS)

    Tan, Beng T.; Lang, Robert; Schroder, Heiko; Spray, Andrew; Dermody, Phillip

    1994-03-01

    We propose the design of a hearing aid based on the wavelet transform. The fast wavelet transform is used to decompose speech into different frequency components. This paper presents the difficulties in the use of wavelet transforms for speech processing and shows how the careful selection of wavelet coefficients can enable the four major categories of speech - voiced speech, plosives, fricatives, and silence - to be identified. With knowledge of these four categories, it is shown how speech can be easily and effectively segmented.
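The fast wavelet transform the abstract mentions can be illustrated with its simplest instance, the orthonormal Haar transform, which splits each frame into low- and high-frequency bands whose energies could feed a voiced/plosive/fricative/silence classifier. This is a generic sketch, not the authors' wavelet or selection scheme.

```python
import numpy as np

def haar_step(x):
    """One level of the fast Haar wavelet transform: split a frame into
    a low-pass (approximation) and a high-pass (detail) half-band."""
    x = np.asarray(x, dtype=float)
    s2 = np.sqrt(2.0)
    approx = (x[0::2] + x[1::2]) / s2   # local averages
    detail = (x[0::2] - x[1::2]) / s2   # local differences
    return approx, detail

def haar_decompose(x, levels):
    """Return [detail_1, ..., detail_L, approx_L]. The transform is
    orthonormal, so total energy is preserved (Parseval)."""
    bands = []
    for _ in range(levels):
        x, d = haar_step(x)
        bands.append(d)
    bands.append(np.asarray(x, dtype=float))
    return bands

frame = np.array([4.0, 4.0, 2.0, 2.0, 1.0, -1.0, 0.0, 0.0])
bands = haar_decompose(frame, 2)
print([b.tolist() for b in bands])
```

Smooth (voiced-like) stretches put their energy into the approximation band, while abrupt (plosive-like) stretches show up in the detail bands, which is the contrast the segmentation relies on.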

  1. Brain-Computer Interfaces for Speech Communication

    PubMed Central

    Brumberg, Jonathan S.; Nieto-Castanon, Alfonso; Kennedy, Philip R.; Guenther, Frank H.

    2010-01-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates. PMID:20204164

  2. Bimodal codebooks for CELP speech coding

    E-print Network

    Woo, Hong Chae

    1988-01-01

then be expressed by the simple difference equation s(n) = Σ_{k=1}^{p} α_k s(n−k) + u(n) (2), where u(n) is the excitation input of the LPC synthesis filter and p is the predictor order. If the speech signal obeys the model of (2) exactly, then the predictor error signal... of the codebook. Since the excitation of unvoiced speech is considered a random signal, a quarter of the codebook is generated from a white Gaussian sequence. Implementing the excitation of high-quality MLPC in a type of codebook, good synthetic speech...
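The all-pole synthesis model referenced as equation (2) in the abstract is easy to state in code: each output sample is a weighted sum of past outputs plus an excitation sample. A minimal sketch (the coefficients and excitation here are toy values, not codebook entries from the thesis):

```python
import numpy as np

def lpc_synthesize(a, excitation):
    """All-pole LPC synthesis per equation (2):
    s(n) = sum_{k=1}^{p} a[k-1] * s(n-k) + u(n)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[k] * s[n - 1 - k] for k in range(min(p, n)))
        s[n] = past + excitation[n]
    return s

# With a single predictor coefficient 0.5, an impulse excitation
# produces a decaying exponential:
print(lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0]))
```

Feeding the same filter a white Gaussian excitation, as the abstract describes for the unvoiced quarter of the codebook, yields noise shaped by the vocal-tract model instead.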

  3. Modern methods of investigation in speech production.

    PubMed

    Fujimura, O

    1980-01-01

Methodologies of speech research with respect to the production processes are discussed, with an emphasis on the recent development of new instrumental techniques. It is argued that systematic studies of large amounts of speech data are necessary to understand the basic characteristics of speech. The traditional notion of phoneme-size segments seems inappropriate for interpreting multidimensional articulatory movements by a concatenative model. Experimental means such as a computer-controlled X-ray microbeam technique and advanced statistical processing, in combination with a new theoretical framework of phonetic description, promise future development. PMID:7413768

  4. Vector Adaptive/Predictive Encoding Of Speech

    NASA Technical Reports Server (NTRS)

    Chen, Juin-Hwey; Gersho, Allen

    1989-01-01

    Vector adaptive/predictive technique for digital encoding of speech signals yields decoded speech of very good quality after transmission at coding rate of 9.6 kb/s and of reasonably good quality at 4.8 kb/s. Requires 3 to 4 million multiplications and additions per second. Combines advantages of adaptive/predictive coding, and code-excited linear prediction, yielding speech of high quality but requires 600 million multiplications and additions per second at encoding rate of 4.8 kb/s. Vector adaptive/predictive coding technique bridges gaps in performance and complexity between adaptive/predictive coding and code-excited linear prediction.

  5. Acoustic Speech Analysis Of Wayang Golek Puppeteer

    NASA Astrophysics Data System (ADS)

    Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

    2010-12-01

Actively disguised speech is one problem to be taken into account in forensic speaker verification or identification processes. The verification processes are usually carried out by comparison between unknown samples and known samples. Active disguising can occur in both samples. To simulate the condition of speech disguising, voices of a Wayang Golek puppeteer were used. It is assumed that a wayang golek puppeteer is a master of disguise: he can manipulate his voice into many different types of characters' voices. This paper discusses the speech characteristics of 2 puppeteers. Comparison was made between the puppeteer's habitual voice and his manipulated voice.

  6. Children use visual speech to compensate for non-intact auditory speech.

    PubMed

    Jerger, Susan; Damian, Markus F; Tye-Murray, Nancy; Abdi, Hervé

    2014-10-01

    We investigated whether visual speech fills in non-intact auditory speech (excised consonant onsets) in typically developing children from 4 to 14 years of age. Stimuli with the excised auditory onsets were presented in the audiovisual (AV) and auditory-only (AO) modes. A visual speech fill-in effect occurs when listeners experience hearing the same non-intact auditory stimulus (e.g., /-b/ag) as different depending on the presence/absence of visual speech such as hearing /bag/ in the AV mode but hearing /ag/ in the AO mode. We quantified the visual speech fill-in effect by the difference in the number of correct consonant onset responses between the modes. We found that easy visual speech cues /b/ provided greater filling in than difficult cues /g/. Only older children benefited from difficult visual speech cues, whereas all children benefited from easy visual speech cues, although 4- and 5-year-olds did not benefit as much as older children. To explore task demands, we compared results on our new task with those on the McGurk task. The influence of visual speech was uniquely associated with age and vocabulary abilities for the visual speech fill-in effect but was uniquely associated with speechreading skills for the McGurk effect. This dissociation implies that visual speech, as processed by children, is a complicated and multifaceted phenomenon underpinned by heterogeneous abilities. These results emphasize that children perceive a speaker's utterance rather than the auditory stimulus per se. In children, as in adults, there is more to speech perception than meets the ear. PMID:24974346

  7. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, making it easier to analyze abnormal speech characteristics accurately.

  8. Chinese speech intelligibility at different speech sound pressure levels and signal-to-noise ratios in simulated classrooms

    Microsoft Academic Search

    Peng Jianxin

    2010-01-01

    Speech intelligibility in classrooms can be influenced by background-noise level, speech sound pressure level (SSPL), reverberation time and signal-to-noise ratio (SNR). The relationship between SSPL and subjective Chinese Mandarin speech intelligibility, and the effect of different SNRs on Chinese Mandarin speech intelligibility in a simulated classroom, were investigated through room acoustical simulation, auralisation techniques and subjective evaluation. Chinese speech

  9. An introduction to the speechBITE database: Speech pathology database for best interventions and treatment efficacy

    Microsoft Academic Search

    Katherine Smith; Patricia McCabe; Leanne Togher; Emma Power; Natalie Munro; Elizabeth Murray; Michelle Lincoln

    2010-01-01

    This paper describes the development of the Speech Pathology Database for Best Interventions and Treatment Efficacy (speechBITE) at The University of Sydney. The speechBITE database is designed to provide better access to the intervention research relevant to speech pathology and to help clinicians interpret treatment research. The challenges speech pathologists face when locating research to support evidence-based practice have been

  10. Converging toward a common speech code: imitative and perceptuo-motor recalibration processes in speech production.

    PubMed

    Sato, Marc; Grabski, Krystyna; Garnier, Maëva; Granjon, Lionel; Schwartz, Jean-Luc; Nguyen, Noël

    2013-01-01

    Auditory and somatosensory systems play a key role in speech motor control. In the act of speaking, segmental speech movements are programmed to reach phonemic sensory goals, which in turn are used to estimate actual sensory feedback in order to further control production. The adult's tendency to automatically imitate a number of acoustic-phonetic characteristics in another speaker's speech however suggests that speech production not only relies on the intended phonemic sensory goals and actual sensory feedback but also on the processing of external speech inputs. These online adaptive changes in speech production, or phonetic convergence effects, are thought to facilitate conversational exchange by contributing to setting a common perceptuo-motor ground between the speaker and the listener. In line with previous studies on phonetic convergence, we here demonstrate, in a non-interactive situation of communication, online unintentional and voluntary imitative changes in relevant acoustic features of acoustic vowel targets (fundamental and first formant frequencies) during speech production and imitation. In addition, perceptuo-motor recalibration processes, or after-effects, occurred not only after vowel production and imitation but also after auditory categorization of the acoustic vowel targets. Altogether, these findings demonstrate adaptive plasticity of phonemic sensory-motor goals and suggest that, apart from sensory-motor knowledge, speech production continuously draws on perceptual learning from the external speech environment. PMID:23874316

  11. How visual timing and form information affect speech and non-speech processing.

    PubMed

    Kim, Jeesun; Davis, Chris

    2014-10-01

    Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving face stimuli were used: full-face versions (both spoken form and timing information available) and modified face versions (only timing information provided by peri-oral motion available). The results showed that the peri-oral timing information facilitated response time for speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing condition; this effect only occurred for speech stimuli. We propose the timing effect was due to cross-modal phase resetting; the form effect to cross-modal priming. PMID:25190328
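    One common way to generate spectrally rotated speech of the kind used here as a non-speech control, offered only as a plausible sketch since the paper's stimulus pipeline is not reproduced above, is ring modulation at the Nyquist frequency: multiplying sample n by (-1)**n maps a component at frequency f to fs/2 - f, preserving the temporal envelope while destroying intelligibility.

```python
import math

def rotate_spectrum(samples):
    """Ring-modulate at Nyquist: a component at f moves to fs/2 - f."""
    return [s * (-1) ** n for n, s in enumerate(samples)]

def bin_power(sig, f, fs):
    """Power of `sig` at frequency f (single DFT-bin probe)."""
    acc = sum(s * complex(math.cos(2 * math.pi * f * n / fs),
                          -math.sin(2 * math.pi * f * n / fs))
              for n, s in enumerate(sig))
    return abs(acc) ** 2

fs = 8000.0
x = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(256)]  # 1 kHz tone
y = rotate_spectrum(x)
# The 1 kHz component now sits at fs/2 - 1000 = 3000 Hz.
print(bin_power(y, 3000.0, fs) > bin_power(y, 1000.0, fs))  # True
```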

  13. Speech information retrieval: a review

    SciTech Connect

    Hafen, Ryan P.; Henry, Michael J.

    2012-11-01

    Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted with the tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new to the field to quickly gain a high-level understanding and to provide direction for further study.

  14. CHATR: A generic speech synthesis system 

    E-print Network

    Black, Alan W; Taylor, Paul A

    1994-01-01

    This paper describes a generic speech synthesis system called CHATR which is being developed at ATR. CHATR is designed in a modular way so that module parameters and even which modules are actually used may be set and ...

  15. Differential oscillatory encoding of foreign speech.

    PubMed

    Pérez, Alejandro; Carreiras, Manuel; Gillon Dowens, Margaret; Duñabeitia, Jon Andoni

    2015-08-01

    Neuronal oscillations play a key role in auditory perception of verbal input, with the oscillatory rhythms of the brain showing synchronization with specific frequencies of speech. Here we investigated the neural oscillatory patterns associated with perceiving native, foreign, and unknown speech. Spectral power and phase synchronization were compared to those of a silent context. Power synchronization to native speech was found in frequency ranges corresponding to the theta band, while no synchronization patterns were found for the foreign speech context and the unknown language context. For phase synchrony, the native and unknown languages showed higher synchronization in the theta-band than the foreign language when compared to the silent condition. These results suggest that neural synchronization patterns are markedly different for native and foreign languages. PMID:26070104

  16. Speech Recognition Via Phonetically Featured Syllables 

    E-print Network

    King, Simon; Stephenson, Todd; Isard, Stephen; Taylor, Paul; Strachan, Alex

    in speech recognition. We also propose to model this description at the syllable rather than phone level. The ultimate goal of this work is to generate syllable models whose parameters explicitly describe the trajectories of the phonetic features...

  17. Speech recognition using linear dynamic models. 

    E-print Network

    Frankel, Joe; King, Simon

    2006-01-01

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors...
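    For background, the state output model named above, a mixture of diagonal-covariance Gaussians, is easy to sketch directly; the parameter values below are illustrative only, not taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, vars_):
    """log sum_k w_k N(x; mean_k, var_k), computed stably via max-shift."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, vars_)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

# A two-component mixture over 2-D features, standing in for one HMM state.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
vars_ = [[1.0, 1.0], [1.0, 1.0]]
# A vector near a component scores far higher than a distant one.
print(gmm_loglik([0.1, -0.2], weights, means, vars_) >
      gmm_loglik([10.0, 10.0], weights, means, vars_))  # True
```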

  18. Full Covariance Modelling for Speech Recognition 

    E-print Network

    Bell, Peter

    2010-01-01

    HMM-based systems for Automatic Speech Recognition typically model the acoustic features using mixtures of multivariate Gaussians. In this thesis, we consider the problem of learning a suitable covariance matrix for ...

  19. Linear dynamic models for automatic speech recognition 

    E-print Network

    Frankel, Joe

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models (HMM), in which the output distribution associated with each state is modelled by a mixture of diagonal covariance Gaussians. Dynamic information is typically...

  20. Unstable connectionist networks in speech recognition 

    E-print Network

    Rohwer, Richard; Renals, Steve; Terry, Mark

    Connectionist networks evolve in time according to a prescribed rule. Typically, they are designed to be stable so that their temporal activity ceases after a short transient period. However, meaningful patterns in speech have a temporal component...

  1. Connectionist probability estimators in HMM speech recognition 

    E-print Network

    Renals, Steve; Morgan, Nelson; Bourlard, Herve; Cohen, Michael; Franco, Horacio

    The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review...

  2. Sparse gaussian graphical models for speech recognition. 

    E-print Network

    Bell, Peter; King, Simon

    2007-01-01

    We address the problem of learning the structure of Gaussian graphical models for use in automatic speech recognition, a means of controlling the form of the inverse covariance matrices of such systems. With particular focus on data sparsity issues...

  3. Speech recognition using linear dynamic models. 

    E-print Network

    Frankel, Joe; King, Simon

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with subphone states. This approach, whilst successful, models ...

  4. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determining voice patterns and pitch changes are described. Operation of the circuit is discussed and circuit diagram is provided.

  5. Infinite Support Vector Machines in Speech Recognition

    E-print Network

    Yang, Jingzhou; van Dalien, Rogier C.; Gales, M. J. F.

    2013-01-01

    Generative feature spaces provide an elegant way to apply discriminative models in speech recognition, and system performance has been improved by adapting this framework. However, the classes in the feature space may not be linearly separable...

  6. Residual Speech Sound Disorders: Linguistic and Motoric

    E-print Network

    [Slide excerpts, partially recovered: anatomical findings include the left temporal pole; "What do brain differences mean?"; speech perception circuits are discussed in relation to the dyslexia circuit, and the results suggest that processing of phonological information is disrupted.]

  7. Visualizations: Speech, Language & Autistic Spectrum Disorder

    E-print Network

    Karahalios, Karrie G.

    [Abstract partially recovered.] Language delays can be crippling for many children, in that language is "a unique characteristic of human behavior... [that". General terms: Experimentation, Human Factors. Keywords: Accessibility, Visualization, Autism, Children, Speech, Vocalization. Many children, including those with Autistic Spectrum Disorder (ASD), have explicit difficulty developing

  8. Perceptual Evaluation of Video-Realistic Speech

    E-print Network

    Geiger, Gadi

    2003-02-28

    abstract With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic ...

  9. Adaptive Noise Reduction of Speech Signals

    Microsoft Academic Search

    Wenqing Jiang; Henrique Malvar

    2000-01-01

    We propose a new adaptive speech noise removal algorithm based on two-stage Wiener filtering. A first Wiener filter is used to produce a smoothed estimate of the a priori signal-to-noise ratio (SNR), aided by a classifier that separates speech from noise frames, and a second Wiener filter is used to generate the final output. Spectral analysis and synthesis is
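    The per-frequency-bin rule at the heart of any Wiener filter is G = SNR/(1 + SNR). The sketch below shows only that single-stage rule on hypothetical band powers; the paper's algorithm adds a speech/noise frame classifier and a second cascaded filtering stage.

```python
def wiener_gains(observed_power, noise_power, floor=1e-10):
    """Per-bin Wiener gains G = SNR/(1+SNR) from observed and noise powers."""
    gains = []
    for po, pn in zip(observed_power, noise_power):
        snr = max(po - pn, 0.0) / max(pn, floor)  # a priori SNR estimate
        gains.append(snr / (1.0 + snr))
    return gains

# Bins where speech dominates are passed almost unchanged (gain near 1);
# noise-only bins are suppressed (gain 0).
g = wiener_gains([100.0, 1.0], [1.0, 1.0])
print(g)  # [0.99, 0.0]
```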

  10. Vocabulary-Independent Indexing of Spontaneous Speech

    Microsoft Academic Search

    Peng Yu; Kaijiang Chen; Chengyuan Ma; Frank Seide

    2005-01-01

    We present a system for vocabulary-independent indexing of spontaneous speech, i.e., neither do we know the vocabulary of a speech recording, nor can we predict which query terms a user is going to search for. The technique can be applied to information retrieval, information extraction, and data mining. Our specific target is search in recorded conversations in the office/information-worker

  11. The need for a Speech Corpus

    Microsoft Academic Search

    DERMOT F. CAMPBELL; Ciaran McDonnell; Marty Meinardi; Bunny Richardson

    2007-01-01

    This paper outlines the ongoing construction of a speech corpus for use by applied linguists and advanced EFL/ESL students. The first section establishes the need for improvements in the teaching of listening skills and pronunciation practice for EFL/ESL students. It argues for the need to use authentic native-to-native speech in the teaching/learning process so as to promote social inclusion and

  12. Time-Domain Structural Analysis of Speech

    Microsoft Academic Search

    Kamil Ekstein; Roman Moucek

    2003-01-01

    This paper deals with an auxiliary speech signal parametrisation method based on structural analysis of the speech signal in the time domain. The method, called TIDOSA (TIme-DOmain Structural Analysis), is grounded in analysing the "shape" of incoming waveform peaks. The whole input acoustic signal is transformed into a sequence of peak-shape class indices. The presented paper summarises the features of TIDOSA-based processing

  13. An unrestricted vocabulary Arabic speech synthesis system

    Microsoft Academic Search

    YOUSIF A. EL-IMAM

    1989-01-01

    A method for synthesizing Arabic speech has been developed which uses a reasonably sized set of subphonetic elements as the synthesis units to allow synthesis of unlimited-vocabulary speech of good quality. The synthesis units have been defined after a careful study of the phonetic properties of modern standard Arabic, and they consist of central steady-state portions of vowels, central steady-state

  14. Review of Neural Networks for Speech Recognition

    Microsoft Academic Search

    Richard P. Lippmann

    1989-01-01

    The performance of current speech recognition systems is far below that of humans. Neural nets offer the potential of providing massive parallelism, adaptation, and new algorithmic approaches to problems in speech recognition. Initial studies have demonstrated that multilayer networks with time delays can provide excellent discrimination between small sets of pre-segmented difficult-to-discriminate words, consonants, and vowels. Performance for these small

  15. Signal modeling techniques in speech recognition

    Microsoft Academic Search

    JOSEPH W. PICONE; Texas Instruments

    1993-01-01

    A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used. The four basic operations of signal modeling, i.e. spectral shaping, spectral analysis, parametric transformation, and statistical modeling, are discussed. Three important trends that have developed in the last five years in speech recognition are examined. First, heterogeneous parameter sets that mix absolute

  16. Large vocabulary continuous speech recognition using HTK

    Microsoft Academic Search

    P. C. Woodland; J. J. Odell; V. Valtchev; S. J. Young

    1994-01-01

    HTK is a portable software toolkit for building speech recognition systems using continuous density hidden Markov models developed by the Cambridge University Speech Group. One particularly successful type of system uses mixture density tied-state triphones. We have used this technique for the 5k/20k word ARPA Wall Street Journal (WSJ) task. We have extended our approach from using word-internal

  17. Motor movement matters: the flexible abstractness of inner speech

    PubMed Central

    Oppenheim, Gary M.; Dell, Gary S.

    2010-01-01

    Inner speech is typically characterized as either the activation of abstract linguistic representations or a detailed articulatory simulation that lacks only the production of sound. We present a study of the ‘speech errors’ that occur during the inner recitation of tongue-twister-like phrases. Two forms of inner speech were tested: inner speech without articulatory movements and articulated (mouthed) inner speech. While mouthing one’s inner speech could reasonably be assumed to require more articulatory planning, prominent theories assume that such planning should not affect the experience of inner speech and consequently the errors that are ‘heard’ during its production. The errors occurring in articulated inner speech exhibited the phonemic similarity effect and lexical bias effect, two speech-error phenomena that, in overt speech, have been localized to an articulatory-feature processing level and a lexical-phonological level, respectively. In contrast, errors in unarticulated inner speech did not exhibit the phonemic similarity effect—just the lexical bias effect. The results are interpreted as support for a flexible abstraction account of inner speech. This conclusion has ramifications for the embodiment of language and speech and for the theories of speech production. PMID:21156877

  18. Neural restoration of degraded audiovisual speech

    PubMed Central

    Shahin, Antoine J.; Kerlin, Jess R.; Bhat, Jyoti; Miller, Lee M.

    2012-01-01

    When speech is interrupted by noise, listeners often perceptually “fill-in” the degraded signal, giving an illusion of continuity and improving intelligibility. This phenomenon involves a neural process in which the auditory cortex (AC) response to onsets and offsets of acoustic interruptions is suppressed. Since meaningful visual cues behaviorally enhance this illusory filling-in, we hypothesized that during the illusion, lip movements congruent with acoustic speech should elicit a weaker AC response to interruptions relative to static (no movements) or incongruent visual speech. AC response to interruptions was measured as the power and inter-trial phase consistency of the auditory evoked theta band (4-8 Hz) activity of the electroencephalogram (EEG) and the N1 and P2 auditory evoked potentials (AEPs). A reduction in the N1 and P2 amplitudes and in theta phase-consistency reflected the perceptual illusion at the onset and/or offset of interruptions regardless of visual condition. These results suggest that the brain engages filling-in mechanisms throughout the interruption, which repairs degraded speech lasting up to ~250 ms following the onset of the degradation. Behaviorally, participants perceived greater speech continuity over longer interruptions for congruent compared to incongruent or static audiovisual streams. However, this specific behavioral profile was not mirrored in the neural markers of interest. We conclude that lip-reading enhances illusory perception of degraded speech not by altering the quality of the AC response, but by delaying it during degradations so that longer interruptions can be tolerated. PMID:22178454
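    Inter-trial phase consistency as used above is commonly computed as the phase-locking value: the magnitude of the mean unit phasor across trials, near 1 when trials align in phase and near 0 when phases are random. A minimal sketch on synthetic phases (not the study's EEG data):

```python
import cmath, math, random

def phase_consistency(phases):
    """Phase-locking value: |mean of e^{i*phase}| across trials, in [0, 1]."""
    acc = sum(cmath.exp(1j * p) for p in phases)
    return abs(acc) / len(phases)

random.seed(0)
# Phase-locked trials: phases cluster near 0, as at a shared stimulus onset.
locked = [0.1 * random.random() for _ in range(50)]
# Non-locked trials: phases scattered uniformly around the whole circle.
unlocked = [2 * math.pi * random.random() for _ in range(50)]
print(phase_consistency(locked) > phase_consistency(unlocked))  # True
```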

  19. Inner speech deficits in people with aphasia

    PubMed Central

    Langland-Hassan, Peter; Faries, Frank R.; Richardson, Michael J.; Dietz, Aimee

    2015-01-01

    Despite the ubiquity of inner speech in our mental lives, methods for objectively assessing inner speech capacities remain underdeveloped. The most common means of assessing inner speech is to present participants with tasks requiring them to silently judge whether two words rhyme. We developed a version of this task to assess the inner speech of a population of patients with aphasia and corresponding language production deficits. Patients’ performance on the silent rhyming task was severely impaired relative to controls. Patients’ performance on this task did not, however, correlate with their performance on a variety of other standard tests of overt language and rhyming abilities. In particular, patients who were generally unimpaired in their abilities to overtly name objects during confrontation naming tasks, and who could reliably judge when two words spoken to them rhymed, were still severely impaired (relative to controls) at completing the silent rhyme task. A variety of explanations for these results are considered, as a means to critically reflecting on the relations among inner speech, outer speech, and silent rhyme judgments more generally. PMID:25999876

  20. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

    ERIC Educational Resources Information Center

    Davidow, Jason H.

    2014-01-01

    Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…

  1. Frequently Asked Questions: Speech and Language Disorders in the School Setting

    MedlinePLUS

    Frequently Asked Questions: Speech and Language Disorders in the School Setting. What types of speech and language disorders affect school-age children? Do speech-language disorders affect learning? How may a speech- ...

  2. Scalable Distributed Speech Recognition Using Multi-Frame GMM-Based Block Quantization

    E-print Network

    Author: Kuldip K [name truncated]. [Abstract partially recovered:] concerns coding of mel-frequency cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications, in the context of automatic speech recognition (ASR) technology for mobile communication systems.

  3. ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION Vikramjit Mitra1

    E-print Network

    Stolcke, Andreas

    Authors: Vikramjit Mitra, Wen Wang, and others. [Abstract partially recovered:] articulatory information can potentially help to improve speech recognition performance; most of the studies involved have used such features for speech recognition. Speech recognition studies using articulatory information

  4. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

  5. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

  6. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

  7. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

  8. Speech Pathology in Ancient India--A Review of Sanskrit Literature.

    ERIC Educational Resources Information Center

    Savithri, S. R.

    1987-01-01

    The paper is a review of ancient Sanskrit literature for information on the origin and development of speech and language, speech production, normality of speech and language, and disorders of speech and language and their treatment. (DB)

  9. Preschool Speech Intelligibility and Vocabulary Skills Predict Long-Term Speech and Language Outcomes Following Cochlear Implantation in Early Childhood

    PubMed Central

    Castellanos, Irina; Kronenberger, William G.; Beer, Jessica; Henning, Shirley C.; Colson, Bethany G.; Pisoni, David B.

    2013-01-01

    Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants, but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine if early preschool measures of speech and language performance predict speech-language functioning in long-term users of cochlear implants. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3 – 6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes. PMID:23998347

  10. Methods for the digital processing and transmission of speech signals

    NASA Astrophysics Data System (ADS)

    Nazarov, M. V.; Prokhorov, Iu. N.

    Methods for increasing the efficiency of digital speech transmission systems are examined. In particular, attention is given to probabilistic models of speech signals, digital representation of speech, and digital processing (parameter evaluations, filtering, prediction, and detection) of speech signals in systems with pulse-code modulation (PCM), delta modulation, and differential PCM, as well as in vocoders. The dispersion of the effective estimates of the parameters and the relative noise immunity of speech signals are assessed. Results of experimental studies of various speech transmission systems are reported.
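    Of the schemes listed, delta modulation is simple enough to sketch in full. This fixed-step toy encoder/decoder is an assumption-level illustration; practical systems adapt the step size to avoid slope overload on fast transients.

```python
import math

def delta_modulate(samples, step=0.1):
    """1-bit encoder: each bit says whether the input is above the estimate."""
    bits, est = [], 0.0
    for s in samples:
        bit = 1 if s >= est else 0
        est += step if bit else -step  # estimate steps up or down
        bits.append(bit)
    return bits

def delta_demodulate(bits, step=0.1):
    """Decoder: replay the same up/down steps to rebuild the waveform."""
    out, est = [], 0.0
    for bit in bits:
        est += step if bit else -step
        out.append(est)
    return out

# A slowly varying signal (max slope < step) is tracked closely.
x = [math.sin(0.05 * n) for n in range(200)]
y = delta_demodulate(delta_modulate(x))
mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
print(mse < 0.05)  # True
```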

  11. A multimodal corpus of speech to infant and adult listeners.

    PubMed

    Johnson, Elizabeth K; Lahey, Mybeth; Ernestus, Mirjam; Cutler, Anne

    2013-12-01

    An audio and video corpus of speech addressed to 28 11-month-olds is described. The corpus allows comparisons between adult speech directed toward infants, familiar adults, and unfamiliar adult addressees as well as of caregivers' word teaching strategies across word classes. Summary data show that infant-directed speech differed more from speech to unfamiliar than familiar adults, that word teaching strategies for nominals versus verbs and adjectives differed, that mothers mostly addressed infants with multi-word utterances, and that infants' vocabulary size was unrelated to speech rate, but correlated positively with predominance of continuous caregiver speech (not of isolated words) in the input. PMID:25669300

  12. Study on achieving speech privacy using masking noise

    NASA Astrophysics Data System (ADS)

    Tamesue, Takahiro; Yamaguchi, Shizuma; Saeki, Tetsuro

    2006-11-01

    This study focuses on achieving speech privacy using a meaningless steady masking noise. The most effective index for achieving a satisfactory level of speech privacy was selected from two candidates: spectral distance and the articulation index. Based on the results, spectral distance was selected as the most practical index for achieving speech privacy. Next, speech was presented together with masking noise at sound pressure levels corresponding to various speech privacy levels, and subjects judged the psychological impression of each privacy level. Theoretical calculations were in good agreement with the experimental results.
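
As a rough sketch of a spectral-distance index of the kind this study selects, one can compare octave-band levels (in dB) of the speech and the masker. The abstract does not give the exact definition, so the RMS form and band centers below are assumptions:

```python
import math

# Illustrative spectral-distance index: RMS difference (dB) between
# octave-band levels of speech and masker. The band centers and the
# RMS form are assumptions; the study's exact definition is not given
# in the abstract.

OCTAVE_BANDS_HZ = [250, 500, 1000, 2000, 4000]  # assumed band centers

def spectral_distance(speech_db, masker_db):
    """RMS level difference across bands, in dB."""
    assert len(speech_db) == len(masker_db)
    n = len(speech_db)
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(speech_db, masker_db)) / n)
```

One might expect a masker whose band levels track the speech spectrum (small distance) to mask speech content more effectively, though the study's mapping from index value to privacy level is not given in the abstract.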

  13. Spotlight on Speech Codes 2009: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2009

    2009-01-01

    Each year, the Foundation for Individual Rights in Education (FIRE) conducts a wide, detailed survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their obligations to uphold students' and faculty members' rights to freedom of speech, freedom of…

  14. Speech Intelligibility and Accents in Speech-Mediated Interfaces: Results and Recommendations

    ERIC Educational Resources Information Center

    Lawrence, Halcyon M.

    2013-01-01

    There continues to be significant growth in the development and use of speech--mediated devices and technology products; however, there is no evidence that non-native English speech is used in these devices, despite the fact that English is now spoken by more non-native speakers than native speakers, worldwide. This relative absence of nonnative…

  15. Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker

    ERIC Educational Resources Information Center

    Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath

    2014-01-01

    Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…

  16. Model-Based Speech Enhancement with Improved… (IEEE Transactions on Audio, Speech and Language Processing)

    E-print Network

    So, Hing-Cheung

    A model-based approach to enhance noisy speech using an analysis-synthesis framework is described. Target speech is reconstructed using a harmonic-plus-noise model (HNM). Acoustic parameters such as pitch, spectral envelope, and spectral gain are extracted from trajectories through Kalman filtering. System identification of the Kalman filter is achieved via a combined design…
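
The Kalman filtering of acoustic-parameter trajectories mentioned above can be illustrated with a generic one-dimensional random-walk Kalman filter; the process and measurement variances below are illustrative stand-ins, whereas the paper obtains its filter via system identification:

```python
# Generic 1-D Kalman filter with a random-walk state model, applied to
# a noisy parameter track (e.g., frame-by-frame pitch). The variances
# q and r are illustrative choices, not the paper's identified model.

def kalman_smooth(observations, q=1.0, r=25.0):
    x, p = observations[0], r          # initialize at first observation
    out = [x]
    for z in observations[1:]:
        p += q                         # predict: random walk adds variance
        k = p / (p + r)                # Kalman gain
        x += k * (z - x)               # correct with the innovation
        p *= 1 - k
        out.append(x)
    return out
```

Here `observations` could be a noisy frame-by-frame pitch track; in the paper's setting, `q` and `r` would come from system identification rather than being fixed by hand.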

  17. Unit selection in a concatenative speech synthesis system using a large speech database

    Microsoft Academic Search

    Andrew J. Hunt; Alan W. Black

    1996-01-01

    One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database
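
In the Hunt and Black formulation, each target position has several candidate units, and selection minimizes the sum of target costs (match to the predicted specification) and concatenation costs (quality of each join) by dynamic programming over the candidate lattice. A minimal sketch with hypothetical unit ids and toy cost functions:

```python
# Minimal Hunt & Black-style unit selection: pick one candidate unit
# per target position, minimizing summed target + join costs with
# dynamic programming. Unit ids and costs here are toy examples.

def select_units(candidates, target_cost, join_cost):
    """candidates: one list of unit ids per target position."""
    # best[i][u] = (cheapest cost of a path ending in unit u, backpointer)
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for u in candidates[i]:
            prev_u, prev_cost = min(
                ((p, c + join_cost(p, u)) for p, (c, _) in best[i - 1].items()),
                key=lambda t: t[1])
            layer[u] = (prev_cost + target_cost(i, u), prev_u)
        best.append(layer)
    # trace back from the cheapest final unit
    u, (cost, _) = min(best[-1].items(), key=lambda kv: kv[1][0])
    path = [u]
    for i in range(len(candidates) - 1, 0, -1):
        u = best[i][u][1]
        path.append(u)
    return list(reversed(path)), cost
```

In a real synthesizer the target cost measures distance between a candidate's prosodic/phonetic context and the prediction from text, and the join cost measures acoustic mismatch at the concatenation point.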

  18. Perception of "Elliptical Speech" Following Cochlear Implantation: Use of Broad Phonetic Categories in Speech Perception.

    ERIC Educational Resources Information Center

    Herman, Rebecca; Pisoni, David B.

    2000-01-01

    A study investigated perception of elliptical speech in an adult cochlear implant patient. Two experiments were conducted using sets of meaningful and anomalous English sentences: one set contained correct place of articulation cues, the other was transformed into elliptical speech. The patient and controls labeled the sentences as the same…

  19. Critical cues for auditory pattern recognition in speech: Implications for cochlear implant speech processor design

    E-print Network

    Allen, Jont

    Critical cues for auditory pattern recognition in speech, and their implications for cochlear implant speech processor design, are examined… in conditions with full spectral cues. Cochlear implant (CI) patients and hearing-impaired (HI) listeners have… (Implants and Perception, House Ear Institute, 2100 W. Third St., Los Angeles, CA 90057, USA)

  20. The Clinical Practice of Speech and Language Therapists with Children with Phonologically Based Speech Sound Disorders

    ERIC Educational Resources Information Center

    Oliveira, Carla; Lousada, Marisa; Jesus, Luis M. T.

    2015-01-01

    Children with speech sound disorders (SSD) represent a large number of speech and language therapists' caseloads. The intervention with children who have SSD can involve different therapy approaches, and these may be articulatory or phonologically based. Some international studies reveal a widespread application of articulatory based approaches in…

  1. Treatment of Speech Anxiety: A Sequential Dismantling of Speech Skills Training, Coping Skills Training, and Paradox.

    ERIC Educational Resources Information Center

    Worthington, Everett L., Jr.; And Others

    Thirty-two speech-anxious college students participated in a study that examined whether four treatments that have been effective when applied separately would be equally effective when applied in combination. The treatments were (1) systematic desensitization (SD), (2) speech skills training (SST), (3) SST combined with coping skills training (CST)…

  2. Constructing Adequate Non-Speech Analogues: What Is Special about Speech Anyway?

    ERIC Educational Resources Information Center

    Rosen, Stuart; Iverson, Paul

    2007-01-01

    Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech based on a preference for natural speech utterances over sine-wave analogues. We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to…

  3. A Motor Speech Assessment for Children with Severe Speech Disorders: Reliability and Validity Evidence

    ERIC Educational Resources Information Center

    Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.

    2013-01-01

    Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…

  4. Exploiting correlogram structure for robust speech recognition with multiple speech sources

    E-print Network

    Barker, Jon

    …to separate noise from speech using cues from multiple sensors, e.g. blind source separation by independent component analysis. The approach described here treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram…

  5. Autonomic and Emotional Responses of Graduate Student Clinicians in Speech-Language Pathology to Stuttered Speech

    ERIC Educational Resources Information Center

    Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph

    2012-01-01

    Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…

  6. Spotlight on Speech Codes 2011: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2011

    2011-01-01

    Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and accompanying report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…

  7. Dramatic Effects of Speech Task on Motor and Linguistic Planning in Severely Dysfluent Parkinsonian Speech

    ERIC Educational Resources Information Center

    Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

    2012-01-01

    In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency and voice emerge more saliently in conversation than in repetition, reading or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have…

  8. Optimal speech level for speech transmission in a noisy environment for young adults and aged persons

    NASA Astrophysics Data System (ADS)

    Sato, Hayato; Ota, Ryo; Morimoto, Masayuki; Sato, Hiroshi

    2005-04-01

    Assessing the sound environment of classrooms for the aged is an important issue, because classrooms are used by older adults for lifelong learning, especially in an aging society. Hearing loss due to aging is therefore a considerable factor for classrooms. In this study, the optimal speech level in noisy fields for both young adults and aged persons was investigated. Listening difficulty ratings and word intelligibility scores for familiar words were used to evaluate speech transmission performance. The results of the tests demonstrated that the optimal speech level for moderate background noise (i.e., less than around 60 dBA) was fairly constant. Meanwhile, the optimal speech level depended on the speech-to-noise ratio when the background noise level exceeded around 60 dBA. The minimum speech level required to minimize difficulty ratings for the aged was higher than that for the young. However, the minimum difficulty ratings for both the young and the aged occurred at speech levels of 70 to 80 dBA.

  9. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    PubMed Central

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  10. Role of binaural hearing in speech intelligibility and spatial release from masking using vocoded speech

    E-print Network

    Litovsky, Ruth

    …to remove both speech and binaural temporal fine-structure cues. Speech reception thresholds (SRTs) were measured. …is likely to improve performance in bilaterally implanted recipients. © 2009 Acoustical Society of America (Battmer et al., 1997; Stickney et al., 2004). Numerous studies that focus on performance in unilateral CI…

  11. Speech Motor Programming in Apraxia of Speech: Evidence from a Delayed Picture-Word Interference Task

    ERIC Educational Resources Information Center

    Mailend, Marja-Liisa; Maas, Edwin

    2013-01-01

    Purpose: Apraxia of speech (AOS) is considered a speech motor programming impairment, but the specific nature of the impairment remains a matter of debate. This study investigated 2 hypotheses about the underlying impairment in AOS framed within the Directions Into Velocities of Articulators (DIVA; Guenther, Ghosh, & Tourville, 2006) model: The…

  12. Biomechanical models of speech articulators to study speech motor control Pascal Perrier1

    E-print Network

    Payan, Yohan

    Speech articulators are made either of bones or of soft tissues; thus, they have complex and variable biomechanical properties. In our research group, we believe that the influence of these biomechanical properties

  13. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  14. Electrophysiological Evidence for a Multisensory Speech-Specific Mode of Perception

    ERIC Educational Resources Information Center

    Stekelenburg, Jeroen J.; Vroomen, Jean

    2012-01-01

    We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…

  15. Cortical entrainment to continuous speech: functional roles and interpretations

    PubMed Central

    Ding, Nai; Simon, Jonathan Z.

    2014-01-01

    Auditory cortical activity is entrained to the temporal envelope of speech, which corresponds to the syllabic rhythm of speech. Such entrained cortical activity can be measured from subjects naturally listening to sentences or spoken passages, providing a reliable neural marker of online speech processing. A central question still remains to be answered about whether cortical entrained activity is more closely related to speech perception or non-speech-specific auditory encoding. Here, we review a few hypotheses about the functional roles of cortical entrainment to speech, e.g., encoding acoustic features, parsing syllabic boundaries, and selecting sensory information in complex listening environments. It is likely that speech entrainment is not a homogeneous response and these hypotheses apply separately for speech entrainment generated from different neural sources. The relationship between entrained activity and speech intelligibility is also discussed. A tentative conclusion is that theta-band entrainment (4–8 Hz) encodes speech features critical for intelligibility while delta-band entrainment (1–4 Hz) is related to the perceived, non-speech-specific acoustic rhythm. To further understand the functional properties of speech entrainment, a splitter’s approach will be needed to investigate (1) not just the temporal envelope but what specific acoustic features are encoded and (2) not just speech intelligibility but what specific psycholinguistic processes are encoded by entrained cortical activity. Similarly, the anatomical and spectro-temporal details of entrained activity need to be taken into account when investigating its functional properties. PMID:24904354

  16. Modelling out-of-vocabulary words for robust speech recognition

    E-print Network

    Bazzi, Issam

    2002-01-01

    This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some ...

  17. Subject Speech Rates as a Function of Interviewer Behaviour

    ERIC Educational Resources Information Center

    Webb, James T.

    1969-01-01

    Uses standardized and nonstandardized interview situations to examine several areas for two speech rate measures. Results show that the interviewer's speech rate influences that of the subject. Tables, graphs, and bibliography. (RW)

  18. Automatically clustering similar units for unit selection in speech synthesis. 

    E-print Network

    Black, Alan W; Taylor, Paul A

    1997-01-01

    This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class ...

  19. An annotation scheme for concept-to-speech synthesis. 

    E-print Network

    Hitzeman, Janet; Black, Alan W; Taylor, Paul; Mellish, Chris; Oberlander, Jon

    1999-01-01

    The SOLE concept-to-speech system uses linguistic information provided by an NLG component to improve the intonation of synthetic speech. As the text is generated, the system automatically annotates the text with linguistic ...

  20. Using intonation to constrain language models in speech recognition. 

    E-print Network

    Taylor, Paul A; King, Simon; Isard, Stephen; Wright, Helen; Kowtko, Jacqueline C

    1997-01-01

    This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of ...

  1. Speech Understanding Performance of Cochlear Implant… (Transactions on Biomedical Engineering)

    E-print Network

    (Authors include Jan Wouters.) Cochlear Implant (CI) recipients report severe degradation of speech… Index terms: cochlear implants, time-frequency masking, phase error variance.

  2. Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech 

    E-print Network

    De Leon, P.L.; Pucher, M.; Yamagishi, Junichi

    2010-01-01

    In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis ...

  3. Unsupervised adaptation for HMM-based speech synthesis 

    E-print Network

    King, Simon; Tokuda, Keiichi; Zen, Heiga; Yamagishi, Junichi

    It is now possible to synthesise speech using HMMs with a comparable quality to unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms. The most exciting is model adaptation. It has been...

  4. STATISTICAL LANGUAGE MODELING FOR SPEECH DISFLUENCIES Andreas Stolcke Elizabeth Shriberg

    E-print Network

    Stolcke, Andreas

    …Technology and Research Laboratory, SRI International, Menlo Park, CA 94025 (stolcke@speech.sri.com, ees@speech.sri…). …no significant impact on recognition accuracy. We also note that for modeling of the most frequent type of disfluencies…

  5. Multi-level acoustic modeling for automatic speech recognition

    E-print Network

    Chang, Hung-An, Ph. D. Massachusetts Institute of Technology

    2012-01-01

    Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model coarticulatory variations that occur during speech production. Typically, the local ...

  6. Overview of speech technology of the 80's

    SciTech Connect

    Crook, S.B.

    1981-01-01

    The author describes the technology innovations necessary to accommodate the market need which is the driving force toward greater perceived computer intelligence. The author discusses aspects of both speech synthesis and speech recognition.

  7. PRONUNCIATION VERIFICATION OF CHILDREN'S SPEECH FOR AUTOMATIC LITERACY ASSESSMENT

    E-print Network

    Alwan, Abeer

    Joseph Tepperman. Part of automatically assessing a new reader's literacy is in verifying his pronunciation of read… % of the time. Index Terms: children's speech, literacy, pronunciation

  8. SPEAKER VERIFICATION FROM CODED TELEPHONE SPEECH USING STOCHASTIC FEATURE TRANSFORMATION

    E-print Network

    Mak, Man-Wai

    A handset compensation technique for speaker verification from coded telephone speech is proposed… speakers' identity over the telephone. A challenge of telephone-based speaker verification is that transducer

  9. Markers of Deception in Italian Speech

    PubMed Central

    Spence, Katelyn; Villar, Gina; Arciuli, Joanne

    2012-01-01

    Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavor. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within-subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception. PMID:23162502

  10. Music and speech prosody: a common rhythm

    PubMed Central

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  11. Recognizing hesitation phenomena in continuous, spontaneous speech

    NASA Astrophysics Data System (ADS)

    Oshaughnessy, Douglas

    Spontaneous speech differs from read speech in speaking rate and hesitation. In natural, spontaneous speech, people often start talking and then think along the way; at times, this causes the speech to have hesitation pauses (both filled and unfilled) and restarts. Results are reported on all types of pauses in a widely-used speech database, for both hesitation pauses and semi-intentional pauses. A distinction is made between grammatical pauses (at major syntactic boundaries) and ungrammatical ones. Different types of unfilled pauses cannot be reliably separated based on silence duration, although grammatical pauses tend to be longer. In the prepausal word before ungrammatical pauses, there were few continuation rises in pitch, whereas 80 percent of the grammatical pauses were accompanied by a prior fundamental frequency rise of 10-40 Hz. Identifying the syntactic function of such hesitation phenomena can improve recognition performance by eliminating from consideration some of the hypotheses proposed by an acoustic recognizer. Results presented allow simple identification of filled pauses (such as uhh, umm) and their syntactic function.
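
The two cues reported above (silence duration and a fundamental-frequency continuation rise on the pre-pausal word) suggest a simple rule-based classifier. The thresholds below are illustrative stand-ins, not values fitted to the study's data:

```python
# Rule-based sketch of the two cues reported above: silence duration
# and a fundamental-frequency (F0) continuation rise on the pre-pausal
# word. The thresholds are illustrative, not fitted to the study.

def classify_pause(silence_ms, f0_rise_hz):
    """Guess whether an unfilled pause is grammatical or not."""
    if 10 <= f0_rise_hz <= 40:          # clear continuation rise (illustrative band)
        return "grammatical"
    if silence_ms > 500 and f0_rise_hz > 0:
        return "grammatical"            # long pause + weak rise: weak evidence
    return "ungrammatical"
```

A recognizer could use such a label, as the abstract suggests, to prune linguistic hypotheses that are implausible given the pause's syntactic function.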

  12. Prior listening in rooms improves speech intelligibility.

    PubMed

    Brandewie, Eugene; Zahorik, Pavel

    2010-07-01

    Although results from previous studies have demonstrated that the acoustic effects of a single reflection are perceptually suppressed after repeated exposure to a particular configuration of source and reflection, the extent to which this dynamic echo suppression might generalize to speech understanding in room environments with multiple reflections and reverberation is largely unknown. Here speech intelligibility was measured using the coordinate response measure corpus both with and without prior listening exposure to a reverberant room environment, which was simulated using virtual auditory space techniques. Prior room listening exposure was manipulated by presenting either a two-sentence carrier phrase that preceded the target speech, or no carrier phrase within the room environment. Results from 14 listeners indicate that with prior room exposure, masked speech reception thresholds were on average 2.7 dB lower than thresholds without exposure, an improvement in intelligibility of over 18 percentage points on average. This effect, which is shown to be absent in anechoic space and greatly reduced under monaural listening conditions, demonstrates that prior binaural exposure to reverberant rooms can improve speech intelligibility, perhaps due to a process of perceptual adaptation to the acoustics of the listening room. PMID:20649224

  13. Fifty years of progress in speech understanding systems

    Microsoft Academic Search

    Victor Zue

    2004-01-01

    Researchers working on human-machine interfaces realized nearly 50 years ago that automatic speech recognition (ASR) alone is not sufficient; one needs to impart linguistic knowledge to the system such that the signal could ultimately be understood. A speech understanding system combines speech recognition (i.e., the speech to symbols conversion) with natural language processing (i.e., the symbol to meaning transformation) to

  14. Motor Activation From Visible Speech: Evidence From Stimulus Response Compatibility

    Microsoft Academic Search

    Dirk Kerzel; Harold Bekkering

    2000-01-01

    In speech perception, phonetic information can be acquired optically as well as acoustically. The motor theory of speech perception holds that motor control structures are involved in the processing of visible speech, whereas perceptual accounts do not make this assumption. Motor involvement in speech perception was examined by showing participants response-irrelevant movies of a mouth articulating /bʌ/ or /dʌ/ and

  15. Mandarin Chinese speech recognition by pediatric cochlear implant users

    Microsoft Academic Search

    Meimei Zhu; Qian-Jie Fu; John J. Galvin; Ye Jiang; Jianghong Xu; Chenmei Xu; Duoduo Tao; Bing Chen

    2011-01-01

    Objectives: Because of difficulties associated with pediatric speech testing, most pediatric cochlear implant (CI) speech studies necessarily involve basic and simple perceptual tasks. There are relatively few studies regarding Mandarin-speaking pediatric CI users’ perception of more difficult speech materials (e.g., words and sentences produced by multiple talkers). Difficult speech materials and tests necessarily require older pediatric CI users, who may have

  16. Mandarin speech coding using a modified RPE-LTP technique

    Microsoft Academic Search

    Ian McLoughlin; Ding Zhong-Qiang

    2000-01-01

    This paper proposes a novel speech codec named the Chinese RPE-LTP (CRPE-LTP), which exploits some of the unique characteristics of Mandarin speech in order to improve speech quality for Mandarin speakers. Although the codec is based on the proven principles of the GSM 06.10 RPE-LTP coder, its performance is better than that of GSM for coding Mandarin speech, and is

  17. Low-Bitrate Distributed Speech Recognition for Packet-Based and Wireless Communication (IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 8, November 2002)

    E-print Network

    Alwan, Abeer

    Alexis Bernard (Student Member). Addresses low-bitrate distributed (wireless or packet-based) speech recognition. Index Terms: automatic speech recognition, distributed speech recognition (DSR), joint channel…

  18. Statistical Speech Segmentation and Word Learning in Parallel: Scaffolding from Child-Directed Speech

    PubMed Central

    Yurovsky, Daniel; Yu, Chen; Smith, Linda B.

    2012-01-01

    In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of high transitional probabilities, but also because of a consistently co-occurring object. But additional statistics require additional processing, and thus might not be useful to cognitively constrained learners. We show that the structure of child-directed speech makes simultaneous speech segmentation and word learning tractable for human learners. First, a corpus of child-directed speech was recorded from parents and children engaged in a naturalistic free-play task. Analyses revealed two consistent regularities in the sentence structure of naming events. These regularities were subsequently encoded in an artificial language to which adult participants were exposed in the context of simultaneous statistical speech segmentation and word learning. Either regularity was independently sufficient to support successful learning, but no learning occurred in the absence of both regularities. Thus, the structure of child-directed speech plays an important role in scaffolding speech segmentation and word learning in parallel. PMID:23162487
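    The transitional-probability statistic this study builds on can be made concrete. A hedged sketch (the syllable stream, threshold, and function names are invented for illustration): a learner estimates TP(a→b) = count(ab)/count(a) over the syllable stream and posits word boundaries where the TP dips, since TPs are high inside words and low across word boundaries.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward transitional probability TP(a->b) = count(ab) / count(a)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # occurrences in pair-initial position
    return {pair: pair_counts[pair] / first_counts[pair[0]] for pair in pair_counts}

def segment(syllables, tps, threshold=0.5):
    """Place a word boundary wherever the forward TP dips below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps.get((a, b), 0.0) < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words
```

    In a toy stream built from the "words" ba-by, dog-gie, and kit-ty, within-word TPs are 1.0 and cross-word TPs are 0.5, so a threshold between the two recovers the word boundaries.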

  19. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music

    PubMed Central

    Lee, Hweeling; Noppeney, Uta

    2014-01-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech. PMID:25147539

  20. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizi

    DOEpatents

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-04-25

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
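    The patent's pipeline (measure the voiced excitation with the EM sensor, then obtain "accurate transfer functions of speech") amounts to a deconvolution. A minimal frequency-domain sketch, assuming a known excitation and a noiseless circular-convolution model (both simplifications of the patented system):

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short illustrative frames)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part (inputs here are real signals)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def estimate_transfer_function(excitation, speech, eps=1e-9):
    """H(f) = S(f) / E(f): deconvolve the measured excitation from the
    recorded speech to estimate the vocal-tract transfer function."""
    E, S = dft(excitation), dft(speech)
    return [s / e if abs(e) > eps else 0j for s, e in zip(S, E)]
```

    With a synthetic excitation passed through a known one-tap filter, the estimated transfer function inverts back to that filter's impulse response.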

  1. Lip Movement Exaggerations during Infant-Directed Speech

    ERIC Educational Resources Information Center

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2010-01-01

    Purpose: Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their…

  2. Speech and Language Therapy Intervention in Schizophrenia: A Case Study

    ERIC Educational Resources Information Center

    Clegg, Judy; Brumfitt, Shelagh; Parks, Randolph W.; Woodruff, Peter W. R.

    2007-01-01

    Background: There is a significant body of evidence documenting the speech and language abnormalities found in adult psychiatric disorders. These speech and language impairments can create additional social barriers for the individual and may hinder effective communication in psychiatric treatment and management. However, the role of speech and…

  3. Neural mechanisms underlying auditory feedback control of speech

    Microsoft Academic Search

    Jason A. Tourville; Kevin J. Reilly; Frank H. Guenther

    2008-01-01

    The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real

  4. Preschool Speech Error Patterns Predict Articulation and Phonological Awareness

    E-print Network

    Co-authored by Mary Louise Edwards. Purpose: To determine if speech error patterns in preschoolers with speech sound disorders (SSDs) predict articulation and phonological awareness outcomes. Method: Twenty-five children with histories of preschool SSDs (and normal receptive language) were tested

  5. The Influence of Bilingualism on Speech Production: A Systematic Review

    ERIC Educational Resources Information Center

    Hambly, Helen; Wren, Yvonne; McLeod, Sharynne; Roulstone, Sue

    2013-01-01

    Background: Children who are bilingual and have speech sound disorder are likely to be under-referred, possibly due to confusion about typical speech acquisition in bilingual children. Aims: To investigate what is known about the impact of bilingualism on children's acquisition of speech in English to facilitate the identification and treatment of…

  6. Rhythmic Priming Enhances the Phonological Processing of Speech

    ERIC Educational Resources Information Center

    Cason, Nia; Schon, Daniele

    2012-01-01

    While natural speech does not possess the same degree of temporal regularity found in music, there is recent evidence to suggest that temporal regularity enhances speech processing. The aim of this experiment was to examine whether speech processing would be enhanced by the prior presentation of a rhythmical prime. We recorded electrophysiological…

  7. Do 6-Month-Olds Understand That Speech Can Communicate?

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Martin, Alia; Onishi, Kristine H.

    2014-01-01

    Adults and 12-month-old infants recognize that even unfamiliar speech can communicate information between third parties, suggesting that they can separate the communicative function of speech from its lexical content. But do infants recognize that speech can communicate due to their experience understanding and producing language, or do they…

  8. 38 CFR 8.18 - Total disability-speech.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    § 8.18 Total disability—speech (38 CFR, Pensions, Bonuses…, Total Disability). The organic loss of speech shall be deemed to be total disability under…

  9. ON AMPLITUDE MODULATION FOR MONAURAL SPEECH SEGREGATION Biophysics Program,

    E-print Network

    Wang, DeLiang "Leon"

    than previous CASA systems. I. INTRODUCTION In the real-world environment, target speech usually occurs simultaneously with acoustic interference. An effective speech segregation system will greatly facilitate many applications, including automatic speech recognition (ASR) and speaker identification. Many systems have been
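    The amplitude-modulation cue named in the title is the slow envelope riding on each frequency channel; CASA systems group channels whose envelopes co-modulate. A crude single-channel envelope extractor (half-wave rectification followed by a moving average; the window length is an illustrative choice, not the paper's filter):

```python
def am_envelope(x, win):
    """Amplitude-modulation envelope: half-wave rectify, then smooth with a
    moving average acting as a crude low-pass filter."""
    rect = [max(v, 0.0) for v in x]
    return [sum(rect[max(0, n - win + 1):n + 1]) / win for n in range(len(rect))]
```

    On a toy alternating signal whose amplitude triples halfway through, the extracted envelope tracks that amplitude change while discarding the fast sign alternation.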

  10. Freedom of Speech, Cyberspace, Harassment Law, and the Clinton Administration

    Microsoft Academic Search

    Eugene Volokh

    2000-01-01

    Volokh presents four cyberspace speech controversies that involve an interesting modern body of speech restrictions: hostile environment harassment law. These examples illustrate three things: that in most of the controversies the result should be driven not by the medium but by the underlying free speech principles; that the Clinton Administration's role in these areas has been comparatively slight; and that each of

  11. AUTOMATICALLY CLUSTERING SIMILAR UNITS FOR UNIT SELECTION IN SPEECH SYNTHESIS.

    E-print Network

    Edinburgh, University of

    Alan W Black. This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone

  12. Speech for the Deaf Child: Knowledge and Use.

    ERIC Educational Resources Information Center

    Connor, Leo E., Ed.

    Presented is a collection of 16 papers on speech development, handicaps, teaching methods, and educational trends for the aurally handicapped child. Arthur Boothroyd relates acoustic phonetics to speech teaching, and Jean Utley Lehman investigates a scheme of linguistic organization. Differences in speech production by deaf and normal hearing…

  13. Single-Ended Speech Quality Measurement Using Machine Learning Methods

    Microsoft Academic Search

    Tiago H. Falk; Wai-Yip Chan

    2006-01-01

    We describe a novel single-ended algorithm constructed from models of speech signals, including clean and degraded speech, and speech corrupted by multiplicative noise and temporal discontinuities. Machine learning methods are used to design the models, including Gaussian mixture models, support vector machines, and random forest classifiers. Estimates of the subjective mean opinion score (MOS) generated by the models are
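    The abstract names GMMs, SVMs, and random forests; those are too heavy to sketch here, but the single-ended idea (map features of the degraded signal directly to a MOS estimate, with no clean reference) can be illustrated with a toy nearest-neighbour regressor. All names and data below are invented, not the paper's models:

```python
def knn_mos(train, query, k=3):
    """Estimate MOS for a feature vector as the mean MOS of its k nearest
    training utterances under Euclidean distance.
    `train` is a list of (feature_vector, mos) pairs."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda fv: dist(fv[0], query))[:k]
    return sum(mos for _, mos in nearest) / k
```

    A query whose features resemble the high-quality training utterances receives a high MOS estimate, and vice versa, without ever consulting a reference signal.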

  14. Employer Images of Speech Communication Majors: A Question of Employability.

    ERIC Educational Resources Information Center

    Heath, Robert L.

    In order to understand the market climate for speech majors, the Department of Speech at the University of Houston conducted a survey to assess the marketability of speech communication graduates in the Greater Houston area. It further attempted to disclose the skills needed to increase employability. Seventy-one questionnaires, designed to focus…

  15. Improving Reverberant VTS for Hands-free Robust Speech Recognition

    E-print Network

    Edinburgh, University of

    …to the microphone. The reverberation effect can be described as a convolution of clean speech with a room impulse response. The observed signal is modelled as a superposition of the background noise and the reverberation of clean speech, which yields delayed and attenuated copies of previous clean speech. There are several approaches in the literature to handle
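    The convolution model named in the excerpt can be written directly: a hypothetical room impulse response (RIR) smears each clean sample into delayed, attenuated copies.

```python
def convolve(clean, rir):
    """Reverberant signal y[n] = sum_k rir[k] * clean[n - k] (full convolution),
    so each clean sample is echoed at every RIR tap."""
    y = [0.0] * (len(clean) + len(rir) - 1)
    for n, x in enumerate(clean):
        for k, h in enumerate(rir):
            y[n + k] += x * h
    return y
```

    With the toy RIR [1, 0.5], every sample is followed by a half-strength echo one sample later, which is exactly the "delayed and attenuated copies" the excerpt describes.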

  16. Differential Diagnosis of Children with Suspected Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Murray, Elizabeth; McCabe, Patricia; Heard, Robert; Ballard, Kirrie J.

    2015-01-01

    Purpose: The gold standard for diagnosing childhood apraxia of speech (CAS) is expert judgment of perceptual features. The aim of this study was to identify a set of objective measures that differentiate CAS from other speech disorders. Method: Seventy-two children (4-12 years of age) diagnosed with suspected CAS by community speech-language…

  17. Speed of Speech and Persuasion: Evidence for Multiple Effects.

    ERIC Educational Resources Information Center

    Smith, Stephen M.; Shaffer, David R.

    1995-01-01

    Examined possibility that increased speech rate affects persuasion either by acting as an agreement cue or through impact on message processing. Fast speech inhibited participants' tendency to differentially agree with strong versus weak message arguments under both moderate and high relevance. However, fast speech was associated with increased…

  18. A Comparison of Two Theories of Speech/Language Behavior.

    ERIC Educational Resources Information Center

    McQuillen, Jeffrey S.; Quigley, Tracy A.

    Two theories of speech appear to parallel each other closely, though one (E. Nuttall) is concerned mainly with speech from a functional perspective, and the other (F. Williams and R. Naremore) presents a developmental hierarchy of language form and function. Nuttall suggests there are two main origins of speech: sounds of discomfort (cries,…

  19. Consolidation-Based Speech Translation and Evaluation Approach

    NASA Astrophysics Data System (ADS)

    Hori, Chiori; Zhao, Bing; Vogel, Stephan; Waibel, Alex; Kashioka, Hideki; Nakamura, Satoshi

    The performance of speech translation systems combining automatic speech recognition (ASR) and machine translation (MT) is degraded by redundant and irrelevant information caused by speaker disfluency and recognition errors. This paper proposes a new approach to translating speech recognition results through speech consolidation, which removes ASR errors and disfluencies and extracts meaningful phrases. The consolidation approach is derived from speech summarization by word extraction from the ASR 1-best hypothesis. We extended the consolidation approach to confusion networks (CNs) and tested the performance using TED speech, confirming that the consolidation results preserved more meaningful phrases than the original ASR results. We then applied the consolidation technique to speech translation: Chinese broadcast news (BN) speech from RT04 was recognized, consolidated, and then translated. Consolidation-based speech translation results cannot be directly compared with gold standards in which all words in the speech are translated, because consolidation-based translations are partial translations. We therefore propose a new evaluation framework for partial translation that compares them with the most similar set of words extracted from a word network created by merging gradual summarizations of the gold-standard translation. The performance of consolidation-based MT results was evaluated using BLEU. We also propose Information Preservation Accuracy (IPAccy) and Meaning Preservation Accuracy (MPAccy) to evaluate consolidation and consolidation-based MT. We confirmed that consolidation contributed to the performance of speech translation.
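    BLEU, used above to score the consolidation-based MT results, is built from clipped n-gram precision and a brevity penalty. A unigram-only sketch (real BLEU combines clipped precisions up to 4-grams; this simplification is ours, not the paper's metric):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Unigram BLEU: clipped unigram precision times the brevity penalty
    exp(1 - |ref|/|cand|), applied only when the candidate is shorter."""
    cand, ref = candidate.split(), reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())  # clipped matches
    precision = overlap / len(cand)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

    The brevity penalty is what makes the metric awkward for the partial translations discussed above: a consolidated (shortened) output is penalized for length even when every word it keeps is correct, which motivates the paper's IPAccy/MPAccy proposals.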

  20. Acoustic Markov models used in the Tangora speech recognition system

    Microsoft Academic Search

    L. R. Bahl; P. F. Brown; P. V. de Souza; M. A. Picheny

    1988-01-01

    The Speech Recognition Group at IBM Research has developed a real-time, isolated-word speech recognizer called Tangora, which accepts natural English sentences drawn from a vocabulary of 20000 words. Despite its large vocabulary, the Tangora recognizer requires only about 20 minutes of speech from each new user for training purposes. The accuracy of the system and its ease of training are

  1. Variational bayesian estimation and clustering for speech recognition

    Microsoft Academic Search

    Shinji Watanabe; Yasuhiro Minami; Atsushi Nakamura; Naonori Ueda

    2004-01-01

    In this paper we propose Variational Bayesian Estimation and Clustering for speech recognition (VBEC), which is based on the Variational Bayesian (VB) approach. VBEC is a total Bayesian framework: all speech recognition procedures (acoustic modeling and speech classification) are based on VB posterior distribution, unlike the Maximum Likelihood (ML) approach based on ML parameters. The total Bayesian framework generates two

  2. SOME PERSPECTIVES ON SPEECH DATABASE DEVELOPMENT Lori F. Lamel

    E-print Network

    LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France. The article "Speech Database Development: Design and Analysis of the Acoustic Phonetic…" addresses issues in the design of speech databases. The acoustic-phonetic portion of TIMIT was designed to have comprehensive…

  3. ARMA lattice modeling for isolated word speech recognition

    Microsoft Academic Search

    H. K. Kwan; Tracy X. Li

    2000-01-01

    In this paper, we introduce an auto-regressive moving average (ARMA) lattice model for speech modeling. The speech characteristics are modeled and expressed in the form of lattice reflection coefficients for classification. A self-organizing map (SOM) is used to build codebooks for classification and recognition of the lattice reflection coefficients. Experimental results based on an isolated word speech database of 10
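    Reflection coefficients like those the lattice model exposes can be computed, in the simpler all-pole (AR) case, by the Levinson-Durbin recursion on the signal's autocorrelation. This AR-only sketch is a stand-in for the paper's more general ARMA lattice:

```python
def reflection_coefficients(signal, order):
    """Levinson-Durbin recursion: autocorrelation -> reflection (PARCOR)
    coefficients. AR-only illustration; the paper's model is an ARMA lattice."""
    n = len(signal)
    r = [sum(signal[t] * signal[t + k] for t in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[1..m]: AR predictor coefficients at stage m
    err, ks = r[0], []
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[j] * r[m - j] for j in range(1, m))) / err
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a, err = new_a, err * (1.0 - k * k)  # prediction error shrinks each stage
        ks.append(k)
    return ks
```

    For a (deterministic) first-order exponential sequence x[n] = 0.9^n, the first reflection coefficient is close to -0.9 and the higher-order ones are near zero, reflecting that one lattice stage suffices; stable models always have |k| < 1.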

  4. Family Pedigrees of Children with Suspected Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy; Taylor, H. Gerry; Iyengar, Sudha; Shriberg, Lawrence D.

    2004-01-01

    Forty-two children (29 boys and 13 girls), ages 3-10 years, were referred from the caseloads of clinical speech-language pathologists for suspected childhood apraxia of speech (CAS). According to results from tests of speech and oral motor skills, 22 children met criteria for CAS, including a severely limited consonant and vowel repertoire,…

  5. Nonlinear feature based classification of speech under stress

    Microsoft Academic Search

    Guojun Zhou; John H. L. Hansen; James F. Kaiser

    2001-01-01

    Studies have shown that variability introduced by stress or emotion can severely reduce speech recognition accuracy. Techniques for detecting or assessing the presence of stress could help improve the robustness of speech recognition systems. Although some acoustic variables derived from linear speech production theory have been investigated as indicators of stress, they are not always consistent. Three new features derived
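    Work in this line (Kaiser is a co-author here) commonly builds stress-sensitive nonlinear features on the Teager energy operator (TEO); whether these are exactly the paper's three features is not stated in the excerpt, so treat this as a related illustration rather than the paper's method:

```python
def teager_energy(x):
    """Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    For a sinusoid A*cos(W*n) this equals A^2 * sin(W)^2, so it tracks
    amplitude and frequency jointly -- the nonlinearity stress features exploit."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

    Two easily checked properties: on a linear ramp x[n] = n the operator is identically 1 (since n^2 - (n-1)(n+1) = 1), and doubling the signal amplitude quadruples the output, unlike conventional squared energy of a ramp.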

  6. The Informative Speech as a Television News Package

    ERIC Educational Resources Information Center

    Thorpe, Judith M.

    2004-01-01

    Objective: To expand an informative speech into a television news package. Type of speech: Informative. Point value: 5% of course grade (Note: The original informative speech is worth 10% of the course grade). Requirements: (a) References: 3; (b) Length: 30 seconds; (c) Visual aid: 3; (d) Outline: Yes; (e) Prerequisite reading: Chapter 14 (Whitman…

  7. TWO AUTOMATIC APPROACHES FOR ANALYZING CONNECTED SPEECH PROCESSES IN DUTCH

    E-print Network

    Edinburgh, University of

    Indications of which connected speech processes (CSPs) are present in the material can be found; these indications can be used to generate hypotheses about the type of processes that might occur in the speech material, which can further be tested by means… CSPs are governed by many complex factors such as speech style, speech rate, word frequency, information load, and dialectal…

  8. Further experiments on audio-visual speech source separation

    Microsoft Academic Search

    David Sodoyer; Laurent Girin; Christian Jutten; Jean-Luc Schwartz

    Looking at the speaker's face seems useful to better hear a speech signal and extract it from competing sources before identification. This might result in elaborating new speech enhancement or extraction techniques exploiting the audio-visual coherence of speech stimuli. In this paper, we present a set of experiments on a novel algorithm plugging audio-visual coherence estimated by statistical tools, on

  9. Developing an audio-visual speech source separation algorithm

    Microsoft Academic Search

    David Sodoyer; Laurent Girin; Christian Jutten; Jean-luc Schwartz

    2004-01-01

    Looking at the speaker's face is useful to better hear a speech signal and extract it from competing sources before identification. This might result in elaborating new speech enhancement or extraction techniques exploiting the audio-visual coherence of speech stimuli. In this paper, a novel algorithm plugging audio-visual coherence, estimated by statistical tools, on classical blind source separation algorithms

  10. Multichannel Signal Separation for Cocktail Party Speech Recognition: A Dynamic

    E-print Network

    Choi, Seungjin

    This work presents multichannel signal separation (MSS) with its application to cocktail party speech recognition, using a dynamic recurrent network. First, we present a fundamental speech recognition experiment. The results show that our proposed method dramatically improves the word

  11. Subband-Based Blind Signal Separation for Noisy Speech Recognition

    E-print Network

    Lee, Te-Won

    Hyung-Min Park, Ho-Young Jung. Keywords: …processing, noisy speech recognition, independent component analysis, blind source separation. A method… issue in the field of automatic speech recognition. Microphone arrays have been used to achieve noise

  12. SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE

    E-print Network

    Johnson, Michael T.

    By Jinjin Ye, B…, Wisconsin, May 2004. Abstract: A speech recognition system implements the task of automatically… of Automatic Speech Recognition (ASR) systems and human listeners. In this thesis, a novel signal analysis
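    The phase space reconstruction named in this thesis title is conventionally a time-delay embedding (per Takens' theorem): each sample is mapped to a vector of delayed copies of itself, recovering the trajectory of the underlying dynamical system. A minimal sketch with invented dimension and delay parameters:

```python
def delay_embed(x, dim, tau):
    """Time-delay embedding: map x(n) to the phase-space point
    [x(n), x(n + tau), ..., x(n + (dim - 1) * tau)]."""
    last = len(x) - (dim - 1) * tau
    return [tuple(x[n + d * tau] for d in range(dim)) for n in range(last)]
```

    Features for recognition are then computed from the geometry of the resulting point cloud (e.g., its natural distribution or trajectory statistics), rather than from a short-time spectrum.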

  13. LIKELIHOOD DECISION BOUNDARY ESTIMATION BETWEEN HMM PAIRS In SPEECH RECOGNITION

    E-print Network

    Levent M. Arslan. In likelihood estimation of hidden Markov models for speech recognition, the criterion is to maximize the total… ("…Recognition Classifier Improvements and Comparisons", submitted December 2, 1995, to IEEE Trans. on Speech…)

  14. Speech Recognition Using Features Extracted from Phase Space

    E-print Network

    Johnson, Michael T.

    By Andrew Carl…, University…, Milwaukee, Wisconsin, May 2003. Preface: A novel method for speech recognition is presented… (natural distribution and trajectory of the attractor) can be extracted for speech recognition

  15. Influences of Infant-Directed Speech on Early Word Recognition

    ERIC Educational Resources Information Center

    Singh, Leher; Nestor, Sarah; Parikh, Chandni; Yull, Ashley

    2009-01-01

    When addressing infants, many adults adopt a particular type of speech, known as infant-directed speech (IDS). IDS is characterized by exaggerated intonation, as well as reduced speech rate, shorter utterance duration, and grammatical simplification. It is commonly asserted that IDS serves in part to facilitate language learning. Although…

  16. ON-LINE CURSIVE HANDWRITING RECOGNITION USING SPEECH RECOGNITION METHODS

    E-print Network

    Starner, Thad E.

    Thad Starner, John…, 02138. Email: Makhoul@bbn.com. Abstract: A hidden Markov model (HMM) based continuous speech recognition… between the continuous speech and on-line cursive handwriting recognition tasks are explored; the handwrit…

  17. Spectral Subband Centroids as Features for Speech Recognition

    E-print Network

    Kuldip K. Paliwal. Cepstral features are perhaps the most commonly used features in currently available speech recognition systems. In this paper, spectral subband centroids are proposed as features for speech recognition. We show that these features have properties similar to formant frequencies
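    A spectral subband centroid is the power-weighted mean frequency of a subband, which is why it behaves like a coarse formant estimate. A sketch with equal-width subbands (Paliwal's formulation also includes a gamma exponent on the power and filter shapes; the partitioning here is a simplification):

```python
def subband_centroids(power_spectrum, freqs, n_subbands, gamma=1.0):
    """Centroid of each subband: C_m = sum(f * P(f)^gamma) / sum(P(f)^gamma),
    computed over equal-width contiguous frequency subbands."""
    width = len(power_spectrum) // n_subbands
    centroids = []
    for m in range(n_subbands):
        lo, hi = m * width, (m + 1) * width
        w = [p ** gamma for p in power_spectrum[lo:hi]]
        centroids.append(sum(f * wi for f, wi in zip(freqs[lo:hi], w)) / sum(w))
    return centroids
```

    A spectral peak inside a subband pulls that subband's centroid toward the peak frequency, mimicking how a formant dominates its region of the spectrum.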

  18. ON THE GENERALIZATION OF SHANNON ENTROPY FOR SPEECH RECOGNITION

    E-print Network

    Paris-Sud XI, Université de

    Nicolas Obin, Marco Liuni (IRCAM). …and speech recognition. The proposed representation is based on the Rényi entropy, which is a generalization of the Shannon entropy. …% in relative error reduction, particularly significant for the recognition of noisy speech…
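    The generalization in question is direct to state: the Rényi entropy H_α(p) = log(Σ p_i^α) / (1 − α) reduces to the Shannon entropy −Σ p_i log p_i in the limit α → 1. A minimal numeric check (how the paper applies it to spectral representations is not shown in the excerpt):

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha);
    the alpha -> 1 limit recovers the Shannon entropy."""
    if abs(alpha - 1.0) < 1e-12:
        return shannon_entropy(p)
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)
```

    For the uniform distribution every order gives log N, and for any other distribution H_α varies with α, which is what makes the order a useful tuning knob.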

  19. Exploiting confidence measures for missing data speech recognition Christophe Cerisara

    E-print Network

    Paris-Sud XI, Université de

    Christophe Cerisara, LORIA UMR 7503, France. Automatic speech recognition in highly non-stationary noise, for instance… These new missing data "masks" are now estimated based on speech recognition confidence measures, which can

  20. Speech Recognition in a Smart Home: Some Experiments for Telemonitoring

    E-print Network

    Paris-Sud XI, Université de

    Michel Vacher, Noé Guirand… about the utility of sound in the patient's home. However, sound classification and speech recognition… In this paper, we present a global speech and sound recognition system that can be set up in a flat. Eight