Science.gov

Sample records for speech

  1. Speech Development

    MedlinePLUS

    ... able to assess your child’s speech production and language development and make appropriate therapy recommendations. It is also ... pathologist should consistently assess your child’s speech and language development, as well as screen for hearing problems (with ...

  2. Visible Speech.

    ERIC Educational Resources Information Center

    Potter, Ralph K.; and Others

    A corrected republication of the 1947 edition, the book describes a form of visible speech obtained by the recording of an analysis of speech somewhat similar to the analysis performed by the ear. Originally intended to present an experimental training program in the reading of visible speech and expanded to include material of interest to various…

  3. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1968 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  4. Obama Speeches

    E-print Network

    Hacker, Randi

    2009-02-25

    Broadcast Transcript: Barack Obama is already having a global impact. The Japanese are using his speeches to help teach English. In fact, a book with his collected speeches is currently at the top of the bestseller list on Amazon's Japanese website...

  5. Speech Aids

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.

  6. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal being corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of the digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where the speech is reconstructed or synthesized from the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding; it is more generic in the sense that the coding techniques are equally applicable to any voice signal, whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression, since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or equivalently the bandwidth) and/or the storage requirements. In this document the terms speech and voice are used interchangeably.
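
    As a concrete illustration of coding speech directly as a waveform, the sketch below implements mu-law companding, the scheme used in 64 kbit/s G.711 telephony. It is a minimal illustration written for this listing (assuming NumPy is available), not code from the record above.

        import numpy as np

        MU = 255.0  # companding constant used by North American G.711

        def mulaw_encode(x):
            """Quantize samples in [-1, 1] to 8-bit codes via mu-law companding."""
            y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
            return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

        def mulaw_decode(codes):
            """Invert the companding to recover an approximate waveform."""
            y = codes.astype(np.float64) / 255.0 * 2.0 - 1.0
            return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

        # Quiet signals keep proportionally more precision than with uniform
        # 8-bit quantization, which is the point of companding.
        t = np.linspace(0.0, 0.02, 160)              # 20 ms at 8 kHz sampling
        x = 0.05 * np.sin(2.0 * np.pi * 300.0 * t)   # low-level 300 Hz tone
        err = x - mulaw_decode(mulaw_encode(x))
        print("max round-trip error:", float(np.max(np.abs(err))))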

  7. Speech Problems

    MedlinePLUS

    ... are thinking, but it becomes disorganized while actually speaking. Because of this disorganization, someone who clutters may ... refuse to wait patiently for them to finish speaking. If you have a speech problem, don't ...

  8. Speech production knowledge in automatic speech recognition 

    E-print Network

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam

    2007-01-01

    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds and numerous models, speech production knowledge is ...

  9. Free Speech Yearbook: 1972.

    ERIC Educational Resources Information Center

    Tedford, Thomas L., Ed.

    This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

  10. Speech Research

    NASA Astrophysics Data System (ADS)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  11. Keynote Speeches.

    ERIC Educational Resources Information Center

    2000

    This document contains six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities of Hong…

  12. Speech communications in noise

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.
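
    Since the record mentions the articulation index, here is a simplified articulation-index-style calculation: per-band SNR is clipped to a 30 dB useful range and combined with band-importance weights. The five bands, weights, and levels below are illustrative placeholders, not the published band-importance functions.

        import numpy as np

        def articulation_index(speech_db, noise_db, weights):
            """speech_db, noise_db: per-band levels in dB; weights sum to 1."""
            snr = np.asarray(speech_db, float) - np.asarray(noise_db, float)
            usable = np.clip(snr, 0.0, 30.0) / 30.0  # fraction of the 30 dB range
            return float(np.dot(weights, usable))

        bands_hz  = [250, 500, 1000, 2000, 4000]    # illustrative band centers
        weights   = [0.10, 0.20, 0.30, 0.25, 0.15]  # illustrative importances
        speech_db = [62, 65, 60, 55, 48]            # hypothetical speech levels
        noise_db  = [55, 50, 45, 45, 44]            # hypothetical noise levels
        print("AI ~", round(articulation_index(speech_db, noise_db, weights), 2))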

  13. Speech Parameterization for Automatic Speech Recognition in Noisy Conditions

    E-print Network

    Bojana Gajić. ... of automatic speech recognition systems (ASR) against additive background noise, by finding speech parameters ... State-of-the-art automatic speech recognition (ASR) systems are capable ...

  14. Speech and Language Disorders

    MedlinePLUS

    ... This information in Spanish ( en español ) Speech and language disorders More information on speech and language disorders ... Return to top More information on Speech and language disorders Explore other publications and websites Aphasia - This ...

  15. Speech impairment (adult)

    MedlinePLUS

    Language impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... Common speech and language disorders include: APHASIA Aphasia is ... understand or express spoken or written language. It commonly ...

  16. Speech recognition and understanding

    SciTech Connect

    Vintsyuk, T.K.

    1983-05-01

    This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.
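
    To make the dynamic programming idea concrete, below is a minimal template matcher using dynamic time warping (DTW), in the spirit of comparing a presented signal against all stored class signals. The feature vectors are random placeholders rather than real spectral features.

        import numpy as np

        def dtw_distance(a, b):
            """Dynamic time warping distance between two feature sequences."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def recognize(signal, templates):
            """Return the label of the stored template closest under DTW."""
            return min(templates, key=lambda label: dtw_distance(signal, templates[label]))

        rng = np.random.default_rng(0)
        templates = {"yes": rng.normal(size=(30, 12)), "no": rng.normal(size=(25, 12))}
        test = templates["yes"] + 0.1 * rng.normal(size=(30, 12))  # noisy copy
        print(recognize(test, templates))  # -> "yes"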

  17. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ...Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...speech disabilities have access to relay services that...electronically using the Internet by accessing the Commission's...services; (5) equal access to interexchange carriers...Speech-to-Speech and Internet Protocol (IP)...

  18. Speech research directions

    SciTech Connect

    Atal, B.S.; Rabiner, L.R.

    1986-09-01

    This paper presents an overview of the current activities in speech research. The authors discuss the state of the art in speech coding, text-to-speech synthesis, speech recognition, and speaker recognition. In the speech coding area, current algorithms perform well at bit rates down to 9.6 kb/s, and the research is directed at bringing the rate for high-quality speech coding down to 2.4 kb/s. In text-to-speech synthesis, what we currently are able to produce is very intelligible but not yet completely natural. Current research aims at providing higher quality and intelligibility to the synthetic speech that these systems produce. Finally, today's systems for speech and speaker recognition provide excellent performance on limited tasks; i.e., limited vocabulary, modest syntax, small talker populations, constrained inputs, etc.

  19. Silence, speech, and responsibility

    E-print Network

    Maitra, Ishani, 1974-

    2002-01-01

    Pornography deserves special protections, it is often said, because it qualifies as speech; therefore, no matter what we think of it, we must afford it the protections that we extend to most speech, but don't extend to ...

  20. Speech-Language Pathologists

    MedlinePLUS

    ... teachers, and special education teachers. ... Work Environment: Most speech-language pathologists ... others worked in healthcare facilities, such as hospitals. Work Schedules: Most speech-language pathologists work full time. ...

  1. Speech and Language Impairments

    MedlinePLUS

    ... A Day in the Life of an SLP: Christina is a speech-language pathologist. She works ... certified speech-language pathologist such as Christina, the SLP in our opening story. ... Characteristics ...

  2. Context dependent speech recognition 

    E-print Network

    Andersson, Sebastian

    2006-01-01

    Poor speech recognition is a problem when developing spoken dialogue systems, but several studies have shown that speech recognition can be improved by post-processing of recognition output that uses the dialogue context, ...

  3. CHAPTER 1. INTRODUCTION Speech coding or speech compression is one of the important aspects of speech

    E-print Network

    Beex, A. A. "Louis"

    Speech coding or speech compression is one of the important aspects of speech communications nowadays. Some of the speech communication media that need speech coding are wireless communications and Internet telephony. By coding the speech, the speed to transmit the digitized ...
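
    The bit-rate arithmetic behind this motivation is simple and worth making explicit; the coded rate of 8 kbit/s below is an illustrative choice, not a figure from the chapter.

        sample_rate_hz = 8000      # standard telephone sampling rate
        bits_per_sample = 8        # 8-bit companded PCM
        pcm_rate = sample_rate_hz * bits_per_sample   # = 64000 bit/s
        coded_rate = 8000                             # hypothetical coder, bit/s
        print(f"PCM: {pcm_rate / 1000:.0f} kbit/s, "
              f"compression {pcm_rate / coded_rate:.0f}:1 at 8 kbit/s")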

  4. Speech Handicapped School Children.

    ERIC Educational Resources Information Center

    Johnson, Wendell; and Others

    This book is designed primarily for students who are being trained to work with speech handicapped school children, either as speech correctionists or as classroom teachers. The book deals with four major questions--(1) what kinds of speech disorders are found among school children, (2) what are the physical, psychological and social conditions,…

  5. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  6. Inner Speech Revisited.

    ERIC Educational Resources Information Center

    Wertsch, James V.

    This paper reviews some of the observations made by Vygotsky about the structure and content of inner speech and its precursor egocentric speech, also called private speech. Recent advances in certain areas of linguistics are used to clarify and develop these observations. In particular, the paper focuses on Vygotsky's ideas about the predicative…

  7. Speech and Language Delay

    MedlinePLUS

    Speech and Language Delay: Overview. How do I know if my child has speech delay? Every child develops at his or her ... of the same age, the problem may be speech delay. Your doctor may think your child has ...

  8. Machine Translation from Speech

    NASA Astrophysics Data System (ADS)

    Schwartz, Richard; Olive, Joseph; McCary, John; Christianson, Caitlin

    This chapter describes approaches for translation from speech. Translation from speech presents two new issues. First, of course, we must recognize the speech in the source language. Although speech recognition has improved considerably over the last three decades, it is still far from being a solved problem. In the best of conditions, when the speech is high quality and carefully enunciated, on common topics (such as speech read by a trained news broadcaster), the word error rate is typically on the order of 5%. Humans can typically transcribe speech like this with less than 1% disagreement between annotators, so even this best number is still far worse than human performance. However, the task gets much harder when anything departs from this ideal condition. Conditions that cause a higher error rate include a somewhat unusual topic, speakers who are not reading (so that their speech is more spontaneous), speakers with an accent or dialect, and any acoustic degradation, such as noise or reverberation. In these cases, the word error rate can increase significantly, to 20%, 30%, or higher. Accordingly, most of this chapter discusses techniques for improving speech recognition accuracy, while one section discusses techniques for integrating speech recognition with translation.
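
    The word error rate quoted above (on the order of 5% for read broadcast speech) is computed as the word-level edit distance divided by the number of reference words; a minimal sketch follows, with invented example sentences.

        def word_error_rate(reference, hypothesis):
            ref, hyp = reference.split(), hypothesis.split()
            # Levenshtein distance over words via dynamic programming.
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                    d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
            return d[len(ref)][len(hyp)] / len(ref)

        # One substitution in six reference words -> WER of about 0.167.
        print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))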

  9. EFFECT OF SPEECH CODERS ON SPEECH RECOGNITION PERFORMANCE

    E-print Network

    B.T. Lilly and K.K. Paliwal. ... as input to a recognition system. In this paper, the results of a study to examine the effects of speech coders operating at bit rates from ... to 40 kbit/s, used with two different speech recognition systems: (1) isolated word recognition and (2) ...

  10. Hearing or speech impairment - resources

    MedlinePLUS

    Resources - hearing or speech impairment ... The following organizations are good resources for information on hearing impairment or speech impairment: American Speech-Language-Hearing Association -- www.asha.org/public Center for Parent Information ...

  11. Speech-Language Therapy (For Parents)

    MedlinePLUS

    ... Specialists in Speech-Language Therapy Speech-language pathologists (SLPs), often informally known as speech therapists, are professionals ... from the American Speech-Language-Hearing Association (ASHA). SLPs assess speech, language, cognitive-communication, and oral/feeding/ ...

  12. Distributed processing for speech understanding

    SciTech Connect

    Bronson, E.C.; Siegel, L.

    1983-01-01

    Continuous speech understanding is a highly complex artificial intelligence task requiring extensive computation. This complexity precludes real-time speech understanding on a conventional serial computer. Distributed processing techniques can be applied to the speech understanding task to improve processing speed. In this paper, the speech understanding task and several speech understanding systems are described. Parallel processing techniques are presented and a distributed processing architecture for speech understanding is outlined. 35 references.

  13. Advances in speech processing

    NASA Astrophysics Data System (ADS)

    Ince, A. Nejat

    1992-10-01

    The field of speech processing is undergoing rapid growth in terms of both performance and applications, fueled by advances in the areas of microelectronics, computation, and algorithm design. The use of voice for civil and military communications is discussed, considering advantages and disadvantages including the effects of environmental factors such as acoustic and electrical noise, interference, and propagation. The structure of the existing NATO communications network and the evolving Integrated Services Digital Network (ISDN) concept are briefly reviewed to show how they meet present and future requirements. The paper then deals with the fundamental subject of speech coding and compression. Recent advances in techniques and algorithms for speech coding now permit high quality voice reproduction at remarkably low bit rates. The subject of speech synthesis is treated next, where the principal objective is to produce natural quality synthetic speech from unrestricted text input. Speech recognition, where the ultimate objective is to produce a machine which would understand conversational speech with unrestricted vocabulary from essentially any talker, is then discussed. Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms, and it is for this reason that the paper is concerned primarily with this technique.

  14. Chief Seattle's Speech Revisited

    ERIC Educational Resources Information Center

    Krupat, Arnold

    2011-01-01

    Indian orators have been saying good-bye for more than three hundred years. John Eliot's "Dying Speeches of Several Indians" (1685), as David Murray notes, inaugurates a long textual history in which "Indians... are most useful dying," or, as in a number of speeches, bidding the world farewell as they embrace an undesired but apparently inevitable…

  15. Private Speech in Ballet

    ERIC Educational Resources Information Center

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  16. Tracking Speech Sound Acquisition

    ERIC Educational Resources Information Center

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  17. The Speech Communication Process.

    ERIC Educational Resources Information Center

    Clevenger, Theodore, Jr.; Matthews, Jack

    The book represents oral communication as a wide spectrum of speech events ranging from daily perfunctory comments to highly structured, formal speeches. The introductory chapter summarizes various approaches to the study of the communication process. The subsequent chapters present: (1) elements of communication language structure, language…

  18. Illustrated Speech Anatomy.

    ERIC Educational Resources Information Center

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  19. Improving Alaryngeal Speech Intelligibility.

    ERIC Educational Resources Information Center

    Christensen, John M.; Dwyer, Patricia E.

    1990-01-01

    Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the production…

  20. Robust Speech Recognition Using Singular Value Decomposition Based Speech Enhancement

    E-print Network

    B.T. Lilly and K.K. Paliwal (Brisbane, QLD 4111, Australia). Abstract: Speech recognition systems work ... as a preprocessor for recognising speech in the presence of noise. It was found to improve the recognition ...

  1. Estimation of Severity of Speech Disability through Speech Envelope

    E-print Network

    Gudi, Anandthirtha B.; Nagaraj, H. C. DOI: 10.5121/sipij.2011.2203

    2011-01-01

    In this paper, envelope detection of speech is discussed to distinguish the pathological cases of speech disabled children. Speech signal samples of children between five and eight years of age are considered for the present study. These speech signals are digitized and used to determine the speech envelope. The envelope is subjected to ratio mean analysis to estimate the disability. This analysis is conducted on ten speech signal samples which are related to both place of articulation and manner of articulation. The overall speech disability of a pathological subject is estimated based on the results of the above analysis.
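
    A minimal speech-envelope detector of the general kind the record describes (full-wave rectification followed by low-pass smoothing) can be sketched as follows; the window length and test signal are illustrative choices, not the authors' settings.

        import numpy as np

        def speech_envelope(signal, sample_rate_hz, window_ms=20.0):
            """Rectify and smooth to estimate the amplitude envelope."""
            rectified = np.abs(signal)
            win = max(1, int(sample_rate_hz * window_ms / 1000.0))
            kernel = np.ones(win) / win          # moving-average low-pass
            return np.convolve(rectified, kernel, mode="same")

        fs = 8000
        t = np.arange(fs) / fs
        carrier = np.sin(2 * np.pi * 200 * t)              # voiced-like carrier
        modulator = 0.5 * (1 + np.sin(2 * np.pi * 3 * t))  # slow amplitude contour
        env = speech_envelope(carrier * modulator, fs)
        print(float(env.max()), float(env.min()))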

  2. Distributed Speech Recognition (Wira Gunawan)

    E-print Network

    Hasegawa-Johnson, Mark

    ... (dynamic time warping) algorithm. The effect of quantization of speech recognition features on recognition accuracy ... Speech recognition and speech synthesis are two main areas of voice processing that enable users to use voice as input instead of a keyboard. Some companies have started to use speech recognition ...

  3. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ..., Report and Order and Further Notice of Proposed Rulemaking, published at 77 FR 25609, May 1, 2012 (VRS... Nos. 03-123 and 08-15, Notice of Proposed Rulemaking, published at 73 FR 47120, August 13, 2008 (2008... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...

  4. Speech Acts and Conversational Interaction.

    ERIC Educational Resources Information Center

    Geis, Michael L.

    This book unites speech act theory and conversation analysis to advance a theory of conversational competence, called the Dynamic Speech Act Theory (DSAT). In contrast to traditional speech act theory that focuses almost exclusively on intuitive assessments of isolated, constructed examples, this theory is predicated on the assumption that speech…

  5. Voice and Speech after Laryngectomy

    ERIC Educational Resources Information Center

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), electroacoustical speech aid (EACA, six subjects) and tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects reading a short story were recorded in the sound-proof booth and the speech samples…

  6. Aging and Speech Understanding

    PubMed Central

    2015-01-01

    As people age, structural as well as neural degeneration occurs throughout the auditory system. Many older adults experience difficulty in understanding speech, especially in adverse listening conditions, although they can hear speech sounds. According to a report of the Committee on Hearing and Bioacoustics and Biomechanics of the National Research Council, peripheral, central-auditory, and cognitive systems have long been considered major factors affecting the understanding of speech. The present study aims to review 1) age-related changes in the peripheral, central-auditory, and cognitive systems, 2) the resulting decline in the understanding of speech, and 3) the clinical implications for audiologic rehabilitation of older adults. Once the factors affecting the understanding of speech in older adults are identified and the characteristics of age-related speech understanding difficulties are examined, clinical management could be developed for prevention and treatment. Future research about problems related to the understanding of speech in older adults will help to improve the quality of life in the elderly. PMID:26185785

  7. Robust Speech Recognition Under Noisy Ambient

    E-print Network

    Chapter 6, Robust Speech Recognition Under Noisy Ambient Conditions, by Kuldip K. Paliwal. Section headings include: Speech Recognition Overview; Robust Speech Recognition Techniques.

  8. Portable Speech Synthesizer

    NASA Technical Reports Server (NTRS)

    Leibfritz, Gilbert H.; Larson, Howard K.

    1987-01-01

    Compact speech synthesizer is useful traveling companion for the speech-handicapped. User simply enters statement on keyboard, and synthesizer converts statement into spoken words. Battery-powered and housed in briefcase, easily carried on trips. Unit used on telephones and in face-to-face communication. Synthesizer consists of microcomputer with memory-expansion module, speech-synthesizer circuit, batteries, recharger, dc-to-dc converter, and telephone amplifier. Components, commercially available, fit neatly in 17- by 13- by 5-in. briefcase. Weighs about 20 lb (9 kg) and operates and recharges from ac receptacle.

  9. Speech and Communication Disorders

    MedlinePLUS

    Many disorders can affect our ability to speak and communicate. They range from saying sounds incorrectly to being completely ... to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, such as dysphonia or ...

  10. Detecting Speech Defects

    ERIC Educational Resources Information Center

    Kryza, Frank T., II

    1976-01-01

    Discusses the importance of early detection of speech defects and briefly describes the activities of the Pre-School Diagnostic Center for Severe Communication Disorders in New Haven, Connecticut. (ED)

  11. Sequence Learning & Speech Recognition

    E-print Network

    Keysers, Daniel

    References include F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1998, and Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Topics: sequence learning applications (e.g. visual surveillance; bioinformatics: sequence alignment, gene finding, protein ...); hidden Markov models and their three basic problems, beginning with the evaluation problem: given an observation ...
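
    The evaluation problem named above (given an observation sequence, compute its likelihood under an HMM) is solved by the forward algorithm; the toy two-state model below is invented for illustration.

        import numpy as np

        def forward_likelihood(pi, A, B, observations):
            """pi: initial probs (N,), A: transitions (N,N), B: emissions (N,M)."""
            alpha = pi * B[:, observations[0]]
            for obs in observations[1:]:
                # alpha_j <- (sum_i alpha_i * A[i,j]) * B[j, obs]
                alpha = (alpha @ A) * B[:, obs]
            return float(alpha.sum())

        pi = np.array([0.6, 0.4])
        A = np.array([[0.7, 0.3],
                      [0.4, 0.6]])
        B = np.array([[0.5, 0.4, 0.1],   # state 0 emission probs, symbols 0..2
                      [0.1, 0.3, 0.6]])  # state 1 emission probs
        print(forward_likelihood(pi, A, B, [0, 1, 2]))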

  12. Improving statistical speech recognition 

    E-print Network

    Renals, Steve; Morgan, Nelson; Cohen, Michael; Franco, Horacio; Bourlard, Herve

    A summary of the theory of the hybrid connectionist HMM (hidden Markov model) continuous speech recognition system is presented. Experimental results indicating that the connectionist methods can significantly improve the performance of a context...

  13. Auditory speech preprocessors

    SciTech Connect

    Zweig, G.

    1989-01-01

    A nonlinear transmission line model of the cochlea (Zweig 1988) is proposed as the basis for a novel speech preprocessor. Sounds of different intensities, such as voiced and unvoiced speech, are preprocessed in radically different ways. The Q's of the preprocessor's nonlinear filters vary with input amplitude, higher Q's (longer integration times) corresponding to quieter sounds. Like the cochlea, the preprocessor acts as a "subthreshold laser" that traps and amplifies low level signals, thereby aiding in their detection and analysis. 17 refs.
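
    The level-dependent-Q idea can be sketched as a filter whose resonance sharpens as the input gets quieter; the Q-versus-level law and filter form below are invented illustrations (assuming SciPy is available), not Zweig's cochlear model.

        import numpy as np
        from scipy.signal import iirpeak, lfilter

        def level_dependent_filter(x, fs, f0=1000.0, q_quiet=30.0, q_loud=5.0):
            """Filter x with a resonator whose Q rises as the input gets quieter."""
            rms = np.sqrt(np.mean(x ** 2)) + 1e-12
            level = np.clip(-20.0 * np.log10(rms) / 60.0, 0.0, 1.0)  # 0 loud, 1 quiet
            q = q_loud + (q_quiet - q_loud) * level  # quieter input -> higher Q
            b, a = iirpeak(f0, q, fs=fs)
            return lfilter(b, a, x), q

        fs = 16000
        t = np.arange(fs) / fs
        quiet = 0.01 * np.sin(2 * np.pi * 1000 * t)  # low-level test tone
        _, q = level_dependent_filter(quiet, fs)
        print("Q chosen for quiet input:", round(q, 1))  # near 23 here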

  14. Computer-generated speech

    SciTech Connect

    Aimthikul, Y.

    1981-12-01

    This thesis reviews the essential aspects of speech synthesis and distinguishes between the two prevailing techniques: compressed digital speech and phonemic synthesis. It then presents the hardware details of the five speech modules evaluated. FORTRAN programs were written to facilitate message creation and retrieval with four of the modules driven by a PDP-11 minicomputer. The fifth module was driven directly by a computer terminal. The compressed digital speech modules (T.I. 990/306, T.S.I. Series 3D and N.S. Digitalker) each contain a limited vocabulary produced by the manufacturers while both the phonemic synthesizers made by Votrax permit an almost unlimited set of sounds and words. A text-to-phoneme rules program was adapted for the PDP-11 (running under the RSX-11M operating system) to drive the Votrax Speech Pac module. However, the Votrax Type'N Talk unit has its own built-in translator. Comparison of these modules revealed that the compressed digital speech modules were superior in pronouncing words on an individual basis but lacked the inflection capability that permitted the phonemic synthesizers to generate more coherent phrases. These findings were necessarily highly subjective and dependent on the specific words and phrases studied. In addition, the rapid introduction of new modules by manufacturers will necessitate new comparisons. However, the results of this research verified that all of the modules studied do possess reasonable quality of speech that is suitable for man-machine applications. Furthermore, the development tools are now in place to permit the addition of computer speech output in such applications.
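
    A text-to-phoneme rules pass of the general kind mentioned above can be sketched with ordered longest-match-first rewrite rules; the rule set and phoneme symbols below are invented toy examples, not the program adapted for the PDP-11.

        RULES = [                 # (grapheme, phoneme), longest match first
            ("ch", "CH"), ("sh", "SH"), ("th", "TH"), ("ee", "IY"),
            ("a", "AE"), ("e", "EH"), ("i", "IH"), ("o", "AO"), ("u", "AH"),
            ("b", "B"), ("c", "K"), ("d", "D"), ("f", "F"), ("g", "G"),
            ("h", "HH"), ("j", "JH"), ("k", "K"), ("l", "L"), ("m", "M"),
            ("n", "N"), ("p", "P"), ("r", "R"), ("s", "S"), ("t", "T"),
            ("v", "V"), ("w", "W"), ("y", "Y"), ("z", "Z"),
        ]

        def to_phonemes(word):
            word, out, i = word.lower(), [], 0
            while i < len(word):
                for grapheme, phoneme in RULES:
                    if word.startswith(grapheme, i):
                        out.append(phoneme)
                        i += len(grapheme)
                        break
                else:
                    i += 1        # skip letters with no rule
            return " ".join(out)

        print(to_phonemes("cheese"))  # CH IY S EH -- crude, as rule systems are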

  15. Speech perception and production

    PubMed Central

    Casserly, Elizabeth D.; Pisoni, David B.

    2012-01-01

    Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context-free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes. PMID:23946864

  16. Join Cost for Unit Selection Speech Synthesis 

    E-print Network

    Vepa, Jithendra

    Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high quality synthetic speech. This is due to a large speech database containing many instances of each speech unit, with a varied ...

  17. Speech processing using maximum likelihood continuity mapping

    SciTech Connect

    Hogden, J.E.

    2000-04-18

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  18. Speech processing using maximum likelihood continuity mapping

    SciTech Connect

    Hogden, John E.

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

  19. Objective speech quality evaluation of real-time speech coders

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Russell, W. H.; Huggins, A. W. F.

    1984-02-01

    This report describes the work performed in two areas: subjective testing of a real-time 16 kbit/s adaptive predictive coder (APC) and objective speech quality evaluation of real-time coders. The speech intelligibility of the APC coder was tested using the Diagnostic Rhyme Test (DRT), and the speech quality was tested using the Diagnostic Acceptability Measure (DAM) test, under eight operating conditions involving channel error, acoustic background noise, and a tandem link with two other coders. The test results showed that the DRT and DAM scores of the APC coder equalled or exceeded the corresponding test scores of the 32 kbit/s CVSD coder. In the area of objective speech quality evaluation, the report describes the development, testing, and validation of a procedure for automatically computing several objective speech quality measures, given only the tape-recordings of the input speech and the corresponding output speech of a real-time speech coder.
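
    One classical member of the family of objective measures computed from paired input/output recordings is frame-based segmental SNR; the sketch below uses conventional frame length and clipping limits, which are not necessarily those of the report.

        import numpy as np

        def segmental_snr(clean, coded, frame=160, lo=-10.0, hi=35.0):
            """Mean per-frame SNR in dB between original and coded speech."""
            snrs = []
            for start in range(0, len(clean) - frame + 1, frame):
                s = clean[start:start + frame]
                e = s - coded[start:start + frame]
                num, den = np.sum(s ** 2), np.sum(e ** 2) + 1e-12
                snrs.append(np.clip(10.0 * np.log10(num / den + 1e-12), lo, hi))
            return float(np.mean(snrs))

        rng = np.random.default_rng(1)
        clean = rng.normal(size=8000)
        coded = clean + 0.05 * rng.normal(size=8000)  # stand-in for a coder's output
        print(round(segmental_snr(clean, coded), 1), "dB")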

  20. Utilising Spontaneous Conversational Speech in HMM-Based Speech Synthesis 

    E-print Network

    Andersson, Sebastian; Yamagishi, Junichi; Clark, Robert

    2010-01-01

    Spontaneous conversational speech has many characteristics that are currently not well modelled in unit selection and HMM-based speech synthesis. But in order to build synthetic voices more suitable for interaction we need data that exhibits more...

  1. Cochlear implant speech recognition with speech maskers

    E-print Network

    Litovsky, Ruth

    Ginger S. Stickney and Fan-Gang Zeng; accepted 16 May 2004. Speech recognition performance was measured in normal-hearing and cochlear-implant listeners ... processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli ...

  2. Why Go to Speech Therapy?

    MedlinePLUS

    ... Teachers Speech-Language Pathologists Physicians Employers Tweet Why Go To Speech Therapy? Parents of Preschoolers Parents of ... types of therapy work best when you can go on an intensive schedule (i.e., every day ...

  3. Development of a speech autocuer

    NASA Technical Reports Server (NTRS)

    Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; Mccartney, M. L.

    1980-01-01

    A wearable, visually based prosthesis for the deaf based upon the proven method for removing lipreading ambiguity known as cued speech was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.

  4. Large Scale Speech Synthesis Evaluation 

    E-print Network

    Podsiadlo, Monika

    2007-11-11

    In speech synthesis evaluation, it is critical that we know what exactly affects the results of the evaluation, rather than employing such vague notions as, say, "good quality speech". As so far we have only been able to ...

  5. Speech Alarms Pilot Study

    NASA Technical Reports Server (NTRS)

    Sandor, A.; Moses, H. R.

    2016-01-01

    Currently on the International Space Station (ISS) and other space vehicles, Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high stress, high workload environment when responding to the alert. Furthermore, crew receive training a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long duration missions, however, crews will need to handle off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09), funded by the Human Research Program, included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts, with and without a speech component (either at the beginning or at the end of the tone), can be identified. Even though crew were familiar with the tone alert from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine if the candidate speech alarms are ready for transition to operations or if more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats, in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were asked to identify the alert as quickly and as accurately as possible. Reaction time and accuracy were measured. Participants identified speech alerts significantly faster than tone alerts. The HERA study investigated the performance of participants in a flight-like environment. Participants were instructed to complete items on a task list and respond to C&W alerts as they occurred. Reaction time and accuracy were measured to determine if the benefits of speech alarms are still present in an applied setting.

  6. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ...In this document, the Commission amends telecommunications relay services (TRS) mandatory minimum standards applicable to Speech- to-Speech (STS) relay service. This action is necessary to ensure that persons with speech disabilities have access to relay services that address their unique needs, in furtherance of the objectives of section 225 of the Communications Act of 1934, as amended (the......

  7. SPEECH-LANGUAGE-HEARING CLINIC

    E-print Network

    The OSU-Tulsa Speech-Language-Hearing Clinic provides assessment and therapy services, offered ... with the program's faculty, for a variety of speech, language and hearing disorders: ... and phonology; voice; hearing loss; receptive and expressive language; resonance; aphasia; reading ...

  8. Abortion and compelled physician speech.

    PubMed

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. PMID:25846035

  9. Signed Soliloquy: Visible Private Speech

    ERIC Educational Resources Information Center

    Zimmermann, Kathrin; Brugger, Peter

    2013-01-01

    Talking to oneself can be silent (inner speech) or vocalized for others to hear (private speech, or soliloquy). We investigated these two types of self-communication in 28 deaf signers and 28 hearing adults. With a questionnaire specifically developed for this study, we established the visible analog of vocalized private speech in deaf signers.…

  10. Robust Speech Recognition Program Summary

    E-print Network

    Clifford J. Weinstein, MIT Lincoln Laboratory. The Lincoln Laboratory Program in Robust Speech Recognition Technology was initiated in FY85 with the major goal of developing techniques for high-performance speech recognition under the stress and noise conditions typical ...

  11. Speech spectrogram expert

    SciTech Connect

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  12. Microphones for speech and speech recognition

    NASA Astrophysics Data System (ADS)

    West, James E.

    2004-10-01

    Automatic speech recognition (ASR) requires about a 15- to 20-dB signal-to-noise ratio (S/N) for high accuracy, even for small vocabulary systems. This S/N is generally achievable using a telephone handset in normal office or home environments. In the early 1990s AT&T and the regional telephone companies began using speaker-independent ASR to replace several operator services. The variable distortion in the carbon microphone was not transparent and resulted in reduced ASR accuracy. The linear electret condenser microphone, common in most modern telephones, improved handset performance both in sound quality and ASR accuracy. Hands-free ASR in quiet conditions is a bit more complex because of the increased distance between the microphone and the speech source. Cardioid directional microphones offer some improvement in noisy locations when the noise and desired signals are spatially separated, but this is not the general case and the resulting S/N is not adequate for seamless machine translation. Higher-order directional microphones, when properly oriented with respect to the talker and noise, have shown good improvement over omnidirectional microphones. Some ASR results measured in simulated car noise will be presented.
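
    The 15- to 20-dB S/N figure can be made concrete with a small calculation of signal-to-noise ratio from RMS levels; the signals and the 15 dB threshold check below are illustrative assumptions.

        import numpy as np

        def snr_db(signal, noise):
            """Signal-to-noise ratio in dB from mean power of each recording."""
            ps = np.mean(np.asarray(signal) ** 2)
            pn = np.mean(np.asarray(noise) ** 2) + 1e-12
            return 10.0 * np.log10(ps / pn)

        rng = np.random.default_rng(2)
        speech = 0.3 * rng.normal(size=16000)   # stand-in for a speech recording
        noise = 0.02 * rng.normal(size=16000)   # stand-in for office noise
        snr = snr_db(speech, noise)
        print(f"S/N = {snr:.1f} dB -> {'OK for ASR' if snr >= 15.0 else 'too noisy'}")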

  13. Bayesian Discriminative Adaptation for Speech Recognition

    E-print Network

    de Gispert, Adrià

    Presented at the Young Speech Researchers Meeting 2007, UCL. Overview: adaptation and adaptive training; speech recognition in varying acoustic conditions.

  14. Leveraging Automatic Speech Recognition in Cochlear Implants for Improved Speech Intelligibility Under Reverberation

    E-print Network

    Texas at Dallas, University of

    ... technology for cochlear implant (CI) devices, there still remains a significant gap between speech ... in reverberant environments. Index terms: automatic speech recognition, cochlear implants, multi...

  15. Packet speech systems technology

    NASA Astrophysics Data System (ADS)

    Weinstein, C. J.; Blankenship, P. E.

    1982-09-01

    The long-range objectives of the Packet Speech Systems Technology Program are to develop and demonstrate techniques for efficient digital speech communications on networks suitable for both voice and data, and to investigate and develop techniques for integrated voice and data communication in packetized networks, including wideband common-user satellite links. Specific areas of concern are: the concentration of statistically fluctuating volumes of voice traffic, the adaptation of communication strategies to varying conditions of network links and traffic volume, and the interconnection of wideband satellite networks to terrestrial systems. Previous efforts in this area have led to new vocoder structures for improved narrowband voice performance and multiple-rate transmission, and to demonstrations of conversational speech and conferencing on the ARPANET and the Atlantic Packet Satellite Network. The current program has two major thrusts: the development and refinement of practical low-cost, robust, narrowband, and variable-rate speech algorithms and voice terminal structures; and the establishment of an experimental wideband satellite network to serve as a unique facility for the realistic investigation of voice/data networking strategies.

  16. From the Speech Files

    ERIC Educational Resources Information Center

    Can Vocat J, 1970

    1970-01-01

    In a speech, "Looking Ahead in Vocational Education," to a group of Hamilton educators, D.O. Davis, Vice-President, Engineering, Dominion Foundries and Steel Limited, Hamilton, Ontario, spoke of the challenge of change and what educators and industry must do to help the future of vocational education. (Editor)

  17. Cued Speech. PEPNet Tipsheet

    ERIC Educational Resources Information Center

    Cappiello, Samuel, Comp.; Quenin, Catherine, Comp.

    2003-01-01

    Cued Speech (CS) is a tool used to make spoken languages visible. While it uses the hands to communicate information visually, it is not a form of sign language. Signed languages are languages in their own right and use the hands, body, and face to present complete concepts rather than words. They have their own grammar systems and vocabularies.…

  18. Mandarin Visual Speech Information

    ERIC Educational Resources Information Center

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  19. Black History Speech

    ERIC Educational Resources Information Center

    Noldon, Carl

    2007-01-01

    The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

  20. Perceptual Learning in Speech

    ERIC Educational Resources Information Center

    Norris, Dennis; McQueen, James M.; Cutler, Anne

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g.,…

  1. Interlocutor Informative Speech

    ERIC Educational Resources Information Center

    Gray, Jonathan M.

    2005-01-01

    Sharing information orally is an important skill that public speaking classes teach well. However, the author's students report that they do not often see informative lectures, demonstrations, presentations, or discussions that follow the structures and formats of an informative speech as it is discussed in their textbooks. As a result, the author…

  2. Speech to schoolchildren

    NASA Astrophysics Data System (ADS)

    Angell, C. Austen

    2013-02-01

    Prof. C. A. Angell from Arizona State University read the following short and simple speech, saying the sentences in italics in the best Japanese he could manage (after earnest coaching from a Japanese colleague). The rest was translated on the bus ride and then spoken, as he spoke, by Ms. Yukako Endo, to whom the author is very grateful.

  3. Microprocessor for speech recognition

    SciTech Connect

    Ishizuka, H.; Watari, M.; Sakoe, H.; Chiba, S.; Iwata, T.; Matsuki, T.; Kawakami, Y.

    1983-01-01

    A new single-chip microprocessor for speech recognition has been developed utilizing a multi-processor architecture and pipelined structure. Using a DP-matching algorithm, the processor recognizes up to 340 isolated words or 40 connected words in real time. 6 references.

  4. Expectations and speech intelligibility.

    PubMed

    Babel, Molly; Russell, Jamie

    2015-05-01

    Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs. PMID:25994710

  5. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2002-01-01

    Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

  6. Neurophysiology of Speech Differences in Childhood Apraxia of Speech

    PubMed Central

    Preston, Jonathan L.; Molfese, Peter J.; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes. PMID:25090016

  7. Speech audiometry by a speech synthesizer. I. A preliminary report.

    PubMed

    Rahko, T; Karjalainen, M A; Laine, U K; Lavonen, S

    1979-01-01

    A preliminary report on speech test results with a portable, text-to-speech synthesizer is presented. The differentiation scores achieved at a speed of 80 words/min vary. So far the best mean differentiation scores in normal material are 75%. Increasing the presentation level improves the differentiation score, as do decreasing the word speed and training. The future and present uses of this system are discussed. These include: devices for the handicapped, e.g. to produce speech for the mute; man-machine communication through speech in industry control and data processing systems; and uses in audiological diagnostics. The study is continuing. PMID:435169

  8. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
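
    A toy frame classifier in the spirit of these experiments can be built from short-time energy and zero-crossing rate with a threshold rule; the features and thresholds below are invented illustrations, not the paper's feature set or its HMM.

        import numpy as np

        def frame_features(frame):
            """Short-time energy and zero-crossing rate of one audio frame."""
            energy = float(np.mean(frame ** 2))
            zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
            return energy, zcr

        def label_frame(frame, energy_floor=1e-4, zcr_breathy=0.4):
            """Crude rule: quiet frames are silence; high-ZCR frames look breath-like."""
            energy, zcr = frame_features(frame)
            if energy < energy_floor:
                return "silence"
            return "NLSS?" if zcr > zcr_breathy else "LSS?"

        rng = np.random.default_rng(3)
        t = np.arange(160) / 8000.0
        voiced = 0.2 * np.sin(2 * np.pi * 150 * t)   # low-ZCR, voiced-like frame
        breath = 0.05 * rng.normal(size=160)         # noisy, breath-like frame
        print(label_frame(voiced), label_frame(breath))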

  9. Huntington's Disease: Speech, Language and Swallowing

    MedlinePLUS

    ... the course of the disease. What do speech-language pathologists do when working with people with Huntington's ...

  10. SSML: A speech synthesis markup language. 

    E-print Network

    Taylor, Paul A; Isard, Amy

    1997-01-01

    This paper describes the Speech Synthesis Markup Language, SSML, which has been designed as a platform independent interface standard for speech synthesis systems. The paper discusses the need for standardisation in speech ...

  11. Real-time speech animation system

    E-print Network

    Fu, Jieyun

    2011-01-01

    We optimize the synthesis procedure of a videorealistic speech animation system [7] to achieve real-time speech animation synthesis. A synthesis rate must be high enough for real-time video streaming for speech animation ...

  12. Speech Recognition: How Do We Teach It?

    ERIC Educational Resources Information Center

    Barksdale, Karl

    2002-01-01

    States that growing use of speech recognition software has made voice writing an essential computer skill. Describes how to present the topic, develop basic speech recognition skills, and teach speech recognition outlining, writing, proofreading, and editing. (Contains 14 references.) (SK)

  13. An Articulatory Speech-Prosthesis System

    E-print Network

    Wee, Keng Hoong

    We investigate speech-coding strategies for brain-machine-interface (BMI) based speech prostheses. We present an articulatory speech-synthesis system using an experimental integrated-circuit vocal tract that models the ...

  14. Predicting confusions and intelligibility of noisy speech

    E-print Network

    Messing, David P. (David Patrick), 1979-

    2007-01-01

    Current predictors of speech intelligibility are inadequate for making predictions of speech confusions caused by acoustic interference. This thesis is inspired by the need for a capability to understand and predict speech ...

  15. Large Vocabulary, Multilingual Speech Recognition: Session Overview

    E-print Network

    Lamel, Lori; Sagisaka, Yoshinori

    ... developed for a given language provide crucial input to speech recognition technology world-wide. However, ... knowledge on speaker-independent, large vocabulary, continuous speech recognition technology among ...

  16. Headphone localization of speech

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1993-01-01

    Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.
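
    The synthesis step underlying such displays reduces to convolving a monaural signal with the left- and right-ear head-related impulse responses (HRIRs) for the target direction. A minimal sketch, assuming equal-length HRIR arrays are already available (real systems interpolate among measured HRTF sets):

      import numpy as np

      def binaural_render(speech, hrir_left, hrir_right):
          """Render a mono signal at a virtual direction given that direction's HRIRs.

          Assumes hrir_left and hrir_right have the same length.
          """
          left = np.convolve(speech, hrir_left)
          right = np.convolve(speech, hrir_right)
          out = np.stack([left, right], axis=1)      # (n_samples, 2) stereo signal
          peak = np.max(np.abs(out))
          return out / peak if peak > 0 else out     # normalize to avoid clipping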

  17. Applications for Subvocal Speech

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles; Betts, Bradley

    2007-01-01

    A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

  18. Speech rhythm: a metaphor?

    PubMed Central

    Nolan, Francis; Jeon, Hae-Sung

    2014-01-01

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence of alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms. PMID:25385774

  19. Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity

    ERIC Educational Resources Information Center

    Opt, Susan

    2012-01-01

    In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

  20. Speech-in-Speech Recognition: A Training Study

    ERIC Educational Resources Information Center

    Van Engen, Kristin J.

    2012-01-01

    This study aims to identify aspects of speech-in-noise recognition that are susceptible to training, focusing on whether listeners can learn to adapt to target talkers ("tune in") and learn to better cope with various maskers ("tune out") after short-term training. Listeners received training on English sentence recognition in speech-shaped noise…

  1. A speech locked loop for cochlear implants and speech prostheses

    E-print Network

    Wee, Keng Hoong

    We have previously described a feedback loop that combines an auditory processor with a low-power analog integrated-circuit vocal tract to create a speech-locked-loop. Here, we describe how the speech-locked loop can help ...

  2. Interpersonal Orientation and Speech Behavior.

    ERIC Educational Resources Information Center

    Street, Richard L., Jr.; Murphy, Thomas L.

    1987-01-01

    Indicates that (1) males with low interpersonal orientation (IO) were least vocally active and expressive and least consistent in their speech performances, and (2) high IO males and low IO females tended to demonstrate greater speech convergence than either low IO males or high IO females. (JD)

  3. [Optimising speech during artificial ventilation].

    PubMed

    Gonzalez-Bermejo, J; Prigent, H

    2005-09-01

    Speech is an essential component of quality of life for patients treated with long-term mechanical ventilation. Physicians treating these patients should therefore always seek to improve phonation. We review the different tools and techniques available to restore speech for patients on home mechanical ventilation, whether ventilation is continuous or not. PMID:16294188

  4. Child Characteristics and Maternal Speech.

    ERIC Educational Resources Information Center

    Smolak, Linda

    1987-01-01

    An eight-month longitudinal study measuring infants' (N=8) temperament characteristics of activity level, task persistence, and affect and discourse and pragmatic features of their mothers' speech revealed complex interactions between maternal speech and infant temperament. It is argued that nonlinguistic child behaviors may influence maternal…

  5. SILENT SPEECH DURING SILENT READING.

    ERIC Educational Resources Information Center

    MCGUIGAN, FRANK J.

    EFFORTS WERE MADE IN THIS STUDY TO (1) RELATE THE AMOUNT OF SILENT SPEECH DURING SILENT READING TO LEVEL OF READING PROFICIENCY, INTELLIGENCE, AGE, AND GRADE PLACEMENT OF SUBJECTS, AND (2) DETERMINE WHETHER THE AMOUNT OF SILENT SPEECH DURING SILENT READING IS AFFECTED BY THE LEVEL OF DIFFICULTY OF PROSE READ AND BY THE READING OF A FOREIGN…

  6. Audiovisual Speech Recalibration in Children

    ERIC Educational Resources Information Center

    van Linden, Sabine; Vroomen, Jean

    2008-01-01

    In order to examine whether children adjust their phonetic speech categories, children of two age groups, five-year-olds and eight-year-olds, were exposed to a video of a face saying /aba/ or /ada/ accompanied by an auditory ambiguous speech sound halfway between /b/ and /d/. The effect of exposure to these audiovisual stimuli was measured on…

  7. Speech Prosody in Cerebellar Ataxia

    ERIC Educational Resources Information Center

    Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

    2007-01-01

    Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

  8. Methods of Teaching Speech Recognition

    ERIC Educational Resources Information Center

    Rader, Martha H.; Bailey, Glenn A.

    2010-01-01

    Objective: This article introduces the history and development of speech recognition, addresses its role in the business curriculum, outlines related national and state standards, describes instructional strategies, and discusses the assessment of student achievement in speech recognition classes. Methods: Research methods included a synthesis of…

  9. Automatic Speech Recognition

    NASA Astrophysics Data System (ADS)

    Potamianos, Gerasimos; Lamel, Lori; Wölfel, Matthias; Huang, Jing; Marcheret, Etienne; Barras, Claude; Zhu, Xuan; McDonough, John; Hernando, Javier; Macho, Dusan; Nadeu, Climent

    Automatic speech recognition (ASR) is a critical component for CHIL services. For example, it provides the input to higher-level technologies, such as summarization and question answering, as discussed in Chapter 8. In the spirit of ubiquitous computing, the goal of ASR in CHIL is to achieve a high performance using far-field sensors (networks of microphone arrays and distributed far-field microphones). However, close-talking microphones are also of interest, as they are used to benchmark ASR system development by providing a best-case acoustic channel scenario to compare against.

  10. Speech Anxiety: The Importance of Identification in the Basic Speech Course.

    ERIC Educational Resources Information Center

    Mandeville, Mary Y.

    A study investigated speech anxiety in the basic speech course by means of pre and post essays. Subjects, 73 students in 3 classes in the basic speech course at a southwestern multiuniversity, wrote a two-page essay on their perceptions of their speech anxiety before the first speaking project. Students discussed speech anxiety in class and were…

  11. Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

    ERIC Educational Resources Information Center

    Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

    2006-01-01

    Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech

  12. Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise

    E-print Network

    Valentini-Botinhao, Cassia; Yamagishi, Junichi; King, Simon

    It is possible to increase the intelligibility of speech in noise by enhancing the clean speech signal ...

  13. SUBTRACTION OF ADDITIVE NOISE FROM CORRUPTED SPEECH FOR ROBUST SPEECH RECOGNITION

    E-print Network

    Chen, J.; K. K. ...

    ... the performance of speech recognition systems. For many speech recognition applications, the most important source of acoustical distortion is additive noise. Much research effort in robust speech recognition has been ...
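
    The technique named in the title is classically implemented as magnitude spectral subtraction: estimate the noise magnitude spectrum from speech-free frames, subtract it from each frame of the noisy signal, and resynthesize using the noisy phase. The sketch below follows that textbook recipe under standard assumptions (leading frames are noise-only, 50% overlap); it is not the paper's exact algorithm.

      import numpy as np

      def spectral_subtraction(noisy, frame_len=512, hop=256, noise_frames=10, floor=0.01):
          """Basic magnitude spectral subtraction with overlap-add resynthesis."""
          win = np.hanning(frame_len)
          n_frames = 1 + (len(noisy) - frame_len) // hop
          spectra = np.array([np.fft.rfft(win * noisy[i*hop:i*hop+frame_len])
                              for i in range(n_frames)])
          # Assume the leading frames contain noise only.
          noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
          mag = np.abs(spectra) - noise_mag
          mag = np.maximum(mag, floor * np.abs(spectra))   # spectral floor limits musical noise
          clean = mag * np.exp(1j * np.angle(spectra))     # keep the noisy phase
          out = np.zeros(n_frames * hop + frame_len)
          for i, frame in enumerate(np.fft.irfft(clean, n=frame_len)):
              out[i*hop:i*hop+frame_len] += frame          # overlap-add
          return out[:len(noisy)]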

  14. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    PubMed

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues. PMID:25794478

  15. Preschool Children's Awareness of Private Speech

    ERIC Educational Resources Information Center

    Manfra, Louis; Winsler, Adam

    2006-01-01

    The present study explored: (a) preschool children's awareness of their own talking and private speech (speech directed to the self); (b) differences in age, speech use, language ability, and mentalizing abilities between children with awareness and those without; and (c) children's beliefs and attitudes about private speech. Fifty-one children…

  16. Speech Writing and Improving Public Speaking Skills.

    ERIC Educational Resources Information Center

    Haven, Richard P.

    A course in speech writing (preparing speeches for delivery by another person) is critical to the development of public speaking skills for college students. Unlike the traditional public speaking course, speech writing classes emphasize the preparation of the content of a speech over the delivery of the message. Students develop the ability to…

  17. Instructional Improvement Speech Handbook. Secondary Level.

    ERIC Educational Resources Information Center

    Crapse, Larry

    Recognizing that speech is an important component of the language arts and that the English curriculum is the most natural place for speech skills to be fostered, this handbook examines several methods of developing speech competencies within the secondary school English classroom. The first section, "Looking at Speech," examines the nature of…

  18. Audio-Visual Speech Perception Is Special

    ERIC Educational Resources Information Center

    Tuomainen, J.; Andersen, T.S.; Tiippana, K.; Sams, M.

    2005-01-01

    In face-to-face conversation speech is perceived by ear and eye. We studied the prerequisites of audio-visual speech perception by using perceptually ambiguous sine wave replicas of natural speech as auditory stimuli. When the subjects were not aware that the auditory stimuli were speech, they showed only negligible integration of auditory and…

  19. Automated Speech Rate Measurement in Dysarthria

    ERIC Educational Resources Information Center

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  20. Phonetic Recalibration Only Occurs in Speech Mode

    ERIC Educational Resources Information Center

    Vroomen, Jean; Baart, Martijn

    2009-01-01

    Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

  1. Freedom of Speech Newsletter, February 1976.

    ERIC Educational Resources Information Center

    Allen, Winfred G., Jr., Ed.

    The "Freedom of Speech Newsletter" is the communication medium, published four times each academic year, of the Freedom of Speech Interest Group, Western Speech Communication Association. Articles included in this issue are "What Is Academic Freedom For?" by Ralph Ross, "A Sociology of Free Speech" by Ray Heidt, "A Queer Interpretation fo the…

  2. Emerging Technologies Speech Tools and Technologies

    ERIC Educational Resources Information Center

    Godwin-Jones, Robert

    2009-01-01

    Using computers to recognize and analyze human speech goes back at least to the 1970's. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990's, advances in the scientific understanding of speech as well as significant enhancements in software and…

  3. Multifractal nature of unvoiced speech signals

    SciTech Connect

    Adeyemi, O.A.; Hartt, K.; Boudreaux-Bartels, G.F.

    1996-06-01

    A refinement is made in the nonlinear dynamic modeling of speech signals. Previous research successfully characterized speech signals as chaotic. Here, we analyze fricative speech signals using multifractal measures to determine various fractal regimes present in their chaotic attractors. Results support the hypothesis that speech signals have multifractal measures. © 1996 American Institute of Physics.

  4. POLYPHASE SPEECH RECOGNITION

    E-print Network

    Lin, Hui; Bilmes, Jeff

    ... for speech recognition that consists of multiple semi-synchronized recognizers operating on a polyphase ... a problem in many speech recognition systems, i.e., that speech modulation energy is most important below ...

  5. ROBUST SPEECH RECOGNITION

    E-print Network

    Paliwal, K. K.

    The aim of robust speech recognition is to overcome the mismatch problem so as to result in a moderate ... of an automatic speech recognition system, describe sources of speech variability that cause mismatch between ...

  6. Infant Perception of Atypical Speech Signals

    ERIC Educational Resources Information Center

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  7. Alignment of speech and co-speech gesture in a constraint-based grammar 

    E-print Network

    Saint-Amand, Katya; Alahverdzhieva, Katya

    2013-07-02

    This thesis concerns the form-meaning mapping of multimodal communicative actions consisting of speech signals and improvised co-speech gestures, produced spontaneously with the hand. The interaction between speech and ...

  8. Personalising speech-to-speech translation in the EMIME project 

    E-print Network

    Kurimo, Mikko; Byrne, William; Dines, John; Garner, Philip N.; Gibson, Matthew; Guan, Yong; Hirsimaki, Teemu; Karhila, Reima; King, Simon; Liang, Hui; Oura, Keiichiro; Saheer, Lakshmi; Shannon, Matt; Shiota, Sayaka; Tian, Jilei; Tokuda, Keiichi; Wester, Mirjam; Wu, Yi-Jian; Yamagishi, Junichi

    2010-01-01

    In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt ...

  9. Speech recovery device

    DOEpatents

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  10. Speech recovery device

    SciTech Connect

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  11. Letter-based speech synthesis 

    E-print Network

    Watts, Oliver; Yamagishi, Junichi; King, Simon

    2010-01-01

    Initial attempts at performing text-to-speech conversion based on standard orthographic units are presented, forming part of a larger scheme of training TTS systems on features that can be trivially extracted from text. We evaluate the possibility...

  12. Is Private Speech Really Private? 

    E-print Network

    Smith, Ashley

    2011-01-01

    This study sought to answer the question “is private speech really private?” by assessing if participants spoke more to themselves when in the company of the experimenter or when they were alone. The similarity between ...

  13. Contingent categorization in speech perception

    PubMed Central

    Bullock-Rest, Natasha; Rhone, Ariane E.; Jongman, Allard; McMurray, Bob

    2013-01-01

    The speech signal is notoriously variable, with the same phoneme realized differently depending on factors like talker and phonetic context. Variance in the speech signal has led to a proliferation of theories of how listeners recognize speech. A promising approach, supported by computational modeling studies, is contingent categorization, wherein incoming acoustic cues are computed relative to expectations. We tested contingent encoding empirically. Listeners were asked to categorize fricatives in CV syllables constructed by splicing the fricative from one CV syllable with the vowel from another CV syllable. The two spliced syllables always contained the same fricative, providing consistent bottom-up cues; however on some trials, the vowel and/or talker mismatched between these syllables, giving conflicting contextual information. Listeners were less accurate and slower at identifying the fricatives in mismatching splices. This suggests that listeners rely on context information beyond bottom-up acoustic cues during speech perception, providing support for contingent categorization. PMID:25157376

  14. Delayed Speech or Language Development

    MedlinePLUS

    ... often around 9 months), they begin to string sounds together, incorporate the different tones of speech, and ... of age, babies also should be attentive to sound and begin to recognize names of common objects ( ...

  15. Autoregressive HMMs for speech synthesis

    E-print Network

    Shannon, Matt; Byrne, William

    2009-09-07

    We propose the autoregressive HMM for speech synthesis. We show that the autoregressive HMM supports efficient EM parameter estimation and that we can use established effective synthesis techniques such as synthesis considering global variance...

  16. Speech processing: An evolving technology

    SciTech Connect

    Crochiere, R.E.; Flanagan, J.L.

    1986-09-01

    As we enter the information age, speech processing is emerging as an important technology for making machines easier and more convenient for humans to use. It is both an old and a new technology - dating back to the invention of the telephone and forward, at least in aspirations, to the capabilities of HAL in 2001. Explosive advances in microelectronics now make it possible to implement economical real-time hardware for sophisticated speech processing - processing that formerly could be demonstrated only in simulations on main-frame computers. As a result, fundamentally new product concepts - as well as new features and functions in existing products - are becoming possible and are being explored in the marketplace. As the introductory piece to this issue, the authors draw a brief perspective on the evolving field of speech processing and assess the technology in the three constituent sectors: speech coding, synthesis, and recognition.

  17. Perceptual Learning of Interrupted Speech

    PubMed Central

    Benard, Michel Ruben; Başkent, Deniz

    2013-01-01

    The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated using perceptual learning of interrupted speech. If different cognitive processes played a role in restoring interrupted speech with and without filler noise, the two forms of speech would be learned at different rates and with different perceived mental effort. If the restoration benefit were an artificial outcome of using the ecologically invalid stimulus of speech with silent gaps, this benefit would diminish with training. Two groups of normal-hearing listeners were trained, one with interrupted sentences with the filler noise, and the other without. Feedback was provided with the auditory playback of the unprocessed and processed sentences, as well as the visual display of the sentence text. Training increased the overall performance significantly, however restoration benefit did not diminish. The increase in intelligibility and the decrease in perceived mental effort were relatively similar between the groups, implying similar cognitive mechanisms for the restoration of the two types of interruptions. Training effects were generalizable, as both groups improved their performance also with the other form of speech than that they were trained with, and retainable. Due to null results and relatively small number of participants (10 per group), further research is needed to more confidently draw conclusions. Nevertheless, training with interrupted speech seems to be effective, stimulating participants to more actively and efficiently use the top-down restoration. This finding further implies the potential of this training approach as a rehabilitative tool for hearing-impaired/elderly populations. PMID:23469266

  18. Neural bases of accented speech perception

    PubMed Central

    Adank, Patti; Nuttall, Helen E.; Banks, Briony; Kennedy-Higgins, Daniel

    2015-01-01

    The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Floccia et al., 2006; Adank et al., 2009). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regards to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioral aspects of accent processing. PMID:26500526

  19. Impaired motor speech performance in Huntington's disease.

    PubMed

    Skodda, Sabine; Schlegel, Uwe; Hoffmann, Rainer; Saft, Carsten

    2014-04-01

    Dysarthria is a common symptom of Huntington's disease and has been reported, besides other features, to be characterized by alterations of speech rate and regularity. However, data on the specific pattern of motor speech impairment and its relationship to other motor and neuropsychological symptoms are sparse. Therefore, the aim of the present study was to describe and objectively analyse different speech parameters, with special emphasis on the timing of connected speech and non-speech verbal utterances. 21 patients with manifest Huntington's disease and 21 age- and gender-matched healthy controls had to perform a reading task and several syllable repetition tasks. Computerized acoustic analysis of different variables for the measurement of speech rate and regularity generated a typical pattern of impaired motor speech performance, with a reduction of speech rate, an increase of pauses and a markedly impaired ability to repeat single syllables steadily. Abnormalities of speech parameters were more pronounced in the subgroup of patients with Huntington's disease receiving antidopaminergic medication, but were also present in the drug-naïve patients. Speech rate related to connected speech and parameters of syllable repetition showed correlations to overall motor impairment, tapping capacity in a quantitative motor assessment and a score of cognitive function. After these preliminary data, further investigations on patients in different stages of disease are warranted to determine whether the analysis of speech and non-speech verbal utterances might be a helpful additional tool for monitoring functional disability in Huntington's disease. PMID:24221215

  20. The interlanguage speech intelligibility benefit

    NASA Astrophysics Data System (ADS)

    Bent, Tessa; Bradlow, Ann R.

    2003-09-01

    This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n=2), Korean (n=2), and English (n=1) were recorded reading simple English sentences. Native listeners of English (n=21), Chinese (n=21), Korean (n=10), and a mixed group from various native language backgrounds (n=12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

  1. Discriminative pronunciation modeling for dialectal speech recognition

    E-print Network

    Lehr, Maider; Gorman, Kyle

    ... Index terms: speech recognition, dialectal speech recognition, pronunciation modeling, discriminative training. Speech recognition technology is increasingly ubiquitous in everyday life. Automatic speech recognition ...

  2. Neural pathways for visual speech perception

    PubMed Central

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  3. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2001-05-01

    Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners' difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to the needs of the listener and therefore should share some features, their triggering conditions are quite different, so their phonetic outcomes should also differ. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other "exaggerated" speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.
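
    The measures listed above (f0, vowel duration, formants, intensity) are typically extracted with dedicated analysis tools; as a rough illustration, intensity and f0 can be approximated with short-time RMS and an autocorrelation peak. The estimator below is a crude stand-in for whatever software the study actually used; the thresholds and frame sizes are arbitrary assumptions.

      import numpy as np

      def frame_measures(x, sr, frame_len=0.04, fmin=75.0, fmax=400.0):
          """Crude per-frame intensity (dB) and f0 (Hz) estimates for a mono signal x."""
          n = int(frame_len * sr)
          lo, hi = int(sr / fmax), int(sr / fmin)   # plausible pitch-period lags
          results = []
          for start in range(0, len(x) - n, n):
              frame = x[start:start + n] * np.hanning(n)
              rms_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
              ac = np.correlate(frame, frame, mode='full')[n - 1:]  # lags >= 0
              lag = lo + np.argmax(ac[lo:hi])
              # Crude voicing check: strong autocorrelation peak relative to energy.
              f0 = sr / lag if ac[lag] > 0.3 * ac[0] else None
              results.append((rms_db, f0))
          return results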

  4. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  5. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  6. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    DOEpatents

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
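
    Under the linear source-filter assumption these patents rely on, recovering a vocal-tract transfer function from a measured excitation and the acoustic output amounts to regularized division in the frequency domain. A schematic sketch only; the regularization constant and FFT size are arbitrary, and the patents' actual processing is more involved.

      import numpy as np

      def estimate_transfer_function(excitation, speech_out, n_fft=1024, eps=1e-3):
          """Estimate H(f) = Y(f) / E(f) for one analysis frame.

          excitation : EM-sensor-derived voiced excitation samples (one frame)
          speech_out : time-aligned acoustic output samples (same frame)
          """
          E = np.fft.rfft(excitation, n=n_fft)
          Y = np.fft.rfft(speech_out, n=n_fft)
          # Tikhonov-style regularization keeps the division stable where |E| is small.
          H = Y * np.conj(E) / (np.abs(E) ** 2 + eps)
          return H   # frequency response of the vocal tract for this frame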

  7. Current methods of digital speech processing

    NASA Astrophysics Data System (ADS)

    Rabiner, Lawrence R.; Atal, B. S.; Flanagan, J. L.

    1990-05-01

    The field of digital speech processing includes the areas of speech coding, speech synthesis, and speech recognition. With the advent of faster computation and high speed VLSI circuits, speech processing algorithms are becoming more sophisticated, more robust, and more reliable. As a result, significant advances have been made in coding, synthesis, and recognition, but, in each area, there still remain great challenges in harnessing speech technology to human needs. In the area of speech coding, current algorithms perform well at bit rates down to 16 kbits/sec. Current research is directed at further reducing the coding rate for high-quality speech into the data speed range, even as low as 2.4 kbits/sec. In text-to-speech synthesis speech is produced which is very intelligible but is not yet completely natural. Current research aims at providing higher quality and intelligibility to the synthesis speech produced by these systems. Finally, in the area of speech and speaker recognition, present systems provide excellent performance on limited tasks; i.e., limited vocabulary, modest syntax, small talker populations, constrained inputs, and favorable signal-to-noise ratios. Current research is directed at solving the problem of continuous speech recognition for large vocabularies, and at verifying talker's identities from a limited amount of spoken text.

  8. Speech prosody in cerebellar ataxia

    NASA Astrophysics Data System (ADS)

    Casper, Maureen

    The present study sought an acoustic signature for the speech disturbance recognized in cerebellar degeneration. Magnetic resonance imaging was used for a radiological rating of cerebellar involvement in six cerebellar ataxic dysarthric speakers. Acoustic measures of the [pap] syllables in contrastive prosodic conditions and of normal vs. brain-damaged patients were used to further our understanding both of the speech degeneration that accompanies cerebellar pathology and of speech motor control and movement in general. Pair-wise comparisons of the prosodic conditions within the normal group showed statistically significant differences for four prosodic contrasts. For three of the four contrasts analyzed, the normal speakers showed both longer durations and higher formant and fundamental frequency values in the more prominent first condition of the contrast. The acoustic measures of the normal prosodic contrast values were then used as a model to measure the degree of speech deterioration for individual cerebellar subjects. This estimate of speech deterioration as determined by individual differences between cerebellar and normal subjects' acoustic values of the four prosodic contrasts was used in correlation analyses with MRI ratings. Moderate correlations between speech deterioration and cerebellar atrophy were found in the measures of syllable duration and f0. A strong negative correlation was found for F1. Moreover, the normal model presented by these acoustic data allows for a description of the flexibility of task- oriented behavior in normal speech motor control. These data challenge spatio-temporal theory which explains movement as an artifact of time wherein longer durations predict more extreme movements and give further evidence for gestural internal dynamics of movement in which time emerges from articulatory events rather than dictating those events. This model provides a sensitive index of cerebellar pathology with quantitative acoustic analyses.

  9. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  10. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  11. SHORT-TIME KURTOSIS OF SPEECH SIGNALS WITH APPLICATION TO CO-CHANNEL SPEECH SEPARATION

    E-print Network

    De Leon, Phillip

    Recent work into the separation of mixtures of speech ... a simple gradient search algorithm is employed to maximize kurtosis, thereby separating the source speech ...
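
    The statistic in the title is easy to compute: excess kurtosis over short frames, which tends to be high when a single voice is active and lower for overlapped speech, since mixtures look more Gaussian. A sketch of the measurement alone, with arbitrary frame sizes; the gradient-search separation stage is not reproduced here.

      import numpy as np

      def short_time_kurtosis(x, frame_len=400, hop=200):
          """Excess kurtosis of each frame; near 0 for Gaussian-like (mixed) frames."""
          values = []
          for start in range(0, len(x) - frame_len, hop):
              frame = x[start:start + frame_len]
              frame = frame - frame.mean()
              var = np.mean(frame ** 2)
              kurt = np.mean(frame ** 4) / (var ** 2 + 1e-12) - 3.0
              values.append(kurt)
          return np.array(values)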

  12. AM-DEMODULATION OF SPEECH SPECTRA AND ITS APPLICATION TO NOISE ROBUST SPEECH RECOGNITION

    E-print Network

    Alwan, Abeer

    In this paper, a novel algorithm that resembles amplitude demodulation in the frequency domain is introduced, and its application to automatic speech recognition (ASR) is studied. Speech production can be regarded ...

  13. Can Speech Recognizers Measure the Effectiveness of Encoding Algorithms for Digital Speech Transmission?

    E-print Network

    Mills, Kevin

    ... speech recognition technology to model human perception of the quality of communication channels ... (1) without encoding, (2) with encoding and decoding using a standard algorithm for speech compression, and (3) ...

  14. USING NEUTRAL SPEECH MODELS FOR EMOTIONAL SPEECH ANALYSIS

    E-print Network

    Busso, Carlos; Lee, Sungbok; Narayanan, Shrikanth S.

    ... are different variants of the neutral emotion. Emotion expression affects different speech sounds differently. Idea: discriminate between emotional and neutral speech; acoustic neutral reference models are used ...

  15. Using Neutral Speech Models for Emotional Speech Analysis

    E-print Network

    Busso, Carlos; Lee, Sungbok

    ... on neutral (non-emotional) speech, it is expected that a robust neutral speech model can be useful ... models trained with spectral features, using the emotionally-neutral TIMIT corpus. The performance ...

  16. The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech

    ERIC Educational Resources Information Center

    Wayne, Rachel V.; Johnsrude, Ingrid S.

    2012-01-01

    Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…

  17. The Relationship between Speech Perception and Auditory Organisation: Studies with Spectrally Reduced Speech

    E-print Network

    Barker, Jon

    Listeners are remarkably adept at recognising speech that has undergone extensive spectral reduction. Natural speech can be reproduced using as few as three time-varying sinusoids ...

  18. The Contribution of Sensitivity to Speech Rhythm and Non-Speech Rhythm to Early Reading Development

    ERIC Educational Resources Information Center

    Holliman, Andrew J.; Wood, Clare; Sheehy, Kieron

    2010-01-01

    Both sensitivity to speech rhythm and non-speech rhythm have been associated with successful phonological awareness and reading development in separate studies. However, the extent to which speech rhythm, non-speech rhythm and literacy skills are interrelated has not been examined. As a result, five- to seven-year-old English-speaking children…

  19. DIRECTLY MODELING SPEECH WAVEFORMS BY NEURAL NETWORKS FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS

    E-print Network

    Tokuda, Keiichi; Zen, Heiga

    This paper proposes a novel approach for directly modeling speech ... speech synthesis framework with a specially designed output layer. As acoustic feature extraction ...

  20. Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

    ERIC Educational Resources Information Center

    Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

    2014-01-01

    Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…

  1. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  2. What can Visual Speech Synthesis tell Visual Speech Recognition?

    E-print Network

    Cohen, Michael M.; Massaro, Dominic W.

    We consider the problem of speech recognition given visual and auditory information, and discuss ... and third, the use of these production models to help guide automatic speech recognition. Finally, we ...

  3. A High-Dimensional Subband Speech Representation and SVM Framework for Robust Speech Recognition

    E-print Network

    Sollich, Peter

    ... the individual front-ends across the full range of noise levels. Index terms: speech recognition, robustness, subbands, support vector machines. Automatic speech recognition (ASR) systems suffer ...

  4. A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition

    E-print Network

    Whelan, Paul F.

    This paper presents the development of a novel visual speech recognition (VSR) system based on a new ... noting that they are problematic when applied to continuous visual speech recognition. To circumvent ...

  5. Oldenburg Logatome Speech Corpus (OLLO) for Speech Recognition Experiments with Humans and Machines

    E-print Network

    ... corpora, as it specifically targets (1) the fair comparison between human and machine speech recognition ... for automatic speech recognition (ASR) systems. To enable an unbiased human-machine comparison, OLLO is de ...

  6. THIRD-ORDER MOMENTS OF FILTERED SPEECH SIGNALS FOR ROBUST SPEECH RECOGNITION

    E-print Network

    Povinelli, Richard J.

    ... are introduced and studied for robust speech recognition. These features have the potential to capture nonlinear ... Spectral-based acoustic features have been the standard in speech recognition for many years, even ...

  7. An Effective Speech Understanding Method with a Multiple Speech Recognizer based on Output Selection using Edit Distance*

    E-print Network

    ... for speech understanding. The method incorporates multiple speech recognizers. We use two recognizers, a large-vocabulary continuous speech recognizer and a domain-specific speech recognizer. The integrated recognizer ...

  8. A causal test of the motor theory of speech perception: A case of impaired speech production and spared speech perception

    PubMed Central

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E.; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z.

    2015-01-01

In the last decade, the debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. However, the exact role of the motor system in auditory speech processing remains elusive. Here we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. The patient’s spontaneous speech was marked by frequent phonological/articulatory errors, and those errors were caused, at least in part, by motor-level impairments with speech production. We found that the patient showed a normal phonemic categorical boundary when discriminating two nonwords that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the nonword stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labeling impairment. These data suggest that the identification (i.e., labeling) of nonword speech sounds may involve the speech motor system, but that the perception of speech sounds (i.e., discrimination) does not require the motor system. This means that motor processes are not causally involved in perception of the speech signal, and suggests that the motor system may be used when other cues (e.g., meaning, context) are not available. PMID:25951749

  9. A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech Recognition and Machine Translation

    E-print Network

    Ruiqiang Zhang, Genichiro Kikui, Hirofumi Yamamoto, Taro Watanabe, Frank Soong, and Wai Kit Lo. ATR Spoken Language Translation Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku…

  10. Determining the threshold for usable speech within co-channel speech with the SPHINX automated speech recognition system

    NASA Astrophysics Data System (ADS)

    Hicks, William T.; Yantorno, Robert E.

    2004-10-01

    Much research has been, and continues to be, done in the area of separating the original utterances of two speakers from co-channel speech. This is very important in the area of automated speech recognition (ASR), where the current state of technology is not nearly as accurate as human listeners when the speech is co-channel. It is desired to determine for what types of speech (voiced, unvoiced, and silence), and at what target-to-interference ratio (TIR), two speakers can speak at the same time without reducing the speech intelligibility of the target speaker (such segments are referred to as usable speech). Knowing which segments of co-channel speech are usable in ASR can be used to improve the reconstruction of single-speaker speech. Tests were performed using the SPHINX ASR software and the TIDIGITS database. It was found that interfering voiced speech with a TIR of 6 dB or greater (on a per-frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech. It was further found that interfering unvoiced speech with a TIR of 18 dB or greater (on a per-frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech.
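
    The usable-speech criterion above is easy to make concrete. The following is a minimal sketch, assuming the target and interferer tracks are available separately (as they are when co-channel speech is simulated by mixing); the frame length and the zero-crossing voicing proxy are illustrative assumptions, not details from the study.

      import numpy as np

      def usable_frames(target, interferer, frame_len=320):
          """Flag co-channel frames as 'usable' per the thresholds reported
          above: TIR >= 6 dB for voiced interference, >= 18 dB for unvoiced."""
          n = min(len(target), len(interferer)) // frame_len
          flags = []
          for i in range(n):
              t = target[i * frame_len:(i + 1) * frame_len]
              s = interferer[i * frame_len:(i + 1) * frame_len]
              # Per-frame target-to-interference ratio in dB.
              tir_db = 10.0 * np.log10((np.sum(t ** 2) + 1e-12) /
                                       (np.sum(s ** 2) + 1e-12))
              # Crude voicing proxy for the interferer: low zero-crossing rate.
              zcr = np.mean(np.abs(np.diff(np.sign(s)))) / 2.0
              flags.append(tir_db >= (6.0 if zcr < 0.1 else 18.0))
          return np.array(flags)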

  11. The Effect of Speech Rate on Stuttering Frequency, Phonated Intervals, Speech Effort, and Speech Naturalness during Chorus Reading

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Ingham, Roger J.

    2013-01-01

    Purpose: This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also…

  12. Articulatory features for robust visual speech recognition

    E-print Network

    Saenko, Ekaterina, 1976-

    2004-01-01

    This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic ...

  13. A Welsh speech database: preliminary results. 

    E-print Network

    Williams, Briony

    1999-01-01

    A speech database for Welsh was recorded in a studio from read text by a few speakers. The purpose is to investigate the acoustic characteristics of Welsh speech sounds and prosody. It can also serve as a resource ...

  14. Speech and Language Problems in Children

    MedlinePLUS

    Children vary in their development of speech and language skills. Health professionals have milestones for what's normal. ... it may be due to a speech or language disorder. Language disorders can mean that the child ...

  15. Multimodal speech recognition with ultrasonic sensors

    E-print Network

    Zhu, Bo, M. Eng. Massachusetts Institute of Technology

    2008-01-01

    Ultrasonic sensing of articulator movement is an area of multimodal speech recognition that has not been researched extensively. The widely-researched audio-visual speech recognition (AVSR), which relies upon video data, ...

  16. Speech synthesis by phonological structure matching. 

    E-print Network

    Taylor, Paul; Black, Alan W

    1999-01-01

    This paper presents a new technique for speech synthesis by unit selection. The technique works by specifying the synthesis target and the speech database as phonological trees, and using a selection algorithm which ...

  17. Visual Prosody: Facial Movements Accompanying Speech 

    E-print Network

    Graf, Hans Peter; Cosatto, Eric; Strom, Volker; Huang, Fu Jie

    As we articulate speech, we usually move the head and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicating additional information, such as the speaker's mood. We analyze ...

  18. Speech Sound Disorders: Articulation and Phonological Processes

    MedlinePLUS

    ... phonological disorders in children (article abstract) What do SLPs do when working with individuals with speech sound ... describe the typical clinical process followed by an SLP in these areas. Typical Speech and Language Development ...

  19. Speech rhythm guided syllable nuclei detection

    E-print Network

    Glass, James R.

    In this paper, we present a novel speech-rhythm-guided syllable-nuclei location detection algorithm. As a departure from conventional methods, we introduce an instantaneous speech rhythm estimator to predict possible regions ...

  20. Glottal Spectral Separation for Parametric Speech Synthesis 

    E-print Network

    Renals, Steve; Yamagishi, Junichi; Richmond, Korin; Cabral, Joao P

    2008-01-01

    This paper presents a method to control the characteristics of synthetic speech flexibly by integrating articulatory features into a Hidden Markov Model (HMM)-based parametric speech synthesis system. In contrast to model adaptation...

  1. Structural Representation of Speech for Phonetic Classification 

    E-print Network

    Gutkin, Alexander; King, Simon

    This paper explores the issues involved in using symbolic metric algorithms for automatic speech recognition(ASR), via a structural representation of speech. This representation is based on a set of phonological distinctive ...

  2. Parameter tuning for unit selection speech synthesis 

    E-print Network

    Keating, Joanna

    2005-01-01

    This project aims to contribute to current research on the quality of speech synthesis by conducting a perceptual experiment to discover a better set of target cost weights for the Festival speech synthesis system. From ...

  3. Speech Recognition by Machine, A Review

    E-print Network

    Anusuya, M A

    2010-01-01

    This paper presents a brief survey of Automatic Speech Recognition and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development, the accuracy of automatic speech recognition remains one of the important research challenges (e.g., under variations of context, speakers, and environment). The design of a speech recognition system requires careful attention to the following issues: definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, databases, and performance evaluation. The problems existing in ASR, and the various techniques proposed by researchers to solve them, are presented in chronological order. The authors hope that this work will be a contribution in the area of speech recog...

  4. President Kennedy's Speech at Rice University

    NASA Technical Reports Server (NTRS)

    1988-01-01

    This video tape presents unedited film footage of President John F. Kennedy's speech at Rice University, Houston, Texas, September 12, 1962. The speech expresses the commitment of the United States to landing an astronaut on the Moon.

  5. Speech Samples: A Clinical Check on Validity.

    ERIC Educational Resources Information Center

    Conant, Susan; Budoff, Milton

    1986-01-01

    Speech samples were elicited twice (four months apart) from a 4-year-old language delayed child. Clinical analysis involved examination of conversational turns and words, length of unit, speech act variables, amount of speech, and syntax. Although results clearly indicated an unmistakable surge in expressive language, analysis did not explain the…

  6. Speech Perception in Individuals with Auditory Neuropathy

    ERIC Educational Resources Information Center

    Zeng, Fan-Gang; Liu, Sheng

    2006-01-01

    Purpose: Speech perception in participants with auditory neuropathy (AN) was systematically studied to answer the following 2 questions: Does noise present a particular problem for people with AN? Can clear speech and cochlear implants alleviate this problem? Method: The researchers evaluated the advantage in intelligibility of clear speech over…

  7. Free Speech in the College Community.

    ERIC Educational Resources Information Center

    O'Neil, Robert M.

    This book discusses freedom of speech issues affecting the college community, in light of "speech codes" imposed by some institutions, new electronic technology such as the Internet, and recent court decisions. Chapter 1 addresses campus speech codes, the advantages and disadvantages of such codes, and their conflict with the First Amendment of…

  8. Graduate Student SPEECH-LANGUAGE PATHOLOGY

    E-print Network

    Hung, I-Kuai

    Graduate Student Handbook, Speech-Language Pathology Program, Stephen F. Austin State University. See the program website for Speech-Language Pathology and Audiology at Stephen F. Austin State University. The field of Speech-Language Pathology and Audiology is concerned with "normal" aspects and processes…

  9. Speech-Language-Hearing Department of Communication

    E-print Network

    The UNH Speech-Language-Hearing Center, Department of Communication Sciences & Disorders, introduces its faculty. One clinical faculty member considers herself a generalist with particular interests across service models; she is ASHA certified and holds a NH license to practice speech-language pathology.…

  10. Speech for People with Tracheostomies or Ventilators

    MedlinePLUS

    Speech for People With Tracheostomies or Ventilators [en Español]. What impact does having a ventilator have on speech? For some people, a ... may need to be connected to a breathing machine (ventilator) that provides a ...

  11. DEVELOPMENT AND DISORDERS OF SPEECH IN CHILDHOOD.

    ERIC Educational Resources Information Center

    KARLIN, ISAAC W.; AND OTHERS

    THE GROWTH, DEVELOPMENT, AND ABNORMALITIES OF SPEECH IN CHILDHOOD ARE DESCRIBED IN THIS TEXT DESIGNED FOR PEDIATRICIANS, PSYCHOLOGISTS, EDUCATORS, MEDICAL STUDENTS, THERAPISTS, PATHOLOGISTS, AND PARENTS. THE NORMAL DEVELOPMENT OF SPEECH AND LANGUAGE IS DISCUSSED, INCLUDING THEORIES ON THE ORIGIN OF SPEECH IN MAN AND FACTORS INFLUENCING THE NORMAL…

  12. Speech and Hearing Science, Anatomy and Physiology.

    ERIC Educational Resources Information Center

    Zemlin, Willard R.

    Written for those interested in speech pathology and audiology, the text presents the anatomical, physiological, and neurological bases for speech and hearing. Anatomical nomenclature used in the speech and hearing sciences is introduced and the breathing mechanism is defined and discussed in terms of the respiratory passage, the framework and…

  13. Campus Speech Codes Said to Violate Rights

    ERIC Educational Resources Information Center

    Lipka, Sara

    2007-01-01

    Most college and university speech codes would not survive a legal challenge, according to a report released in December by the Foundation for Individual Rights in Education, a watchdog group for free speech on campuses. The report labeled many speech codes as overly broad or vague, and cited examples such as Furman University's prohibition of…

  14. The Dynamic Nature of Speech Perception

    ERIC Educational Resources Information Center

    McQueen, James M.; Norris, Dennis; Cutler, Anne

    2006-01-01

    The speech perception system must be flexible in responding to the variability in speech sounds caused by differences among speakers and by language change over the lifespan of the listener. Indeed, listeners use lexical knowledge to retune perception of novel speech (Norris, McQueen, & Cutler, 2003). In that study, Dutch listeners made lexical…

  15. Robustness of HMM-based Speech Synthesis 

    E-print Network

    Yamagishi, Junichi; Ling, Zhenhua; King, Simon

    2008-01-01

    We evaluate the robustness of several speech synthesis methods under such conditions. This is, as far as we know, a new research topic: "robust speech synthesis." As a consequence of our investigations, we propose a new robust training method for HMM-based speech synthesis…

  16. Speech Errors (Review) Issues in Lexicalization

    E-print Network

    Coulson, Seana

    Plan: speech errors (review); issues in lexicalization; LRP in language production (review). Exchange errors (e.g., "fill the pool" → "fool the pill") occur at the phonological, lexical, and syntactic levels, and morphological rules of word formation are engaged during speech production. Stranding errors involve nouns and verbs.…

  17. Cognitive Functions in Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Nijland, Lian; Terband, Hayo; Maassen, Ben

    2015-01-01

    Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…

  18. Speech & Hearing Clinic

    E-print Network

    Hickman, Mark

    Speech & Hearing Clinic, College of Science, Department of Communication Disorders. How to contact us: please contact the Speech and Hearing Clinic during business hours to make an appointment. Clinical staff will tell you what we have learned about your child's speech and language and phonological awareness skills…

  19. Syllable Structure in Dysfunctional Portuguese Children's Speech

    ERIC Educational Resources Information Center

    Candeias, Sara; Perdigao, Fernando

    2010-01-01

    The goal of this work is to investigate whether children with speech dysfunctions (SD) show a deficit in planning some Portuguese syllable structures (PSS) in continuous speech production. Knowledge of which aspects of speech production are affected by SD is necessary for efficient improvement in the therapy techniques. The case-study is focused…

  20. Listener Effort for Highly Intelligible Tracheoesophageal Speech

    ERIC Educational Resources Information Center

    Nagle, Kathy F.; Eadie, Tanya L.

    2012-01-01

    The purpose of this study was to determine (a) whether inexperienced listeners can reliably judge listener effort and (b) whether listener effort provides unique information beyond speech intelligibility or acceptability in tracheoesophageal speech. Twenty inexperienced listeners made judgments of speech acceptability and amount of effort…

  1. Audiovisual Asynchrony Detection in Human Speech

    ERIC Educational Resources Information Center

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  2. Interventions for Speech Sound Disorders in Children

    ERIC Educational Resources Information Center

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  3. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  4. GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS

    E-print Network

    Allauzen, Cyril

    Weighted automata and transducers provide a common representation for the components of a speech recognition system. In previous work, we introduced optimization algorithms for such representations, including determinization. However, not all weighted automata and transducers used in large-vocabulary speech recognition…

  5. HIDDENARTICULATOR MARKOV MODELS FOR SPEECH RECOGNITION

    E-print Network

    Noble, William Stafford

    In speech recognition using Hidden Markov Models (HMMs), each state represents an acoustic portion of speech. Articulatory knowledge can assist speech recognition; we demonstrate this by showing that our mapping of articulatory configurations…

  6. GRAPHICAL MODELS AND AUTOMATIC SPEECH RECOGNITION

    E-print Network

    Noble, William Stafford

    Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. Much of what is used as part of a speech recognition system can be described by a graph; this includes Gaussian distributions…

  7. Information Bottleneck and HMM for Speech Recognition

    E-print Network

    Linial, Nathan "Nati"

    Thesis submitted for the degree of Master. The task of speech recognition is difficult and complex for many reasons…

  8. GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS

    E-print Network

    Mohri, Mehryar

    Weighted automata and transducers provide a common representation for the components of a speech recognition system. In previous work, we introduced optimization algorithms for such representations, including determinization. However, not all weighted automata and transducers used in large-vocabulary speech recognition…

  9. Regularizing Linear Discriminant Analysis for Speech Recognition

    E-print Network

    Erdogan, Hakan

    A key component in a pattern recognition system is the feature extractor. Feature extraction is an important step for speech recognition since the time-domain speech signal is highly variable, and thus complex linear and nonlinear…

  10. Speech recognition with amplitude and frequency modulations

    E-print Network

    Chen, Zhongping

    Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived AM and FM components from speech. While a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances…

  11. Robust Speech Recognition Using Articulatory Information

    E-print Network

    Kirchhoff, Katrin

    Doctoral dissertation submitted to the Technische Fakultät der Universität… The thesis investigates the use of articulatory information in the acoustic modeling component of a speech recognition system. Its second focus is the evaluation of this approach on different speech recognition tasks, the first of which is an American English corpus of telephone…

  12. Speech-Song Interface of Chinese Speakers

    ERIC Educational Resources Information Center

    Mang, Esther

    2007-01-01

    Pitch is a psychoacoustic construct crucial in the production and perception of speech and songs. This article is an exploration of the interface of speech and song performance of Chinese speakers. Although parallels might be drawn from the prosodic and sound structures of the linguistic and musical systems, perceiving and producing speech and…

  13. Nonlinear Statistical Modeling of Speech

    NASA Astrophysics Data System (ADS)

    Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

    2009-12-01

    Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and discriminative training algorithms for these new models to improve noise robustness.
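
    Of the three proposed invariants, the correlation-based ones can be estimated directly from a time-delay embedding of the signal. Below is a minimal Grassberger-Procaccia-style sketch for the correlation fractal dimension; the embedding parameters and radii are illustrative assumptions, the frame should be normalized (e.g., to unit variance) so the radii are meaningful, and the O(n^2) pairwise distances restrict it to short frames.

      import numpy as np

      def delay_embed(x, m, tau):
          """Time-delay embedding of a 1-D signal into m dimensions, lag tau."""
          n = len(x) - (m - 1) * tau
          return np.stack([x[i * tau:i * tau + n] for i in range(m)], axis=1)

      def correlation_sum(points, r):
          """Fraction of point pairs closer than radius r."""
          d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
          iu = np.triu_indices(len(points), k=1)
          return np.mean(d[iu] < r)

      def correlation_dimension(x, m=5, tau=2, r1=0.1, r2=0.2):
          """Slope of log C(r) vs. log r between two radii; the radii must be
          small enough to probe structure but large enough that pairs exist."""
          pts = delay_embed(x, m, tau)
          c1, c2 = correlation_sum(pts, r1), correlation_sum(pts, r2)
          return (np.log(c2) - np.log(c1)) / (np.log(r2) - np.log(r1))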

  14. Auditory models for speech analysis

    NASA Astrophysics Data System (ADS)

    Maybury, Mark T.

    This paper reviews the psychophysical basis for auditory models and discusses their application to automatic speech recognition. First an overview of the human auditory system is presented, followed by a review of current knowledge gleaned from neurological and psychoacoustic experimentation. Next, a general framework describes established peripheral auditory models which are based on well-understood properties of the peripheral auditory system. This is followed by a discussion of current enhancements to those models to include nonlinearities and synchrony information as well as other higher auditory functions. Finally, the initial performance of auditory models in the task of speech recognition is examined and additional applications are mentioned.

  15. Speech Communication and Telephone Networks

    NASA Astrophysics Data System (ADS)

    Gierlich, H. W.

    Speech communication over telephone networks has one major constraint: The communication has to be “real time”. The basic principle since the beginning of all telephone networks has been to provide a communication system capable of substituting the air path between two persons having a conversation at 1-m distance. This is the so-called orthotelephonic reference position [7]. Although many technical compromises must be made to enable worldwide communication over telephone networks, it is still the goal to achieve speech quality performance which is close to this reference.

  16. Perception of Speech Reflects Optimal Use of Probabilistic Speech Cues

    ERIC Educational Resources Information Center

    Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.

    2008-01-01

    Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…
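
    The "optimal use" claim has a compact ideal-observer reading: if each category projects a probability distribution over VOT, the best a listener can do is label tokens by the posterior. A minimal sketch under assumed Gaussian categories (the means, spreads, and prior are illustrative, not the study's values):

      import numpy as np

      def posterior_short_lag(vot_ms, mu_b=0.0, sd_b=8.0,
                              mu_p=50.0, sd_p=8.0, prior_b=0.5):
          """Posterior probability that a token with this VOT belongs to the
          short-lag (/b/-like) rather than long-lag (/p/-like) category."""
          def gauss(x, mu, sd):
              return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
          num = gauss(vot_ms, mu_b, sd_b) * prior_b
          den = num + gauss(vot_ms, mu_p, sd_p) * (1.0 - prior_b)
          return num / den

    Widening a category's VOT distribution flattens this posterior around the boundary, which is the kind of graded, distribution-sensitive behavior the study attributes to listeners.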

  17. [Nature of speech disorders in Parkinson disease].

    PubMed

    Pawlukowska, W; Honczarenko, K; Go??b-Janowska, M

    2013-01-01

    The aim of the study was to discuss the physiology and pathology of speech and to review the literature on speech disorders in Parkinson disease. The most effective methods to diagnose speech disorders in Parkinson disease are also stressed. Articulatory, respiratory, acoustic, and pragmatic factors contributing to the exacerbation of speech disorders are then discussed. Furthermore, the study deals with the most important types of speech treatment techniques available (pharmacological and behavioral), and the significance of Lee Silverman Voice Treatment is highlighted. PMID:23821424

  18. Pulse Vector-Excitation Speech Encoder

    NASA Technical Reports Server (NTRS)

    Davidson, Grant; Gersho, Allen

    1989-01-01

    Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces high quality of reconstructed speech, but with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually-based error criterion to synthesize natural-sounding speech.
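
    The analysis-by-synthesis search common to CELP-family coders such as PVXC can be sketched compactly. This is a generic illustration rather than the patented PVXC search: it scores candidates by plain squared error instead of the perceptually based criterion mentioned above, and an all-pole filter stands in for the vocal-tract model.

      import numpy as np
      from scipy.signal import lfilter

      def select_excitation(target, codebook, lpc_a):
          """Pass each candidate excitation through the all-pole model 1/A(z)
          (lpc_a[0] must be 1.0) and keep the gain-scaled candidate that best
          matches the target frame in the squared-error sense."""
          best_i, best_gain, best_err = -1, 0.0, np.inf
          for i, code in enumerate(codebook):
              synth = lfilter([1.0], lpc_a, code)   # excitation -> speech
              gain = np.dot(synth, target) / (np.dot(synth, synth) + 1e-12)
              err = np.sum((target - gain * synth) ** 2)
              if err < best_err:
                  best_i, best_gain, best_err = i, gain, err
          return best_i, best_gain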

  19. Phrase-programmable digital speech system

    SciTech Connect

    Raymond, W.J.; Morgan, R.L.; Miller, R.L.

    1987-01-27

    This patent describes a phrase-speaking computer system having a programmable digital computer and a speech processor, the speech processor comprising: a voice synthesizer; a read/write speech data segment memory; a read/write command memory; and control processor means, including processor control programs and logic connecting to the memories and to the voice synthesizer, arranged to scan the command memory and to respond to command data entries stored therein by transferring corresponding speech data segments from the speech data segment memory to the voice synthesizer; data conveyance means, connecting the computer to the command memory and the speech data segment memory, for transferring the command data entries supplied by the computer into the command memory and for transferring the speech data segments supplied by the computer into the speech data segment memory; and an enable signal line connecting the computer to the speech processor and arranged to initiate the operation of the processor control programs and logic when the enable signal line is enabled by the computer. The programmable computer includes speech control programs controlling the operation of the computer, including data conveyance command sequences that cause the computer to supply command data entries to the data conveyance means, and speech-processor-enabling command sequences that cause the computer to energize the enable signal line.
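
    Stripped of the claim language, the architecture is a small producer-consumer arrangement: the host computer fills two memories and pulses an enable line, after which the speech processor walks the command memory and streams each referenced segment to the synthesizer. A toy sketch of that control flow (all names hypothetical):

      class SpeechProcessor:
          """Stand-in for the patent's control processor and its two memories."""
          def __init__(self, synthesizer):
              self.command_memory = []     # command data entries from the host
              self.segment_memory = {}     # id -> encoded speech data segment
              self.synthesizer = synthesizer

          def load(self, commands, segments):
              """Host side: convey commands and segments into the memories."""
              self.command_memory = list(commands)
              self.segment_memory.update(segments)

          def enable(self):
              """Enable-line handler: scan the command memory and stream each
              referenced speech data segment to the voice synthesizer."""
              for segment_id in self.command_memory:
                  self.synthesizer(self.segment_memory[segment_id])

      # Usage: sp = SpeechProcessor(print)
      #        sp.load(["greet"], {"greet": b"\x01\x02"}); sp.enable()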

  20. Perception of Speech Sounds in School-Aged Children with Speech Sound Disorders.

    PubMed

    Preston, Jonathan L; Irwin, Julia R; Turcios, Jacqueline

    2015-11-01

    Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System, which has been effectively used to assess preschoolers' ability to perform goodness judgments, is explored for school-aged children with residual speech errors (RSEs). However, data suggest that this particular task may not be sensitive to perceptual differences in school-aged children. The need for the development of clinical tools for assessment of speech perception in school-aged children with RSE is highlighted, and clinical suggestions are provided. PMID:26458198

  1. Speech and Language Developmental Milestones

    MedlinePLUS

    ... What are the milestones for speech and language development? The first signs of communication occur when an infant learns that a cry will bring food, comfort, and companionship. Newborns also begin to recognize important sounds in their environment, such as the voice of their mother or ...

  2. Speech Research. Interim Scientific Report.

    ERIC Educational Resources Information Center

    Cooper, Franklin S.

    The status and progress of several studies dealing with the nature of speech, instrumentation for its investigation, and instrumentation for practical applications is reported on. The period of January 1 through June 30, 1969 is covered. Extended reports and manuscripts cover the following topics: programing for the Glace-Holmes synthesizer,…

  3. Concept to Speech Generation Systems

    E-print Network

    Workshop proceedings from a Meeting of the Association for Computational Linguistics, edited by Kai Alter (Austrian Research Institute for AI (OFAI) & Max-Planck-Institute of Cognitive Neuroscience), Hannes Pirker, and Wolfgang Finkler. Contributions include work on speech output in a speech dialogue system by Peter Poller and Paul Heisterkamp, and on integrating language generation with speech synthesis.…

  4. Linguistic aspects of speech synthesis.

    PubMed Central

    Allen, J

    1995-01-01

    The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized. PMID:7479807
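
    The analysis half of such a system (normalization, lexicon lookup with a letter-to-sound fallback, then a prosodic specification of durations and F0) can be sketched as a short pipeline. Everything below is a toy stand-in: the two-entry lexicon, the per-letter fallback, and the flat prosody placeholders are illustrative assumptions only.

      import re

      def letter_to_sound(word, lexicon={"speech": "S P IY CH", "the": "DH AH"}):
          """Try the lexicon first, as a real system would after morphological
          analysis; otherwise fall back to a naive per-letter 'pronunciation'."""
          return lexicon.get(word, " ".join(word.upper()))

      def text_to_utterance(text):
          """Produce the linguistic description a synthesizer would consume:
          one record per word with phones, a duration, and an F0 target."""
          utterance = []
          for w in re.findall(r"[a-zA-Z]+", text.lower()):
              utterance.append({
                  "word": w,
                  "phones": letter_to_sound(w),
                  "duration_ms": 80 * max(1, len(w) // 2),  # stand-in durations
                  "f0_hz": 120.0,                           # flat F0 placeholder
              })
          return utterance

      # text_to_utterance("The speech synthesizer") -> list of word records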

  5. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-15

    ...relay service (VRS), Internet Protocol Relay (IP Relay)...speech disabilities can access the telephone system...of STS that utilizes Internet-based transmissions...calls on a mobile or Internet-enabled device, by...interconnected VoIP service to access a state STS relay...

  6. Speech of Mentally Disabled Children.

    ERIC Educational Resources Information Center

    Willis, Bruce

    The study summarized in this paper deals with the grammatical analysis of the spontaneous speech of approximately 150 children who are classified as mentally disabled; educable (I.Q. range 50-80). The performance of these mentally disadvantaged children is compared with the performance of 200 normally developing children by using a clinical…

  7. Embedding speech into virtual realities

    NASA Technical Reports Server (NTRS)

    Bohn, Christian-Arved; Krueger, Wolfgang

    1993-01-01

    In this work a speaker-independent speech recognition system is presented, which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system which is robust, fast, easy to use, and needs no additional hardware besides common VR equipment.

  8. Extensions to the Speech Disorders Classification System (SDCS)

    ERIC Educational Resources Information Center

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    This report describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three sub-types of motor speech disorders.…

  9. Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues

    ERIC Educational Resources Information Center

    Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

    2010-01-01

    Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

  10. THE COMPREHENSION OF RAPID SPEECH BY THE BLIND, PART III.

    ERIC Educational Resources Information Center

    FOULKE, EMERSON

    A REVIEW OF THE RESEARCH ON THE COMPREHENSION OF RAPID SPEECH BY THE BLIND IDENTIFIES FIVE METHODS OF SPEECH COMPRESSION--SPEECH CHANGING, ELECTROMECHANICAL SAMPLING, COMPUTER SAMPLING, SPEECH SYNTHESIS, AND FREQUENCY DIVIDING WITH THE HARMONIC COMPRESSOR. THE SPEECH CHANGING AND ELECTROMECHANICAL SAMPLING METHODS AND THE NECESSARY APPARATUS HAVE…

  11. Rethinking Speech Recognition on Mobile Devices

    E-print Network

    Kam, Matthew

    We examine the approaches for automatic speech recognition (ASR) systems on mobile devices that are currently used (embedded speech recognition, speech recognition in the cloud, and distributed speech recognition) and evaluate their advantages…

  12. Noise adaptive speech recognition based on sequential noise parameter estimation

    E-print Network

    In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech in noise, based on sequential estimation of noise parameters that can be trained from noisy speech. The approach can be applied to perform continuous speech recognition…

  13. Loss tolerant speech decoder for telecommunications

    NASA Technical Reports Server (NTRS)

    Prieto, Jr., Jaime L. (Inventor)

    1999-01-01

    A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.
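
    The concealment step is essentially one-step prediction over a buffer of past parameter frames. The sketch below substitutes a fixed fading-memory linear predictor for the patent's trained FIR feed-forward neural network; the buffer length and weights are illustrative assumptions.

      import numpy as np
      from collections import deque

      class ParameterExtrapolator:
          """Replace the parameters of a lost/errored frame by extrapolating
          from a past-history buffer of correctly decoded frames."""
          def __init__(self, history_len=4):
              self.history = deque(maxlen=history_len)
              # Geometric fading-memory weights, newest frame weighted most.
              self.weights = 0.5 ** np.arange(history_len)[::-1]

          def observe(self, params):
              """Store the decoded parameters of a good frame."""
              self.history.append(np.asarray(params, dtype=float))

          def conceal(self):
              """Return replacement parameters (needs >= 1 observed frame)."""
              h = np.stack(self.history)              # (n_hist, n_params)
              w = self.weights[-len(self.history):]
              return (w / w.sum()) @ h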

  14. Speech entrainment compensates for Broca's area damage.

    PubMed

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-08-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to SE. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during SE versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of SE to improve speech production and may help select patients for SE treatment. PMID:25989443

  15. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability; it has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence durations and lower rms energy, yet its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation of vowels in sad speech is consistently more peripheral (i.e., more forward displacements) when compared to other emotions; this does not hold for the female speaker. These and other results will be discussed in detail with the associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  16. Sensorimotor influences on speech perception in infancy.

    PubMed

    Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

    2015-11-01

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development. PMID:26460030

  17. Sensorimotor influences on speech perception in infancy

    PubMed Central

    Bruderer, Alison G.; Danielson, D. Kyle; Kandhadai, Padmapriya; Werker, Janet F.

    2015-01-01

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception–production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants’ speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants’ tongues. With a looking-time procedure, we found that temporarily restraining infants’ articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral–motor movements influence speech sound discrimination. Moreover, an experimentally induced “impairment” in articulator movement can compromise speech perception performance, raising the question of whether long-term oral–motor impairments may impact perceptual development. PMID:26460030

  18. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    PubMed

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available. PMID:25951749

  19. The Neural Bases of Difficult Speech Comprehension and Speech Production: Two Activation Likelihood Estimation (ALE) Meta-Analyses

    ERIC Educational Resources Information Center

    Adank, Patti

    2012-01-01

    The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…

  20. MSc Speech Lab 01: Speech Analysis

    E-print Network

    Mak, Man-Wai

    Laboratory handout for a speech analysis exercise at The Hong Kong Polytechnic University (MSc Speech Lab 01, 6 March 2003).…

  1. The Levels of Speech Usage Rating Scale: Comparison of Client Self-Ratings with Speech Pathologist Ratings

    ERIC Educational Resources Information Center

    Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

    2012-01-01

    Background: The term "speech usage" refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a categorical…

  2. Speech Enhancement for Android (SEA): A Speech Processing Demonstration Tool for Android Based Smart Phones and Tablets

    E-print Network

    This paper presents a speech processing platform which can be used to demonstrate and investigate speech enhancement methods. This platform is called Speech Enhancement for Android (SEA), and has been developed…

  3. SYNTHETIC VISUAL SPEECH DRIVEN FROM AUDITORY SPEECH

    E-print Network

    Beskow, Jonas

    Department of Speech, Music and Hearing, KTH, Sweden (www.speech.kth.se/teleface/). We have developed two different methods for using auditory speech to drive synthetic visual speech.…

  4. Unit selection in a concatenative speech synthesis system using a large speech database 

    E-print Network

    Hunt, Andrew; Black, Alan W

    One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural ...
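
    Selection is a shortest-path problem: every candidate unit pays a target cost against its specification and a concatenation (join) cost with its neighbour, and dynamic programming finds the cheapest sequence. A minimal sketch with user-supplied cost functions; the cost functions themselves (e.g., spectral distance at the join) are where the real modelling effort goes.

      import numpy as np

      def select_units(targets, candidates, target_cost, join_cost):
          """Viterbi search: candidates[i] lists database units for target i;
          returns one unit per target minimizing total target + join cost."""
          cost = [np.array([target_cost(u, targets[0]) for u in candidates[0]])]
          back = []
          for i in range(1, len(targets)):
              prev, cur = candidates[i - 1], candidates[i]
              step = np.empty(len(cur))
              bp = np.empty(len(cur), dtype=int)
              for j, u in enumerate(cur):
                  totals = [cost[-1][k] + join_cost(p, u) for k, p in enumerate(prev)]
                  bp[j] = int(np.argmin(totals))
                  step[j] = totals[bp[j]] + target_cost(u, targets[i])
              cost.append(step)
              back.append(bp)
          path = [int(np.argmin(cost[-1]))]     # trace back the cheapest path
          for bp in reversed(back):
              path.append(int(bp[path[-1]]))
          path.reverse()
          return [candidates[i][j] for i, j in enumerate(path)]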

  5. On the use of prosody in a speech-to-speech translator. 

    E-print Network

    Strom, Volker; Elsner, Anja; Hess, Wolfgang; Kasper, Walter; Klein, Alexandra; Kreiger, Hans Ulrich; Spilker, Jorg; Weber, Hans; Görz, Gunther

    1997-01-01

    In this paper a speech-to-speech translator from German to English is presented. Besides the traditional processing steps, it takes advantage of acoustically detected prosodic phrase boundaries and focus. The prosodic phrase ...

  6. Primary Progressive Aphasia and Apraxia of Speech

    PubMed Central

    Jung, Youngsin; Duffy, Joseph R.; Josephs, Keith A.

    2014-01-01

    Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: non-fluent/agrammatic, semantic, and logopenic variants of primary progressive aphasia. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. Here, we review clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech. The distinctions among these disorders will be crucial since accurate diagnosis will be important from a prognostic and therapeutic standpoint. PMID:24234355

  7. Manual transcription of conversational speech at the articulatory feature level 

    E-print Network

    Livescu, Karen; Bezman, Ari; Borges, Nash; Yung, Lisa; Çetin, Ozgur; Frankel, Joe; King, Simon; Magimai-Doss, Mathew; Chi, Xuemin; Lavoie, Lisa

    2007-01-01

    Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either the acoustic signal or a phonetic transcription alone, yet such representations are absent from current mainstream approaches to automatic speech recognition.…

  8. Speech and Language Disorders in the School Setting

    MedlinePLUS

    Frequently Asked Questions: Speech and Language Disorders in the School Setting. What types of speech and language disorders affect school-age children? Do speech-language ...

  9. CONTINUOUS TRACHEOESOPHAGEAL SPEECH REPAIR

    E-print Network

    Young, Steve

    Tracheoesophageal speech is regarded as the alaryngeal speech alternative most comparable to normal laryngeal speech in quality, fluency, and ease of acquisition. It nevertheless differs from normal laryngeal speech, being perceptually described as more breathy, rough, low, deep, and unsteady…

  10. Speech & Language Therapy for Children and Adolescents with Down Syndrome

    MedlinePLUS

    ... Can I Find a Qualified Speech-Language Pathologist (SLP)? Qualified SLPs are certified by the American Speech-Language-Hearing ... professionals have been certified, they can use CCC-SLP (Certificate of Clinical Competence in Speech-Language Pathology) ...

  11. A Resource Manual for Speech and Hearing Programs in Oklahoma.

    ERIC Educational Resources Information Center

    Oklahoma State Dept. of Education, Oklahoma City.

    Administrative aspects of the Oklahoma speech and hearing program are described, including state requirements, school administrator role, and organizational and operational procedures. Information on speech and language development and remediation covers language, articulation, stuttering, voice disorders, cleft palate, speech improvement,…

  12. Cross-lingual automatic speech recognition using tandem features 

    E-print Network

    Lal, Partha

    2011-11-24

    Automatic speech recognition requires many hours of transcribed speech recordings in order for an acoustic model to be effectively trained. However, recording speech corpora is time-consuming and expensive, so such ...

  13. HIGH-DIMENSIONAL LINEAR REPRESENTATIONS FOR ROBUST SPEECH RECOGNITION

    E-print Network

    Sollich, Peter

    Matthew Ager and Zoran Cvetković. Index terms: acoustic waveforms, phoneme, classification, robust, speech recognition. Many studies have shown that automatic speech recognition (ASR) systems still lack performance when compared to human…

  14. SUBSPACE KERNEL DISCRIMINANT ANALYSIS FOR SPEECH RECOGNITION Hakan Erdogan

    E-print Network

    Erdogan, Hakan

    For speech recognition, the number of training vectors N is usually prohibitively high, increasing computational requirements. We introduce a subspace version of KDA that enables its application to speech recognition, thus conveniently enabling nonlinear…

  15. Multichannel Speech Recognition using Distributed Microphone Signal Fusion Strategies

    E-print Network

    Johnson, Michael T.

    Distributed microphone signals are fused, for example by weighting or squared distance, before the enhanced single-channel signal is passed into the speech recognition system. By exploiting the information contained in the signals, speech recognition systems can achieve higher recognition accuracies.…

  16. COMPUTATIONAL AUDITORY SCENE ANALYSIS EXPLOITING SPEECH-RECOGNITION KNOWLEDGE

    E-print Network

    Ellis, Dan

    Computational auditory scene analysis lacks the high-level knowledge of real-world signal structure exploited by listeners. Speech recognition embodies one source of such knowledge, although fully integrating the two will require more radical adaptation of current speech recognition approaches.…

  17. Applications of broad class knowledge for noise robust speech recognition

    E-print Network

    Sainath, Tara N

    2009-01-01

    This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. ...

  18. Children's perception of their modified speech preliminary findings

    E-print Network

    Boye, Johan

    Children's perception of their modified speech - preliminary findings. Sofia Strömbergsson. ... 6-year-old children's perception of synthetically modified versions of their own recorded speech. Recordings of the children's speech production are automatically modified so that the initial consonant ...

  19. Headphone localization of speech stimuli

    NASA Technical Reports Server (NTRS)

    Begault, Durand R.; Wenzel, Elizabeth M.

    1991-01-01

    Recently, three dimensional acoustic display systems have been developed that synthesize virtual sound sources over headphones based on filtering by Head-Related Transfer Functions (HRTFs), the direction-dependent spectral changes caused primarily by the outer ears. Here, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects 'pulled' their judgements toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgements; 15 to 46 percent of stimuli were heard inside the head with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by non-individualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

  20. NON-NEGATIVE MATRIX FACTORIZATION BASED COMPENSATION OF MUSIC FOR AUTOMATIC SPEECH RECOGNITION

    E-print Network

    Virtanen, Tuomas

    NON-NEGATIVE MATRIX FACTORIZATION BASED COMPENSATION OF MUSIC FOR AUTOMATIC SPEECH RECOGNITION. ... automatic recognition of mixtures of speech and music. We represent magnitude spectra of noisy speech ... Keywords: robustness, automatic speech recognition, non-negative matrix factorization, speech enhancement.
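
    As background for the record above: the general NMF recipe is to approximate a non-negative magnitude spectrogram V as the product W H. The Python sketch below assumes pre-trained speech and music basis spectra stacked in a fixed W and estimates only the activations H with the standard KL-divergence multiplicative update; the paper's actual system may differ, and the helper names are illustrative.

        import numpy as np

        def nmf_activations(V, W, n_iter=100, eps=1e-10):
            """V: (freq, time) magnitudes; W: (freq, atoms) fixed bases."""
            rng = np.random.default_rng(0)
            H = rng.random((W.shape[1], V.shape[1])) + eps
            for _ in range(n_iter):
                WH = W @ H + eps
                # Multiplicative update minimizing KL divergence, W held fixed.
                H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            return H

        def speech_soft_mask(V, W, H, n_speech, eps=1e-10):
            # Assumed convention: the first n_speech columns of W are
            # the speech atoms, the rest model the music.
            full = W @ H + eps
            speech = W[:, :n_speech] @ H[:n_speech, :]
            return V * speech / full   # Wiener-style soft mask

    The masked spectrogram can then be fed to the recognizer's feature extraction in place of the noisy one, which is the broad sense in which such factorizations "compensate" for music.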

  1. Problems of Modeling Phone Deletion in Conversational Speech for Speech Recognition

    E-print Network

    Mak, Brian Kan-Wing

    Problems of Modeling Phone Deletion in Conversational Speech for Speech Recognition. Brian Mak. ... method to explicitly model the phone deletion phenomenon in speech, and introduced the context-FWM ... could reduce word error rate (WER) by a relative 10.3%. Since it is generally expected that the phone ...

  2. Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech in the JUPITER Domain1

    E-print Network

    Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech ... an approach of using lexical stress models to improve the speech recognition performance on spontaneous ... with lexical stress on a large corpus of spontaneous utterances, and identified the most informative features ...

  3. Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

    ERIC Educational Resources Information Center

    Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

    2009-01-01

    Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…

  4. Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

    E-print Network

    Duh, Kevin

    Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis. Abstract: This paper describes a novel parameter generation algorithm for the HMM-based speech synthesis. The conventional algorithm generates a trajectory of static features that maximizes ...

  5. Spotlight on Speech Codes 2007: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2007

    2007-01-01

    Last year, the Foundation for Individual Rights in Education (FIRE) conducted its first-ever comprehensive study of restrictions on speech at America's colleges and universities, "Spotlight on Speech Codes 2006: The State of Free Speech on our Nation's Campuses." In light of the essentiality of free expression to a truly liberal education, its…

  6. Spotlight on Speech Codes 2012: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2012

    2012-01-01

    The U.S. Supreme Court has called America's colleges and universities "vital centers for the Nation's intellectual life," but the reality today is that many of these institutions severely restrict free speech and open debate. Speech codes--policies prohibiting student and faculty speech that would, outside the bounds of campus, be protected by the…

  7. Stability and Composition of Functional Synergies for Speech Movements in Children with Developmental Speech Disorders

    ERIC Educational Resources Information Center

    Terband, H.; Maassen, B.; van Lieshout, P.; Nijland, L.

    2011-01-01

    The aim of this study was to investigate the consistency and composition of functional synergies for speech movements in children with developmental speech disorders. Kinematic data were collected on the reiterated productions of syllables spa(/spa[image omitted]/) and paas(/pa[image omitted]s/) by 10 6- to 9-year-olds with developmental speech

  8. Speech and Language Skills of Parents of Children with Speech Sound Disorders

    ERIC Educational Resources Information Center

    Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Miscimarra, Lara; Iyengar, Sudha K.; Taylor, H. Gerry

    2007-01-01

    Purpose: This study compared parents with histories of speech sound disorders (SSD) to parents without known histories on measures of speech sound production, phonological processing, language, reading, and spelling. Familial aggregation for speech and language disorders was also examined. Method: The participants were 147 parents of children with…

  9. Open Domain Speech Translation: From Seminars and Speeches to Lectures Christian Fügen

    E-print Network

    Schultz, Tanja

    ... TC-STAR (Technology and Corpora for Speech to Speech Translation) and CHIL (Computers in the Human Interaction Loop). The paper ... open domain speech translation of lectures, starting from systems built within the framework of CHIL and TC-STAR. CHIL [25], Computers in the Human Interaction Loop, aims at making significant advances ...

  10. Private and Inner Speech and the Regulation of Social Speech Communication

    ERIC Educational Resources Information Center

    San Martin Martinez, Conchi; Boada i Calbet, Humbert; Feigenbaum, Peter

    2011-01-01

    To further investigate the possible regulatory role of private and inner speech in the context of referential social speech communications, a set of clear and systematically applied measures is needed. This study addresses this need by introducing a rigorous method for identifying private speech and certain sharply defined instances of inaudible…

  11. TOWARDS USING PROSODY IN SPEECH RECOGNITION/UNDERSTANDING SYSTEMS: DIFFERENCES BETWEEN READ AND SPONTANEOUS SPEECH

    E-print Network

    TOWARDS USING PROSODY IN SPEECH RECOGNITION/UNDERSTANDING SYSTEMS: DIFFERENCES BETWEEN READ ... for keyword-driven speech recognition systems is that users often embed the to-be-recognized words ... because of their relevance to speech recognition, and for increasing the naturalness of synthetic ...

  12. Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition

    E-print Network

    Virtanen, Tuomas

    Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition. Jort F. Gemmeke, Tuomas Virtanen, Antti Hurmalainen; Department ESAT, Katholieke ... as by using automatic speech recognition. Experiments on the PASCAL CHiME challenge corpus, which contains ...

  13. Acquisition of speech rhythm in first language.

    PubMed

    Polyanskaya, Leona; Ordin, Mikhail

    2015-09-01

    Analysis of English rhythm in speech produced by children and adults revealed that speech rhythm becomes increasingly more stress-timed as language acquisition progresses. Children reach the adult-like target by 11 to 12 years. The employed speech elicitation paradigm ensured that the sentences produced by adults and children at different ages were comparable in terms of lexical content, segmental composition, and phonotactic complexity. Detected differences between child and adult rhythm and between rhythm in child speech at various ages cannot be attributed to acquisition of phonotactic language features or vocabulary, and indicate the development of language-specific phonetic timing in the course of acquisition. PMID:26428813

  14. American Speech-Language-Hearing Association

    MedlinePLUS

    ... American Speech-Language-Hearing Association (ASHA) ...

  15. Speech disorders of Parkinsonism: a review.

    PubMed Central

    Critchley, E M

    1981-01-01

    Study of the speech disorders of Parkinsonism provides a paradigm of the integration of phonation, articulation and language in the production of speech. The initial defect in the untreated patient is a failure to control respiration for the purpose of speech and there follows a forward progression of articulatory symptoms involving larynx, pharynx, tongue and finally lips. There is evidence that the integration of speech production is organised asymmetrically at thalamic level. Experimental or therapeutic lesions in the region of the inferior medial portion of ventro-lateral thalamus may influence the initiation, respiratory control, rate and prosody of speech. Higher language functions may also be involved in thalamic integration: different forms of anomia are reported with pulvinar and ventrolateral thalamic lesions and transient aphasia may follow stereotaxis. The results of treatment with levodopa indicate that neurotransmitter substances enhance the clarity, volume and persistence of phonation and the latency and smoothness of articulation. The improvement of speech performance is not necessarily in phase with locomotor changes. The dose-related dyskinetic effects of levodopa, which appear to have a physiological basis in observations previously made in post-encephalitic Parkinsonism, not only influence the prosody of speech with near-mutism, hesitancy and dysfluency but may affect word-finding ability and in instances of excitement (erethism) even involve the association of long-term memory with speech. In future, neurologists will need to examine more closely the role of neurotransmitters in speech production and formulation. PMID:7031185

  16. Articulatory Features for Robust Visual Speech Recognition

    E-print Network

    ... sources of linguistic information, including nonacoustic sensors [24], to provide greater redundancy ... complementary linguistic information. Using the images of the speaker's mouth to recognize speech is commonly ...

  17. Estimation of Speech Intelligibility and Quality

    NASA Astrophysics Data System (ADS)

    Voran, Stephen

    Speech communication requires a talker and a listener. Acoustical and in some cases electrical representations of the speech are carried from the talker to the listener by some system. This system might consist of the air in a room, or it might involve electro-acoustic transducers and sound reinforcement or telecommunications equipment. Interfering noises (including reverberation of speech) may be present and these may impinge upon and affect the talker, the system, and the listener. A schematic representation of this basic unidirectional speech communication scenario is given in Figure 1.

  18. Speech Planning Happens before Speech Execution: Online Reaction Time Methods in the Study of Apraxia of Speech

    ERIC Educational Resources Information Center

    Maas, Edwin; Mailend, Marja-Liisa

    2012-01-01

    Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…

  19. Speech perception as an active cognitive process

    PubMed Central

    Heald, Shannon L. M.; Nusbaum, Howard C.

    2014-01-01

    One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID:24672438

  20. Children with a cochlear implant: characteristics and determinants of speech recognition, speech-recognition growth rate, and speech production.

    PubMed

    Wie, Ona Bø; Falkenberg, Eva-Signe; Tvete, Ole; Tomblin, Bruce

    2007-05-01

    The objectives of the study were to describe the characteristics of the first 79 prelingually deaf cochlear implant users in Norway and to investigate to what degree the variation in speech recognition, speech-recognition growth rate, and speech production could be explained by the characteristics of the child, the cochlear implant, the family, and the educational setting. Data gathered longitudinally were analysed using descriptive statistics, multiple regression, and growth-curve analysis. The results show that more than 50% of the variation could be explained by these characteristics. Daily user-time, non-verbal intelligence, mode of communication, length of CI experience, and educational placement had the highest effect on the outcome. The results also indicate that children educated in a bilingual approach to education have better speech perception and a faster speech-perception growth rate with increased focus on spoken language. PMID:17487671

  1. Speech Perception and Working Memory in Children with Residual Speech Errors: A Case Study Analysis.

    PubMed

    Cabbage, Kathryn L; Farquharson, Kelly; Hogan, Tiffany P

    2015-11-01

    Some children with residual deficits in speech production also display characteristics of dyslexia; however, the causes of these disorders--in isolation or comorbidly--remain unknown. Presently, the role of phonological representations is an important construct for considering how the underlying system of phonology functions. In particular, two related skills--speech perception and phonological working memory--may provide insight into the nature of phonological representations. This study provides an exploratory investigation into the profiles of three 9-year-old children: one with residual speech errors, one with residual speech errors and dyslexia, and one who demonstrated typical, age-appropriate speech sound production and reading skills. We provide an in-depth examination of their relative abilities in the areas of speech perception, phonological working memory, vocabulary, and word reading. Based on these preliminary explorations, we suggest implications for the assessment and treatment of children with residual speech errors and/or dyslexia. PMID:26458199

  2. Modulation of Frontal Lobe Speech Areas Associated with the Production and Perception of Speech Movements

    PubMed Central

    Fridriksson, Julius; Moser, Dana; Ryalls, Jack; Bonilha, Leonardo; Rorden, Chris; Baylis, Gordon

    2008-01-01

    Purpose It is unclear if the production and perception of speech movements are subserved by the same brain networks. The purpose of this study was to investigate neural recruitment in cortical areas commonly associated with speech production during the production and visual perception of speech. Method This study utilized functional magnetic resonance imaging (fMRI) to assess brain function while participants either imitated or observed speech movements. Results A common neural network was recruited by both tasks: greatest frontal lobe activity in Broca’s area was triggered not only when producing speech but also when watching speech movements. Relatively less activity was observed in the left anterior insula during both tasks. Conclusions These results support the emerging view that cortical areas involved in the execution of speech movements are also recruited in the perception of the same movements in other speakers. PMID:18978212

  3. Method and apparatus for obtaining complete speech signals for speech recognition applications

    NASA Technical Reports Server (NTRS)

    Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

    2009-01-01

    The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
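
    A minimal Python sketch of the buffering idea the abstract describes, assuming frame-based audio capture; the PreRollRecorder class and its method names are illustrative, not the patented implementation:

        from collections import deque

        class PreRollRecorder:
            """Ring buffer that preserves audio frames preceding a command."""

            def __init__(self, preroll_frames=30):
                self.ring = deque(maxlen=preroll_frames)  # circular buffer
                self.capture = None

            def push_frame(self, frame):
                # Called for every incoming frame, recording or not.
                if self.capture is not None:
                    self.capture.append(frame)
                else:
                    self.ring.append(frame)

            def start(self):
                # Prepend frames that arrived before the user command, so
                # speech that began early is not clipped from the utterance.
                self.capture = list(self.ring)

            def stop(self):
                audio, self.capture = self.capture, None
                return audio

        rec = PreRollRecorder(preroll_frames=3)
        for f in ["f1", "f2", "f3", "f4"]:
            rec.push_frame(f)
        rec.start()                  # user presses push-to-talk here
        rec.push_frame("f5")
        print(rec.stop())            # ['f2', 'f3', 'f4', 'f5']

    Endpoint detection on the augmented signal (the HMM step mentioned above) would then trim this buffer to the actual speech region.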

  4. Localization of Sublexical Speech Perception Components

    ERIC Educational Resources Information Center

    Turkeltaub, Peter E.; Coslett, H. Branch

    2010-01-01

    Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception.…

  5. Speech after Mao: Literature and Belonging

    ERIC Educational Resources Information Center

    Hsieh, Victoria Linda

    2012-01-01

    This dissertation aims to understand the apparent failure of speech in post-Mao literature to fulfill its conventional functions of representation and communication. In order to understand this pattern, I begin by looking back on the utility of speech for nation-building in modern China. In addition to literary analysis of key authors and works,…

  6. School Principal Speech about Fiscal Mismanagement

    ERIC Educational Resources Information Center

    Hassenpflug, Ann

    2015-01-01

    A review of two recent federal court cases concerning school principals who experienced adverse job actions after they engaged in speech about fiscal misconduct by other employees indicates that the courts found that the principal's speech was made as part of his or her job duties and was not protected by the First Amendment.

  7. Building Searchable Collections of Enterprise Speech Data.

    ERIC Educational Resources Information Center

    Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

    The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

  8. MULTILINGUAL PHONE RECOGNITION OF SPONTANEOUS TELEPHONE SPEECH

    E-print Network

    MULTILINGUAL PHONE RECOGNITION OF SPONTANEOUS TELEPHONE SPEECH. C. Corredor-Ardoy, L. Lamel, M. Adda ... (http://www.limsi.fr/TLP). ABSTRACT: In this paper we report on experiments with phone recognition of spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing ...

  9. Toddlers' recognition of noise-vocoded speech

    PubMed Central

    Newman, Rochelle; Chatterjee, Monita

    2013-01-01

    Despite their remarkable clinical success, cochlear-implant listeners today still receive spectrally degraded information. Much research has examined normally hearing adult listeners' ability to interpret spectrally degraded signals, primarily using noise-vocoded speech to simulate cochlear implant processing. Far less research has explored infants' and toddlers' ability to interpret spectrally degraded signals, despite the fact that children in this age range are frequently implanted. This study examines 27-month-old typically developing toddlers' recognition of noise-vocoded speech in a language-guided looking study. Children saw two images on each trial and heard a voice instructing them to look at one item (“Find the cat!”). Full-spectrum sentences or their noise-vocoded versions were presented with varying numbers of spectral channels. Toddlers showed equivalent proportions of looking to the target object with full-speech and 24- or 8-channel noise-vocoded speech; they failed to look appropriately with 2-channel noise-vocoded speech and showed variable performance with 4-channel noise-vocoded speech. Despite accurate looking performance for speech with at least eight channels, children were slower to respond appropriately as the number of channels decreased. These results indicate that 2-yr-olds have developed the ability to interpret vocoded speech, even without practice, but that doing so requires additional processing. These findings have important implications for pediatric cochlear implantation. PMID:23297920
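
    For readers unfamiliar with the stimuli, here is a hedged sketch of a standard noise vocoder of the kind typically used in such simulations (not the authors' code; assumes a 16 kHz sampling rate, log-spaced bands, and the illustrative helper name noise_vocode):

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
            """Replace each band's fine structure with modulated noise."""
            edges = np.geomspace(lo, hi, n_channels + 1)  # band edges
            rng = np.random.default_rng(0)
            out = np.zeros(len(x))
            for k in range(n_channels):
                sos = butter(4, [edges[k], edges[k + 1]],
                             btype="band", fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                env = np.abs(hilbert(band))               # band envelope
                noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
                out += env * noise          # envelope-modulated noise
            return out

    Fewer channels preserve less spectral detail, which is how the 24-, 8-, 4- and 2-channel conditions above degrade the signal.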

  10. Milton's "Areopagitica" Freedom of Speech on Campus

    ERIC Educational Resources Information Center

    Sullivan, Daniel F.

    2006-01-01

    The author discusses the content in John Milton's "Areopagitica: A Speech for the Liberty of Unlicensed Printing to the Parliament of England" (1985) and provides parallelism to censorship practiced in higher education. Originally published in 1644, "Areopagitica" makes a powerful--and precocious--argument for freedom of speech and against…

  11. Reliability of Speech Diadochokinetic Test Measurement

    ERIC Educational Resources Information Center

    Gadesmann, Miriam; Miller, Nick

    2008-01-01

    Background: Measures of articulatory diadochokinesis (DDK) are widely used in the assessment of motor speech disorders and they play a role in detecting abnormality, monitoring speech performance changes and classifying syndromes. Although in clinical practice DDK is generally measured perceptually, without support from instrumental methods that…

  12. Speech-Language Pathology: Preparing Early Interventionists

    ERIC Educational Resources Information Center

    Prelock, Patricia A.; Deppe, Janet

    2015-01-01

    The purpose of this article is to explain the role of speech-language pathology in early intervention. The expected credentials of professionals in the field are described, and the current numbers of practitioners serving young children are identified. Several resource documents available from the American Speech-Language-Hearing Association are…

  13. Speech masking and cancelling and voice obscuration

    DOEpatents

    Holzrichter, John F.

    2013-09-10

    A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.

  14. Repeated Speech Errors: Evidence for Learning

    ERIC Educational Resources Information Center

    Humphreys, Karin R.; Menzies, Heather; Lake, Johanna K.

    2010-01-01

    Three experiments elicited phonological speech errors using the SLIP procedure to investigate whether there is a tendency for speech errors on specific words to reoccur, and whether this effect can be attributed to implicit learning of an incorrect mapping from lemma to phonology for that word. In Experiment 1, when speakers made a phonological…

  15. A National Survey of Cued Speech Programs.

    ERIC Educational Resources Information Center

    Quenin, Catherine Sheridan; Blood, Ingrid

    1989-01-01

    A survey of 60 United States schools and programs currently using Cued Speech with hearing-impaired individuals found that the tool is used in both oral and total communication environments. The survey collected data on demographics, types of programs, number of students using Cued Speech, methodologies employed, and support services offered.…

  16. The Oral Speech Mechanism Screening Examination (OSMSE).

    ERIC Educational Resources Information Center

    St. Louis, Kenneth O.; Ruscello, Dennis M.

    Although speech-language pathologists are expected to be able to administer and interpret oral examinations, there are currently no screening tests available that provide careful administration instructions and data for intra-examiner and inter-examiner reliability. The Oral Speech Mechanism Screening Examination (OSMSE) is designed primarily for…

  17. SUBSPACE METHODS FOR MULTIMICROPHONE SPEECH DEREVERBERATION

    E-print Network

    ... such as speech coding or automatic speech recognition, might be rendered useless in the presence of reverberation ... acoustic transfer function (ATF) estimation as well as some suboptimal procedures. The derivation of the algorithm ... the IT-project Multi-microphone Signal Enhancement Techniques for handsfree telephony and voice ...

  18. Speech. Language Arts Mini-Course.

    ERIC Educational Resources Information Center

    Lampeter-Strasburg School District, PA.

    This language arts minicourse guide for Lampeter-Strasburg (Pennsylvania) High School contains a topical outline for a speech course. The guide includes a list of sixteen course objectives; an outline of the elements of speech communication to be covered by the course; a description of the content and concepts to be studied in interpersonal and…

  19. Speech & Hearing Clinic College of Science

    E-print Network

    Hickman, Mark

    Speech & Hearing Clinic College of Science Department of Communication Disorders How to contact us: If you feel that you or your child has hearing or listening difficulties and would like a formal assessment, please contact the Speech and Hearing Clinic during business hours to make an appointment

  20. Pulmonic Ingressive Speech in Shetland English

    ERIC Educational Resources Information Center

    Sundkvist, Peter

    2012-01-01

    This paper presents a study of pulmonic ingressive speech, a severely understudied phenomenon within varieties of English. While ingressive speech has been reported for several parts of the British Isles, New England, and eastern Canada, thus far Newfoundland appears to be the only locality where researchers have managed to provide substantial…

  1. Anatomy and Physiology of the Speech Mechanism.

    ERIC Educational Resources Information Center

    Sheets, Boyd V.

    This monograph on the anatomical and physiological aspects of the speech mechanism stresses the importance of a general understanding of the process of verbal communication. Contents include "Positions of the Body," "Basic Concepts Linked with the Speech Mechanism," "The Nervous System," "The Respiratory System--Sound-Power Source," "The…

  2. Speech and Language Delays in Identical Twins.

    ERIC Educational Resources Information Center

    Bentley, Pat

    Following a literature review on speech and language development of twins, case studies are presented of six sets of identical twins screened for entrance into kindergarten. Five sets of the twins and one boy from the sixth set failed to pass the screening test, particularly the speech and language section, and were referred for therapy to correct…

  3. The Lombard Effect on Alaryngeal Speech.

    ERIC Educational Resources Information Center

    Zeine, Lina; Brandt, John F.

    1988-01-01

    The study investigated the Lombard effect (evoking increased speech intensity by applying masking noise to ears of talker) on the speech of esophageal talkers, artificial larynx users, and normal speakers. The noise condition produced the highest intensity increase in the esophageal speakers. (Author/DB)

  4. Speech Intelligibility in Severe Adductor Spasmodic Dysphonia

    ERIC Educational Resources Information Center

    Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.

    2004-01-01

    This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…

  5. SPEECH PERCEPTION IN VIRTUAL ENVIRONMENTS A DISSERTATION

    E-print Network

    Blake, Edwin

    SPEECH PERCEPTION IN VIRTUAL ENVIRONMENTS. A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER ... that influence the perception of speech in virtual environments under adverse listening conditions. A virtual ... perception. Spatial unmasking refers to the hearing benefit achieved when the target sound and masking sound ...

  6. Philosophy of Research in Motor Speech Disorders

    ERIC Educational Resources Information Center

    Weismer, Gary

    2006-01-01

    The primary objective of this position paper is to assess the theoretical and empirical support that exists for the Mayo Clinic view of motor speech disorders in general, and for oromotor, nonverbal tasks as a window to speech production processes in particular. Literature both in support of and against the Mayo clinic view and the associated use…

  7. Treatment Intensity and Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal

    2015-01-01

    Background: Intensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. Aim: To investigate the effects of treatment intensity on outcome…

  8. Tampa Bay International Business Summit Keynote Speech

    NASA Technical Reports Server (NTRS)

    Clary, Christina

    2011-01-01

    A keynote speech outlining the importance of collaboration and diversity in the workplace. The 20-minute speech describes NASA's challenges and accomplishments over the years and what lies ahead. Topics include: diversity and inclusion principles, international cooperation, Kennedy Space Center planning and development, opportunities for cooperation, and NASA's vision for exploration.

  9. CLEFT PALATE. FOUNDATIONS OF SPEECH PATHOLOGY SERIES.

    ERIC Educational Resources Information Center

    RUTHERFORD, DAVID; WESTLAKE, HAROLD

    DESIGNED TO PROVIDE AN ESSENTIAL CORE OF INFORMATION, THIS BOOK TREATS NORMAL AND ABNORMAL DEVELOPMENT, STRUCTURE, AND FUNCTION OF THE LIPS AND PALATE AND THEIR RELATIONSHIPS TO CLEFT LIP AND CLEFT PALATE SPEECH. PROBLEMS OF PERSONAL AND SOCIAL ADJUSTMENT, HEARING, AND SPEECH IN CLEFT LIP OR CLEFT PALATE INDIVIDUALS ARE DISCUSSED. NASAL RESONANCE…

  10. The Need for a Speech Corpus

    ERIC Educational Resources Information Center

    Campbell, Dermot F.; McDonnell, Ciaran; Meinardi, Marti; Richardson, Bunny

    2007-01-01

    This paper outlines the ongoing construction of a speech corpus for use by applied linguists and advanced EFL/ESL students. In the first part, sections 1-4, the need for improvements in the teaching of listening skills and pronunciation practice for EFL/ESL students is noted. It is argued that the use of authentic native-to-native speech is…

  11. The Effects of TV on Speech Education

    ERIC Educational Resources Information Center

    Gocen, Gokcen; Okur, Alpaslan

    2013-01-01

    The speaking aspect is generally not given proper attention in discussions of the positive and negative effects of television (TV), especially on children. To highlight this point, this study began by asking the question "What are the effects of TV on speech?" and then sought to transform the effects that TV has on speech in a…

  12. Scaffolded-Language Intervention: Speech Production Outcomes

    ERIC Educational Resources Information Center

    Bellon-Harn, Monica L.; Credeur-Pampolina, Maggie E.; LeBoeuf, Lexie

    2013-01-01

    This study investigated the effects of a scaffolded-language intervention using cloze procedures, semantically contingent expansions, contrastive word pairs, and direct models on speech abilities in two preschoolers with speech and language impairment speaking African American English. Effects of the lexical and phonological characteristics (i.e.,…

  13. The Hidden Meaning of Inner Speech.

    ERIC Educational Resources Information Center

    Pomper, Marlene M.

    This paper is concerned with the inner speech process, its relationship to thought and behavior, and its theoretical and educational implications. The paper first defines inner speech as a bridge between thought and written or spoken language and traces its development. Second, it investigates competing theories surrounding the subject with an…

  14. The Neural Substrates of Infant Speech Perception

    ERIC Educational Resources Information Center

    Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

    2014-01-01

    Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

  15. Can Speech Perception be Influenced by Simultaneous

    E-print Network

    exception is Klatt, 1980.) Investigators of visual word perception, too, often ... If earlier stages of auditory word perception can be penetrable to visual influences (as well as the reverse), similar effects on speech perception can be elicited by visual input. That visual and auditory speech ...

  16. GRAPHICAL REPRESENTATION OF PERCEIVED PITCH IN SPEECH.

    ERIC Educational Resources Information Center

    COWAN, J.M.

    AN ARTICLE ON GRAPHICAL REPRESENTATION OF PERCEIVED PITCH IN SPEECH WAS PRESENTED. THIS ARTICLE, A REPRINT FROM THE PROCEEDINGS OF THE FOURTH INTERNATIONAL CONGRESS OF PHONETIC SCIENCES, PROVIDED A GENERAL DESCRIPTION OF THE METHOD AND INSTRUMENTATION OF GRAPHICALLY DESCRIBING SPEECH INTONATIONS, AND PRESENTED A HISTORICAL AND THEORETICAL…

  17. Pitch-Learning Algorithm For Speech Encoders

    NASA Technical Reports Server (NTRS)

    Bhaskar, B. R. Udaya

    1988-01-01

    Adaptive algorithm detects and corrects errors in sequence of estimates of pitch period of speech. Algorithm operates in conjunction with techniques used to estimate pitch period. Used in such parametric and hybrid speech coders as linear predictive coders and adaptive predictive coders.
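
    The summary does not give the algorithm's details; as a hedged illustration of the general idea of detecting and correcting gross pitch-period errors, the sketch below flags estimates that deviate sharply from a local median and replaces them with it (a common, simple stand-in for such adaptive correction; the helper name is illustrative):

        import numpy as np

        def correct_pitch_track(periods, win=5, tol=0.25):
            """Replace estimates deviating > tol*median from the local median."""
            periods = np.asarray(periods, dtype=float)
            fixed = periods.copy()
            half = win // 2
            for i in range(len(periods)):
                lo, hi = max(0, i - half), min(len(periods), i + half + 1)
                med = np.median(periods[lo:hi])
                if abs(periods[i] - med) > tol * med:   # gross error?
                    fixed[i] = med
            return fixed

        # Example: the octave error at index 3 is pulled back to the median.
        track = [80, 82, 81, 160, 83, 82, 80]
        print(correct_pitch_track(track))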

  18. Preschoolers Benefit from Visually Salient Speech Cues

    ERIC Educational Resources Information Center

    Lalonde, Kaylah; Holt, Rachael Frush

    2015-01-01

    Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…

  19. SPEECH LEVELS IN VARIOUS NOISE ENVIRONMENTS

    EPA Science Inventory

    The goal of this study was to determine average speech levels used by people when conversing in different levels of background noise. The non-laboratory environments where speech was recorded were: high school classrooms, homes, hospitals, department stores, trains and commercial...

  20. The Preparation of Syllables in Speech Production

    ERIC Educational Resources Information Center

    Cholin, Joana; Schiller, Niels O.; Levelt, Willem J. M.

    2004-01-01

    Models of speech production assume that syllables play a functional role in the process of word-form encoding in speech production. In this study, we investigate this claim and specifically provide evidence about the level at which syllables come into play. We report two studies using an "odd-man-out" variant of the "implicit priming paradigm" to…

  1. Speech neglect: A strange educational blind spot

    NASA Astrophysics Data System (ADS)

    Harris, Katherine Safford

    2005-09-01

    Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

  2. Performing speech recognition research with hypercard

    NASA Technical Reports Server (NTRS)

    Shepherd, Chip

    1993-01-01

    The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.

  3. Acoustic characteristics of listener-constrained speech

    NASA Astrophysics Data System (ADS)

    Ashby, Simone; Cummins, Fred

    2003-04-01

    Relatively little is known about the acoustical modifications speakers employ to meet the various constraints (auditory, linguistic and otherwise) of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper and Hypo (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a ``hyperarticulated'' speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change respective to different listener-constrained conditions. This paper is part of a preliminary investigation concerned with comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

  4. Speech vs. singing: infants choose happier sounds.

    PubMed

    Corbeil, Marieve; Trehub, Sandra E; Peretz, Isabelle

    2013-01-01

    Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4-13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

  5. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    PubMed Central

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889

  6. Voice Quality Modelling for Expressive Speech Synthesis

    PubMed Central

    Socoró, Joan Claudi

    2014-01-01

    This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

  7. Predicting the intelligibility of vocoded speech

    PubMed Central

    Chen, Fei; Loizou, Philipos C.

    2010-01-01

    Objectives The purpose of this study is to evaluate the performance of a number of speech intelligibility indices in terms of predicting the intelligibility of vocoded speech. Design Noise-corrupted sentences were vocoded in a total of 80 conditions, involving three different SNR levels (-5, 0 and 5 dB) and two types of maskers (steady-state noise and two-talker). Tone-vocoder simulations were used as well as simulations of combined electric-acoustic stimulation (EAS). The vocoded sentences were presented to normal-hearing listeners for identification, and the resulting intelligibility scores were used to assess the correlation of various speech intelligibility measures. These included measures designed to assess speech intelligibility, including the speech-transmission index (STI) and articulation index (AI) based measures, as well as distortions in hearing aids (e.g., coherence-based measures). These measures employed primarily either the temporal-envelope or the spectral-envelope information in the prediction model. The underlying hypothesis in the present study is that measures that assess temporal envelope distortions, such as those based on the speech-transmission index, should correlate highly with the intelligibility of vocoded speech. This is based on the fact that vocoder simulations preserve primarily envelope information, similar to the processing implemented in current cochlear implant speech processors. Similarly, it is hypothesized that measures such as the coherence-based index that assess the distortions present in the spectral envelope could also be used to model the intelligibility of vocoded speech. Results Of all the intelligibility measures considered, the coherence-based and the STI-based measures performed the best. High correlations (r=0.9-0.96) were maintained with the coherence-based measures in all noisy conditions. The highest correlation obtained with the STI-based measure was 0.92, and that was obtained when high modulation rates (100 Hz) were used. The performance of these measures remained high in both steady-noise and fluctuating masker conditions. The correlations with conditions involving tone-vocoded speech were found to be a bit higher than the correlations with conditions involving EAS-vocoded speech. Conclusions The present study demonstrated that some of the speech intelligibility indices that have been found previously to correlate highly with wideband speech can also be used to predict the intelligibility of vocoded speech. Both the coherence-based and STI-based measures have been found to be good measures for modeling the intelligibility of vocoded speech. The highest correlation (r=0.96) was obtained with a derived coherence measure that placed more emphasis on information contained in vowel/consonant spectral transitions and less emphasis on information contained in steady sonorant segments. High (100 Hz) modulation rates were found to be necessary in the implementation of the STI-based measures for better modeling of the intelligibility of vocoded speech. We believe that the difference in modulation rates needed for modeling the intelligibility of wideband versus vocoded speech can be attributed to the increased importance of higher modulation rates in situations where the amount of spectral information available to the listeners is limited (8 channels in our study). 
Unlike the traditional STI method which has been found to perform poorly in terms of predicting the intelligibility of processed speech wherein non-linear operations are involved, the STI-based measure used in the present study has been found to perform quite well. In summary, the present study took the first step in modeling the intelligibility of vocoded speech. Access to such intelligibility measures is of high significance as they can be used to guide the development of new speech coding algorithms for cochlear implants. PMID:21206363
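
    As a much-simplified, hedged illustration of the envelope-based family of indices discussed above (not any of the study's actual measures; all names here are illustrative), one can band-filter clean and degraded speech, extract envelopes up to a chosen modulation rate, and average the squared band-wise correlations:

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def envelope(x, fs, band, mod_cutoff=100.0):
            """Band-filter x, then low-pass its Hilbert envelope."""
            sos = butter(4, band, btype="band", fs=fs, output="sos")
            env = np.abs(hilbert(sosfiltfilt(sos, x)))
            # Keep modulations up to mod_cutoff Hz; the study found high
            # rates (around 100 Hz) necessary for vocoded speech.
            lp = butter(4, mod_cutoff, btype="low", fs=fs, output="sos")
            return sosfiltfilt(lp, env)

        def envelope_index(clean, degraded, fs):
            edges = np.geomspace(150.0, 7000.0, 9)    # 8 analysis bands
            rs = []
            for k in range(len(edges) - 1):
                band = [edges[k], edges[k + 1]]
                e1 = envelope(clean, fs, band)
                e2 = envelope(degraded, fs, band)
                rs.append(np.corrcoef(e1, e2)[0, 1] ** 2)
            return float(np.mean(rs))                 # 0 (poor) .. 1 (good)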

  8. Strategies for distant speech recognition in reverberant environments

    NASA Astrophysics Data System (ADS)

    Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

    2015-12-01

    Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
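
    A hedged single-channel Python sketch of the long-term linear-prediction idea behind such dereverberation front-ends (the actual system is multichannel and considerably more elaborate; the helper name and default parameters are illustrative): late reverberation is predicted from samples at least a fixed delay in the past and subtracted.

        import numpy as np

        def lp_dereverb(x, order=100, delay=400):
            """Subtract reverberation predicted from samples >= `delay` old
            (e.g. 400 samples is 25 ms at 16 kHz)."""
            x = np.asarray(x, dtype=float)
            n = len(x)
            t0 = delay + order - 1
            y = x[t0:]
            # Regressors: samples delayed by delay .. delay + order - 1.
            X = np.stack([x[t0 - delay - k : n - delay - k]
                          for k in range(order)], axis=1)
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            out = x.copy()
            out[t0:] = y - X @ coef     # remove predicted late reverb
            return out

    The delay ensures only late reflections are cancelled, leaving the direct path and early reflections, which carry the speech cues, intact.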

  9. Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech

    NASA Technical Reports Server (NTRS)

    Hieronymus, James; Aist, Greg; Dowding, John

    2006-01-01

    An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal, which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL), so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system which is being tested on the International Space Station.
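
    A hedged toy sketch of the gating logic described above (hypothetical helper names, not the Clarissa code): accept an utterance only if it parses to a valid logical form and the recognizer's confidence clears a threshold, treating everything else as out-of-domain.

        def accept_utterance(hypothesis, parse_to_logical_form, confidence,
                             threshold=0.6):
            """parse_to_logical_form returns a logical form or None."""
            lf = parse_to_logical_form(hypothesis)
            if lf is None:              # no in-domain meaning: reject
                return None
            if confidence < threshold:  # trade higher false rejection
                return None             # for better OOD rejection
            return lf

        # Toy usage with a stub grammar that only knows one command.
        grammar = lambda text: ("MOVE_FORWARD"
                                if text == "move forward" else None)
        print(accept_utterance("move forward", grammar, 0.9))   # accepted
        print(accept_utterance("nice weather today", grammar, 0.9))  # None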

  10. Speech recognition technology: a critique.

    PubMed Central

    Levinson, S E

    1995-01-01

    This paper introduces the session on advanced speech recognition technology. The two papers comprising this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue that, to the contrary, present performance is far removed from human performance and a revolution in our thinking is required to achieve the goal. It is further asserted that to bring about the revolution more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology. PMID:7479808

  11. The Functional Connectome of Speech Control

    PubMed Central

    Fuertinger, Stefan; Horwitz, Barry; Simonyan, Kristina

    2015-01-01

    In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively forged the formation of the functional speech connectome. In addition, the observed capacity of the primary sensorimotor cortex to exhibit operational heterogeneity challenged the established concept of unimodality of this region. PMID:26204475

  12. Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications

    ERIC Educational Resources Information Center

    Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara

    2009-01-01

    Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech

  13. Electrolaryngeal Speech Enhancement Based on Statistical Voice Conversion

    E-print Network

    Duh, Kevin

    This paper addresses the enhancement of electrolaryngeal speech (EL speech) based on statistical voice conversion of EL speech to normal speech. Because valid F0 information cannot be obtained from EL speech, the authors have so far converted EL speech to whispering. This paper conducts the EL speech…

  14. An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

    SciTech Connect

    Hogden, J.

    1996-12-31

    Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples, presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.

  15. Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric

    E-print Network

    Toronto, University of

    Dysarthric speech is poorly articulated and hardly intelligible; hence, individuals with dysarthria are rarely understood by human listeners and may struggle with interaction through modalities such as standard keyboards. Automatic speech recognition (ASR) can, therefore…

  16. THE FIELD OF SPEECH, ITS PURPOSES AND SCOPE IN EDUCATION.

    ERIC Educational Resources Information Center

    WALLACE, KARL R.; AND OTHERS

    SPEECH PROFESSIONALS, SPECIALIZING IN THE ART AND SCIENCE OF SPEECH BEHAVIOR AND COMMUNICATION, SHARE THE COMMON ASSUMPTIONS THAT (1) SPEECH IS THE MOST SIGNIFICANT AND UNAVOIDABLE OF MAN'S LEARNED BEHAVIOR, (2) SPEECH IS THE HUMANISTIC CENTER FROM WHICH THE SEARCH FOR AND TRANSMISSION OF KNOWLEDGE ABOUT MAN PROPERLY PROCEED, AND (3) EVERY…

  17. Prosodic Features and Speech Naturalness in Individuals with Dysarthria

    ERIC Educational Resources Information Center

    Klopfenstein, Marie I.

    2012-01-01

    Despite the importance of speech naturalness to treatment outcomes, little research has been done on what constitutes speech naturalness and how to best maximize naturalness in relationship to other treatment goals like intelligibility. In addition, previous literature alludes to the relationship between prosodic aspects of speech and speech

  18. The Genres of "Shouted Speech" in Cheke Holo.

    ERIC Educational Resources Information Center

    Boswell, Freddy

    Speech genres in Cheke Holo (CH) have not been studied extensively. Speech genres related to shouted speech in CH deserve more study because shouted speech is commonly used. Culturally speaking, shouted speech is a natural expression of the importance and centrality of the CH community and its surrounding authority structures, and has a very strong hortatory…

  19. Visual and Auditory Input in Second-Language Speech Processing

    ERIC Educational Resources Information Center

    Hardison, Debra M.

    2010-01-01

    The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…

  20. Phonemic Characteristics of Apraxia of Speech Resulting from Subcortical Hemorrhage

    ERIC Educational Resources Information Center

    Peach, Richard K.; Tonkovich, John D.

    2004-01-01

    Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…

  1. Incorporating Women's Speeches as Models in the Basic Course.

    ERIC Educational Resources Information Center

    Jensen, Marvin D.

    Studies indicate that there is a general lack of availability and use of women's speeches in college speech curricula. By incorporating more women's speeches as models, instructors of the basic course in speech can present a more complete picture of American public speaking while also encouraging women in these classes to feel less muted in their…

  2. Language Modeling for Automatic Speech Recognition Meets the Web

    E-print Network

    Cortes, Corinna

    Slides on language modeling for automatic speech recognition in Google Search by Voice (Ciprian Chelba et al., 05/02/2011); the deck covers statistical modeling in automatic speech recognition for voice search.

  3. Opportunities for Advanced Speech Processing in Military Computer-Based

    E-print Network

    This article surveys advanced speech algorithm techniques, including speech coding, speech recognition, and speaker recognition, and their military applications: (1) low-rate and very low-rate (50-1200 b/s) secure voice communication; (2) voice/data integration in computer networks; (3) speech recognition in fighter aircraft, military helicopters, battle management…

  4. Speech Characteristics Associated with Three Genotypes of Ataxia

    ERIC Educational Resources Information Center

    Sidtis, John J.; Ahn, Ji Sook; Gomez, Christopher; Sidtis, Diana

    2011-01-01

    Purpose: Advances in neurobiology are providing new opportunities to investigate the neurological systems underlying motor speech control. This study explores the perceptual characteristics of the speech of three genotypes of spino-cerebellar ataxia (SCA) as manifest in four different speech tasks. Methods: Speech samples from 26 speakers with SCA…

  5. Monkey Lipsmacking Develops Like the Human Speech Rhythm

    ERIC Educational Resources Information Center

    Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.

    2012-01-01

    Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…

  6. Deep Learning in Speech Synthesis

    E-print Network

    Cortes, Corinna

    Slides by Heiga Zen (Google, August 31st, 2013) on deep learning in speech synthesis; the outline covers background on deep learning, motivation, deep learning-based approaches, and DNN-based statistical text-to-speech (TTS) synthesis, which maps text (a discrete symbol sequence) to speech (a continuous time series).

  7. Freedom of Speech Newsletter, Volume 2, Number 2, November 1975.

    ERIC Educational Resources Information Center

    Allen, Winfred G., Jr., Ed.

    This issue of the Freedom of Speech Newsletter contains a list of the panels presented by the Freedom of Speech Interest Group at the Western Speech Communication Association convention in November 1975; Communication Stress and Freedom of Speech by John L. Healy, which describes an investigation into the destructive effects of stress; A Right to…

  8. Speech Sound Disorders in a Community Study of Preschool Children

    ERIC Educational Resources Information Center

    McLeod, Sharynne; Harrison, Linda J.; McAllister, Lindy; McCormack, Jane

    2013-01-01

    Purpose: To undertake a community (nonclinical) study to describe the speech of preschool children who had been identified by parents/teachers as having difficulties "talking and making speech sounds" and compare the speech characteristics of those who had and had not accessed the services of a speech-language pathologist (SLP). Method:…

  9. HMM-based Speech Synthesis Adapted to Listeners' & Talkers' conditions

    E-print Network

    Edinburgh, University of

    Slides by Dr Junichi Yamagishi (The Centre for Speech Technology Research, University of Edinburgh, www.cstr.ed.ac.uk) on HMM-based speech synthesis adapted to listeners' and talkers' conditions; the agenda opens with text-to-speech synthesis (TTS) and hidden Markov model (HMM)-based methods…

  10. Speech/Nonspeech Segmentation in Web Videos

    E-print Network

    Cortes, Corinna

    Ananya Misra (Google, New York, NY, USA). Speech transcription of web videos requires first detecting segments with transcribable speech… Index Terms: segmentation, speech detection, voice activity detection, video.

  11. BCS 561: Speech Perception and Recognition Spring 2006

    E-print Network

    Makous, Walter

    BCS 561: Speech Perception and Recognition, Spring 2006. Instructor: Richard Aslin; meets Wednesdays. The course covers human speech perception and recognition. Topics include an overview of phonetics, categorical perception, speech perception by nonhumans and by human infants, perception of nonnative speech sounds, intermodal…

  12. Computational Differences between Whispered and Non-Whispered Speech

    ERIC Educational Resources Information Center

    Lim, Boon Pang

    2011-01-01

    Whispering is a common type of speech which is not often studied in speech technology. Perceptual and physiological studies show us that whispered speech is subtly different from phonated speech, and is surprisingly able to carry a tremendous amount of information. In this dissertation we consider the question: What makes whispering a good form of…

  13. Hidden Feature Models for Speech Recognition Using Dynamic Bayesian Networks

    E-print Network

    Noble, William Stafford

    This work investigates hidden feature models, using features such as articulatory or other phonological features, for automatic speech recognition. The majority of current speech recognition research assumes a model of speech consisting of a stream…

  14. Using semantic analysis to improve speech recognition performance

    E-print Network

    Erdogan, Hakan

    Language modeling for speech recognition attempts to model the probability P(W) of observing a word sequence W in natural language. The purpose of language modeling is to bias a speech recognizer…
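
    To make the truncated description concrete, a language model factors P(W) into conditional word probabilities; the sketch below estimates an add-one-smoothed bigram model on a toy corpus (illustrative only, not the paper's semantic model):

      # Sketch: P(W) from an add-one-smoothed bigram model, the kind of
      # prior a language model supplies to bias a speech recognizer.
      from collections import Counter

      corpus = [["<s>", "turn", "on", "the", "light", "</s>"],
                ["<s>", "turn", "off", "the", "light", "</s>"]]
      unigrams, bigrams = Counter(), Counter()
      for sent in corpus:
          unigrams.update(sent[:-1])             # contexts
          bigrams.update(zip(sent, sent[1:]))
      vocab = {w for sent in corpus for w in sent}

      def p_bigram(w_prev, w):
          return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

      def p_sentence(words):
          p = 1.0
          for w_prev, w in zip(words, words[1:]):
              p *= p_bigram(w_prev, w)
          return p

      print(p_sentence(["<s>", "turn", "on", "the", "light", "</s>"]))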

  15. HOW EFFECTIVE IS UNSUPERVISED DATA COLLECTION FOR CHILDREN'S SPEECH RECOGNITION?

    E-print Network

    Mostow, Jack

    Children's speech poses a unique challenge to automatic speech recognition, and today's state-of-the-art speech recognition systems… We studied how to use data transcribed by a speech recognition system and automatically filtered…

  16. Survey of Current Speech Technology Alexander I. Rudnicky

    E-print Network

    Rudnicky, Alexander I.

    This 1993 article describes two technologies, speech recognition and speech synthesis… The technologies covered in this article are of particular interest because…

  17. Fantasy Play in Preschool Classrooms: Age Differences in Private Speech.

    ERIC Educational Resources Information Center

    Kirby, Kathleen Campano

    Private speech is speech overtly directed to a young child's self and not directly spoken to another listener. Private speech develops differently during fantasy play than constructive play. This study examined age differences in the amount of fantasy play in the preschool classroom and in the amount and type of private speech that occurs during…

  18. The Effectiveness of Clear Speech as a Masker

    ERIC Educational Resources Information Center

    Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

    2010-01-01

    Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

  19. Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech

    NASA Astrophysics Data System (ADS)

    Iuzzini, Jenya

    There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were moderately to highly correlated, even the most highly related segmental and lexical measures agreed on only 76% of classifications (i.e., to CAS and PD). Finally, VOT analyses revealed that CAS utilized a distinct distribution pattern relative to PD and TD. Discussion frames the current findings within a profile of CAS and provides a validated list of criteria for the differential diagnosis of CAS and PD.

  20. Speech processing using conditional observable maximum likelihood continuity mapping

    DOEpatents

    Hogden, John; Nix, David

    2004-01-13

    A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.
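
    A heavily simplified sketch of the smooth-path search in the claims above, assuming a one-dimensional continuity map, a Gaussian density per speech code, and a quadratic jump penalty enforced by dynamic programming (all parameters illustrative, not the patented parameterization):

      # Sketch: smooth, maximum-probability path on a 1-D 'continuity map'
      # for a sequence of speech codes. Each code has a Gaussian density
      # over map positions; a penalty discourages large jumps between frames.
      import numpy as np

      positions = np.linspace(0.0, 1.0, 101)   # discretized map
      code_means = {0: 0.2, 1: 0.5, 2: 0.8}    # toy 'trained' parameters
      sigma, jump_penalty = 0.1, 50.0

      def neg_log_lik(code, x):
          return 0.5 * ((x - code_means[code]) / sigma) ** 2

      codes = [0, 0, 1, 2, 2]
      cost, back = neg_log_lik(codes[0], positions), []
      for code in codes[1:]:
          trans = cost[:, None] + jump_penalty * (positions[:, None] - positions[None, :]) ** 2
          back.append(np.argmin(trans, axis=0))
          cost = trans.min(axis=0) + neg_log_lik(code, positions)
      idx = [int(np.argmin(cost))]
      for b in reversed(back):
          idx.append(int(b[idx[-1]]))
      print("smooth path:", np.round(positions[idx[::-1]], 2))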

  1. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    NASA Astrophysics Data System (ADS)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
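
    The paper defines Ψ precisely; as a hedged illustration only, the sketch below contrasts intra-word consistency with inter-word distinction over synthetic feature vectors, which is the general idea behind a distance-based clarity score (all names and parameters are assumptions):

      # Sketch: a distance-based clarity score in the spirit of the paper.
      # Consistent repetitions of a word (small intra-word distance) and
      # distinct words (large inter-word distance) both raise the score.
      import numpy as np

      rng = np.random.default_rng(1)
      # features[word] = 5 repetitions x 12-dim feature vectors (synthetic)
      features = {w: rng.standard_normal((5, 12)) + 3.0 * i
                  for i, w in enumerate(["yes", "no", "stop"])}

      def mean_dist(a, b, skip_diag=False):
          d = [np.linalg.norm(a[i] - b[j])
               for i in range(len(a)) for j in range(len(b))
               if not (skip_diag and i == j)]
          return float(np.mean(d))

      words = list(features.values())
      consistency = np.mean([mean_dist(v, v, skip_diag=True) for v in words])
      distinction = np.mean([mean_dist(words[i], words[j])
                             for i in range(len(words))
                             for j in range(i + 1, len(words))])
      clarity = distinction / consistency  # higher -> clearer speaker
      print(f"consistency={consistency:.2f} distinction={distinction:.2f} clarity={clarity:.2f}")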

  2. State-based labelling for a sparse representation of speech and its application to robust speech recognition

    E-print Network

    Virtanen, Tuomas

    State-based labelling provides a sparse representation of speech, and this labelling is applied in noise-robust automatic speech recognition. Acoustic time-frequency segments of speech are associated with the transcriptions. In the recognition phase, noisy speech is modeled by a sparse linear combination of noise and speech exemplars. The method was tested in the connected digit recognition task with noisy speech material from the Aurora-2 database…

  3. Model-based Noisy Speech Recognition with Environment Parameters Estimated by Noise Adaptive Speech Recognition with Prior

    E-print Network

    We have proposed earlier a noise adaptive speech recognition approach… Experiments show that this method performs better than the previous methods. Speech recognition has to be carried…

  4. My Speech Problem, Your Listening Problem, and My Frustration: The Experience of Living with Childhood Speech Impairment

    ERIC Educational Resources Information Center

    McCormack, Jane; McLeod, Sharynne; McAllister, Lindy; Harrison, Linda J.

    2010-01-01

    Purpose: The purpose of this article was to understand the experience of speech impairment (speech sound disorders) in everyday life as described by children with speech impairment and their communication partners. Method: Interviews were undertaken with 13 preschool children with speech impairment (mild to severe) and 21 significant others…

  5. COMBINING CEPSTRAL NORMALIZATION AND COCHLEAR IMPLANT-LIKE SPEECH PROCESSING FOR MICROPHONE ARRAY-BASED SPEECH RECOGNITION

    E-print Network

    Garner, Philip N.

    This work combines cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition… The original utterances of the Numbers Corpus (MONC) are clean and not overlapping. Cochlear implant-like speech processing, which…

  6. Hidden Markov models in automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Wrzoskowicz, Adam

    1993-11-01

    This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
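
    A minimal sketch of the per-class HMM scheme the article describes, using the third-party hmmlearn package on synthetic feature frames; the vocabulary, feature dimension, and model sizes are illustrative, not those of the IBPT PAS system:

      # Sketch: isolated-word recognition with one Gaussian HMM per word,
      # classifying a test utterance by maximum log-likelihood.
      import numpy as np
      from hmmlearn import hmm

      rng = np.random.default_rng(2)

      def fake_frames(offset, n=80, dim=6):
          return rng.standard_normal((n, dim)) + offset

      train = {"jeden": fake_frames(0.0), "dwa": fake_frames(4.0)}
      models = {}
      for word, X in train.items():
          m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
          m.fit(X)
          models[word] = m

      test = fake_frames(4.0, n=60)  # should resemble "dwa"
      scores = {w: m.score(test) for w, m in models.items()}
      print("recognized:", max(scores, key=scores.get))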

  7. Hearing impaired speech in noisy classrooms

    NASA Astrophysics Data System (ADS)

    Shahin, Kimary; McKellin, William H.; Jamieson, Janet; Hodgson, Murray; Pichora-Fuller, M. Kathleen

    2005-04-01

    Noisy classrooms have been shown to induce among students patterns of interaction similar to those used by hearing impaired people [W. H. McKellin et al., GURT (2003)]. In this research, the speech of children in a noisy classroom setting was investigated to determine if noisy classrooms have an effect on students' speech. Audio recordings were made of the speech of students during group work in their regular classrooms (grades 1-7), and of the speech of the same students in a sound booth. Noise level readings in the classrooms were also recorded. Each student's noisy and quiet environment speech samples were acoustically analyzed for prosodic and segmental properties (f0, pitch range, pitch variation, phoneme duration, vowel formants), and compared. The analysis showed that the students' speech in the noisy classrooms had characteristics of the speech of hearing-impaired persons [e.g., R. O'Halpin, Clin. Ling. and Phon. 15, 529-550 (2001)]. Some educational implications of our findings were identified. [Work supported by the Peter Wall Institute for Advanced Studies, University of British Columbia.]
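
    The comparison described above rests on standard prosodic measurements; a small sketch of how f0 mean, range, and variation might be extracted from paired recordings (file names hypothetical; the librosa package assumed available):

      # Sketch: compare f0 statistics between a noisy-classroom recording
      # and a sound-booth recording (file paths are hypothetical).
      import numpy as np
      import librosa

      def f0_stats(path):
          y, sr = librosa.load(path, sr=None)
          f0, _, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
          f0 = f0[~np.isnan(f0)]  # keep voiced frames only
          return {"mean": f0.mean(), "range": f0.max() - f0.min(), "sd": f0.std()}

      for label, path in [("classroom", "child01_noisy.wav"),
                          ("booth", "child01_quiet.wav")]:
          print(label, f0_stats(path))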

  8. Gesture–speech integration in narrative

    PubMed Central

    Alibali, Martha W.; Evans, Julia L.; Hostetter, Autumn B.; Ryan, Kristin; Mainela-Arnold, Elina

    2014-01-01

    Speakers sometimes express information in gestures that they do not express in speech. In this research, we developed a system that could be used to assess the redundancy of gesture and speech in a narrative task. We then applied this system to examine whether children and adults produce non-redundant gesture–speech combinations at similar rates. The coding system was developed based on a sample of 30 children. A crucial feature of the system is that gesture meanings can be assessed based on form alone; thus, the meanings speakers express in gesture and speech can be assessed independently and compared. We then collected narrative data from a new sample of 17 children (ages 5–10), as well as a sample of 20 adults, and we determined the average proportion of non-redundant gesture–speech combinations produced by individuals in each group. Children produced more non-redundant gesture–speech combinations than adults, both at the clause level and at the word level. These findings suggest that gesture–speech integration is not constant over the life span, but instead appears to change with development.

  9. Emotion recognition from speech: tools and challenges

    NASA Astrophysics Data System (ADS)

    Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.

    2015-05-01

    Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is wide diversity and non-agreement about the basic emotions or emotion-related states on the one hand, and about where the emotion-related information lies in the speech signal on the other. These diversities motivate our investigations into extracting meta-features using the PCA approach, or using a non-adaptive random projection (RP), which significantly reduce the large-dimensional speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model that adopts the score-based LDC classifier. We shall demonstrate that our scheme outperforms state-of-the-art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted data sets, which tend to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.
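
    A minimal sketch of the PCA-plus-linear-discriminant pipeline the abstract describes, on synthetic data (the dimensions, class count, and separability are illustrative; the paper's features, fusion scheme, and databases are not reproduced):

      # Sketch: reduce high-dimensional emotion features with PCA, then
      # classify with a linear discriminant classifier (LDC).
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(3)
      X = rng.standard_normal((200, 1000))  # large speech feature vectors
      y = rng.integers(0, 4, size=200)      # four emotion labels
      X[y == 2] += 0.3                      # make one class partly separable

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      pca = PCA(n_components=20).fit(X_tr)  # meta-features
      clf = LinearDiscriminantAnalysis().fit(pca.transform(X_tr), y_tr)
      print("accuracy:", clf.score(pca.transform(X_te), y_te))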

  10. The sensorimotor and social sides of the architecture of speech.

    PubMed

    Pezzulo, Giovanni; Barca, Laura; D'Ausilio, Alessando

    2014-12-01

    Speech is a complex skill to master. In addition to sophisticated phono-articulatory abilities, speech acquisition requires neuronal systems configured for vocal learning, with adaptable sensorimotor maps that couple heard speech sounds with motor programs for speech production; imitation and self-imitation mechanisms that can train the sensorimotor maps to reproduce heard speech sounds; and a "pedagogical" learning environment that supports tutor learning. PMID:25514959

  11. Speech synthesis by glottal excited linear prediction.

    PubMed

    Childers, D G; Hu, H T

    1994-10-01

    This paper describes a linear predictive (LP) speech synthesis procedure that resynthesizes speech using a 6th-order polynomial waveform to model the glottal excitation. The coefficients of the polynomial model form a vector that represents the glottal excitation waveform for one pitch period. A glottal excitation codebook with 32 entries for voiced excitation is designed and trained using two sentences spoken by different speakers. The purpose of using this approach is to demonstrate that quantization of the glottal excitation waveform does not significantly degrade the quality of speech synthesized with a glottal excitation linear predictive (GELP) synthesizer. This implementation of the LP synthesizer is patterned after both a pitch-excited LP speech synthesizer and a code excited linear predictive (CELP) speech coder. In addition to the glottal excitation codebook, we use a stochastic codebook with 256 entries for unvoiced noise excitation. Analysis techniques are described for constructing both codebooks. The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist with a simple speech synthesis procedure that uses established analysis techniques, that is able to reproduce all speech sounds, and yet also has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue. It is conjectured that the glottal excitation codebook approach could provide a mechanism for quantitatively comparing the differences in glottal excitation codebooks for male and female speakers, for speakers with vocal disorders, and for speakers with different voice types such as breathy and vocal fry voices. Conceivably, one could also convert the voice of a speaker with one voice type, e.g., breathy, to the voice of a speaker with another voice type, e.g., vocal fry, by synthesizing speech using the vocal tract LP parameters for the speaker with the breathy voice excited by the glottal excitation codebook trained for vocal fry. PMID:7963019
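
    A toy sketch of glottal-excited linear prediction in the spirit of GELP: a polynomial glottal pulse, tiled once per pitch period, drives an all-pole LP filter. Every coefficient below is illustrative, not taken from the trained codebooks.

      # Sketch: excite an all-pole (LP) vocal-tract filter with a
      # 6th-order polynomial glottal pulse, one pulse per pitch period.
      import numpy as np
      from scipy.signal import lfilter

      fs, f0 = 8000, 100                   # sample rate, pitch (Hz)
      t = np.linspace(0.0, 1.0, fs // f0)  # one normalized pitch period
      coeffs = [0.0, 6.75, -20.25, 20.25, -6.75, 0.0, 0.0]  # ascending powers
      pulse = np.polyval(coeffs[::-1], t)  # polyval wants highest power first

      excitation = np.tile(pulse, 20)      # 20 pitch periods of excitation
      lp = [1.0, -1.3, 0.9, -0.2]          # toy stable all-pole filter
      speech = lfilter([1.0], lp, excitation)
      print("synthesized", speech.size, "samples")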

  12. Analysis, recognition, and interpretation of speech signals

    NASA Astrophysics Data System (ADS)

    Vintziuk, Taras Klimovich

    The problems of the machine analysis, recognition, semantic interpretation, synthesis, and compressed speech transmission are examined with reference to oral man-machine dialogue in formalized and natural languages for applications in data collection, processing, and control systems. Methods for the recognition of individual words and continuous speech, signal segmentation and self-segmentation, speech recognition learning, recognition of the voice of a particular operator, recognition of multiple speakers, and selection of signal matching and signal analysis techniques are discussed from a unified standpoint based on the use of dynamic programming.

  13. Characteristic Extraction of Speech Signal Using Wavelet

    NASA Astrophysics Data System (ADS)

    Moriai, Shogo; Hanazaki, Izumi

    In analysis-synthesis coding of speech signals, realization of high quality at low bit rates depends on the extraction of characteristic parameters in the pre-processing. Precise extraction of the fundamental frequency, one of the parameters of the source information, guarantees the quality of the synthesized speech. But its extraction is difficult because of the influence of consonants, the non-periodicity of vocal cord vibration, the wide range of the fundamental frequency, etc. In this paper, we propose a new fundamental frequency extraction method for speech signals using the wavelet transform, with a criterion based on the harmonics structure.
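
    A hedged sketch of generic wavelet-assisted F0 extraction (low-frequency approximation plus autocorrelation), shown only to fix ideas; it is not the harmonics-structure criterion the paper proposes:

      # Sketch: estimate F0 of a voiced frame from the lowpass wavelet
      # approximation followed by autocorrelation (parameters illustrative).
      import numpy as np
      import pywt

      fs, f0_true = 16000, 120
      t = np.arange(0, 0.05, 1.0 / fs)
      frame = sum(np.sin(2 * np.pi * f0_true * k * t) / k for k in (1, 2, 3))

      level = 4
      approx = pywt.wavedec(frame, "db4", level=level)[0]  # lowpass branch
      fs_a = fs / 2 ** level                               # its sample rate

      ac = np.correlate(approx, approx, mode="full")[len(approx) - 1:]
      lag = np.argmax(ac[5:]) + 5                          # skip zero lag
      print("estimated F0: %.1f Hz" % (fs_a / lag))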

  14. Vector Adaptive/Predictive Encoding Of Speech

    NASA Technical Reports Server (NTRS)

    Chen, Juin-Hwey; Gersho, Allen

    1989-01-01

    Vector adaptive/predictive technique for digital encoding of speech signals yields decoded speech of very good quality after transmission at a coding rate of 9.6 kb/s, and of reasonably good quality at 4.8 kb/s, while requiring only 3 to 4 million multiplications and additions per second. It combines advantages of adaptive/predictive coding and of code-excited linear prediction, which yields speech of high quality but requires 600 million multiplications and additions per second at an encoding rate of 4.8 kb/s. The vector adaptive/predictive coding technique thus bridges gaps in performance and complexity between adaptive/predictive coding and code-excited linear prediction.

  15. Acoustic differences among casual, conversational, and read speech

    NASA Astrophysics Data System (ADS)

    Pinnow, DeAnna

    Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

  16. Children use visual speech to compensate for non-intact auditory speech.

    PubMed

    Jerger, Susan; Damian, Markus F; Tye-Murray, Nancy; Abdi, Hervé

    2014-10-01

    We investigated whether visual speech fills in non-intact auditory speech (excised consonant onsets) in typically developing children from 4 to 14 years of age. Stimuli with the excised auditory onsets were presented in the audiovisual (AV) and auditory-only (AO) modes. A visual speech fill-in effect occurs when listeners experience hearing the same non-intact auditory stimulus (e.g., /-b/ag) as different depending on the presence/absence of visual speech, such as hearing /bag/ in the AV mode but hearing /ag/ in the AO mode. We quantified the visual speech fill-in effect by the difference in the number of correct consonant onset responses between the modes. We found that easy visual speech cues /b/ provided greater filling in than difficult cues /g/. Only older children benefited from difficult visual speech cues, whereas all children benefited from easy visual speech cues, although 4- and 5-year-olds did not benefit as much as older children. To explore task demands, we compared results on our new task with those on the McGurk task. The influence of visual speech was uniquely associated with age and vocabulary abilities for the visual speech fill-in effect but was uniquely associated with speechreading skills for the McGurk effect. This dissociation implies that visual speech, as processed by children, is a complicated and multifaceted phenomenon underpinned by heterogeneous abilities. These results emphasize that children perceive a speaker's utterance rather than the auditory stimulus per se. In children, as in adults, there is more to speech perception than meets the ear. PMID:24974346

  17. Speech Perception and Short Term Memory Deficits in Persistent Developmental Speech Disorder

    PubMed Central

    Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

    2008-01-01

    Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech perception and short-term memory. Nine adults with a persistent familial developmental speech disorder without language impairment were compared with 20 controls on tasks requiring the discrimination of fine acoustic cues for word identification and on measures of verbal and nonverbal short-term memory. Significant group differences were found in the slopes of the discrimination curves for first formant transitions for word identification with stop gaps of 40 and 20 ms with effect sizes of 1.60 and 1.56. Significant group differences also occurred on tests of nonverbal rhythm and tonal memory, and verbal short-term memory with effect sizes of 2.38, 1.56 and 1.73. No group differences occurred in the use of stop gap durations for word identification. Because frequency-based speech perception and short-term verbal and nonverbal memory deficits both persisted into adulthood in the speech-impaired adults, these deficits may be involved in the persistence of speech disorders without language impairment. PMID:15896836

  18. Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production.

    PubMed

    Liu, Xiaoluan; Xu, Yi

    2015-01-01

    This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory constraints are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study firstly supports previous perception studies on relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production in which motor mechanisms play a crucial role. PMID:26217252

  19. Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production

    PubMed Central

    Liu, Xiaoluan; Xu, Yi

    2015-01-01

    This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory constraints are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study firstly supports previous perception studies on relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production in which motor mechanisms play a crucial role. PMID:26217252

  20. Speech information retrieval: a review

    SciTech Connect

    Hafen, Ryan P.; Henry, Michael J.

    2012-11-01

    Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new to the field to quickly gain a high-level understanding and to provide direction for further study.

  1. Vocal Attractiveness Of Statistical Speech Synthesisers 

    E-print Network

    Andraszewicz, Sandra; Yamagishi, Junichi; King, Simon

    2011-01-01

    Our previous analysis of speaker-adaptive HMM-based speech synthesis methods suggested that there are two possible reasons why average voices can obtain higher subjective scores than any individual adapted voice: 1) model ...

  2. The evolution of speech: vision, rhythm, cooperation

    PubMed Central

    Ghazanfar, Asif A.; Takahashi, Daniel Y.

    2014-01-01

    A full account of human speech evolution must consider its multisensory, rhythmic, and cooperative characteristics. Humans, apes and monkeys recognize the correspondence between vocalizations and the associated facial postures and gain behavioral benefits from them. Some monkey vocalizations even have a speech-like acoustic rhythmicity, yet they lack the concomitant rhythmic facial motion that speech exhibits. We review data showing that facial expressions like lip-smacking may be an ancestral expression that was later linked to vocal output in order to produce rhythmic audiovisual speech. Finally, we argue that human vocal cooperation (turn-taking) may have arisen through a combination of volubility and prosociality, and provide comparative evidence from one species to support this hypothesis. PMID:25048821

  3. Linear dynamic models for automatic speech recognition 

    E-print Network

    Frankel, Joe

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models (HMM), in which the output distribution associated with each state is modelled by a mixture of diagonal covariance Gaussians. Dynamic ...

  4. Dialog act modelling for conversational speech 

    E-print Network

    Stolcke, Andreas; Shriberg, Elizabeth; Bates, Rebecca; Coccaro, Noah; Jurafsky, Daniel; Martin, Rachel; Meteer, Marie; Ries, Klaus; Taylor, Paul; Van Ess-Dykema, Carol

    1998-01-01

    We describe an integrated approach for statistical modeling of discourse structure for natural conversational speech. Our model is based on 42 'dialog acts' (e.g., Statement, Question, Backchannel, Agreement, Disagreement, ...

  5. CHATR: A generic speech synthesis system 

    E-print Network

    Black, Alan W; Taylor, Paul A

    1994-01-01

    This paper describes a generic speech synthesis system called CHATR which is being developed at ATR. CHATR is designed in a modular way so that module parameters and even which modules are actually used may be set and ...

  6. Articulatory Evidence for Interactivity in Speech Production 

    E-print Network

    McMillan, Corey

    2009-01-01

    Traditionally, psychologists and linguists have assumed that phonological speech errors result from the substitution of well-formed segments. However, there is growing evidence from acoustic and articulatory analyses of ...

  7. Speech recognition using linear dynamic models. 

    E-print Network

    Frankel, Joe; King, Simon

    2006-01-01

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models ...

  8. Noise-immune multisensor transduction of speech

    NASA Astrophysics Data System (ADS)

    Viswanathan, Vishu R.; Henry, Claudia M.; Derr, Alan G.; Roucos, Salim; Schwartz, Richard M.

    1986-08-01

    Two types of configurations of multiple sensors were developed, tested, and evaluated in a speech recognition application for robust performance in high levels of acoustic background noise: one type combines the individual sensor signals to provide a single speech signal input, and the other provides several parallel inputs. For single-input systems, several configurations of multiple sensors were developed and tested. Results from formal speech intelligibility and quality tests in simulated fighter aircraft cockpit noise show that each of the two-sensor configurations tested outperforms the constituent individual sensors in high noise. Also presented are results comparing the performance of two-sensor configurations and individual sensors in speaker-dependent, isolated-word speech recognition tests performed using a commercial recognizer (Verbex 4000) in simulated fighter aircraft cockpit noise.

  9. Making speech recognition work on the web

    E-print Network

    Varenhorst, Christopher J

    2011-01-01

    We present an improved Audio Controller for Web-Accessible Multimodal Interface toolkit -- a system that provides a simple way for developers to add speech recognition to web pages. Our improved system offers increased ...

  10. Perceptual Evaluation of Video-Realistic Speech

    E-print Network

    Geiger, Gadi

    2003-02-28

    With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic ...

  11. A classification based approach to speech segregation.

    PubMed

    Han, Kun; Wang, DeLiang

    2012-11-01

    A key problem in computational auditory scene analysis (CASA) is monaural speech segregation, which has proven to be very challenging. For monaural mixtures, one can only utilize the intrinsic properties of speech or interference to segregate target speech from background noise. Ideal binary mask (IBM) has been proposed as a main goal of sound segregation in CASA and has led to substantial improvements of human speech intelligibility in noise. This study proposes a classification approach to estimate the IBM and employs support vector machines to classify time-frequency units as either target- or interference-dominant. A re-thresholding method is incorporated to improve classification results and maximize hit minus false alarm rates. An auditory segmentation stage is utilized to further improve estimated masks. Systematic evaluations show that the proposed approach produces high quality estimated IBMs and outperforms a recent system in terms of classification accuracy. PMID:23145627
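
    A toy sketch of the classification view of segregation: label time-frequency units by their local SNR (the ideal binary mask) and train a classifier on observable unit features. All features below are synthetic; the paper's CASA front end, auditory segmentation, and re-thresholding are not reproduced.

      # Sketch: train an SVM to call time-frequency units target-dominant
      # (IBM = 1) or interference-dominant (IBM = 0).
      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(4)
      n = 2000
      target_db = rng.normal(-5, 6, n)              # per-unit target energy
      noise_db = rng.normal(-8, 6, n)               # per-unit noise energy
      ibm = (target_db - noise_db > 0).astype(int)  # local SNR > 0 dB

      # observed mixture features (classifier never sees the clean SNR)
      X = np.column_stack([target_db + noise_db + rng.normal(0, 1, n),
                           np.maximum(target_db, noise_db)])
      clf = SVC(kernel="rbf").fit(X[:1500], ibm[:1500])
      print("unit classification accuracy:", clf.score(X[1500:], ibm[1500:]))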

  12. Phonology impacts segmentation in online speech processing 

    E-print Network

    Onnis, Luca; Monaghan, Padraic; Richmond, Korin; Chater, Nick

    2005-01-01

    Peña, Bonatti, Nespor and Mehler (2002) investigated an artificial language where the structure of words was determined by nonadjacent dependencies between syllables. They found that segmentation of continuous speech could ...

  13. Speech therapy and voice recognition instrument

    NASA Technical Reports Server (NTRS)

    Cohen, J.; Babcock, M. L.

    1972-01-01

    Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determining voice patterns and pitch changes are described. Operation of the circuit is discussed and a circuit diagram is provided.

  14. Differential oscillatory encoding of foreign speech.

    PubMed

    Pérez, Alejandro; Carreiras, Manuel; Gillon Dowens, Margaret; Duñabeitia, Jon Andoni

    2015-08-01

    Neuronal oscillations play a key role in auditory perception of verbal input, with the oscillatory rhythms of the brain showing synchronization with specific frequencies of speech. Here we investigated the neural oscillatory patterns associated with perceiving native, foreign, and unknown speech. Spectral power and phase synchronization were compared to those of a silent context. Power synchronization to native speech was found in frequency ranges corresponding to the theta band, while no synchronization patterns were found for the foreign speech context and the unknown language context. For phase synchrony, the native and unknown languages showed higher synchronization in the theta-band than the foreign language when compared to the silent condition. These results suggest that neural synchronization patterns are markedly different for native and foreign languages. PMID:26070104
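
    A minimal sketch of one measure used in such studies, inter-trial phase consistency (ITPC) in the theta band, computed with a band-pass filter and the Hilbert transform on synthetic trials (all parameters illustrative):

      # Sketch: theta-band (4-8 Hz) inter-trial phase consistency; ITPC of
      # 1 means perfect phase locking across trials, 0 means none.
      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      fs, n_trials, n_samples = 250, 40, 500
      rng = np.random.default_rng(5)
      t = np.arange(n_samples) / fs
      # synthetic trials: a phase-consistent 6 Hz component plus noise
      trials = np.sin(2 * np.pi * 6 * t) + rng.standard_normal((n_trials, n_samples))

      b, a = butter(4, [4 / (fs / 2), 8 / (fs / 2)], btype="band")
      phase = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
      itpc = np.abs(np.mean(np.exp(1j * phase), axis=0))
      print("peak theta ITPC: %.2f" % itpc.max())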

  15. Childhood Apraxia of Speech Family Start Guide

    MedlinePLUS

    ... and diagnosis is a licensed speech-language pathologist (SLP). Other professionals can be helpful and necessary at ... With CAS? It is not often possible for SLPs to provide a differential diagnosis for a child ...

  16. Post-processing speech recordings during MRI

    E-print Network

    Kuortti, Juha

    2015-01-01

    We discuss post-processing of speech that has been recorded during Magnetic Resonance Imaging (MRI) of the vocal tract. Such speech recordings are contaminated by high levels of acoustic noise from the MRI scanner. Also, the frequency response of the sound signal path is not flat as a result of severe restrictions on recording instrumentation due to MRI technology. The post-processing algorithm for noise reduction is based on adaptive spectral filtering. The speech material consists of samples of prolonged vowel productions that are used for validation of the post-processing algorithm. The comparison data is recorded in an anechoic chamber from the same test subject. Formant analysis is carried out for the post-processed speech and the comparison data. Artificially noise-contaminated vowel samples are used for validation experiments to determine performance of the algorithm where using true data would be difficult. The properties of recording instrumentation or the post-processing algorithm do not explain the co...

  17. Automatic Head Motion Prediction from Speech Data 

    E-print Network

    Hofer, Gregor; Shimodaira, Hiroshi

    2007-01-01

    In this paper we present a novel approach to generate a sequence of head motion units given some speech. The modelling approach is based on the notion that head motion can be divided into a number of short homogeneous ...

  18. Speech Recognition Using Augmented Conditional Random Fields 

    E-print Network

    Hifny, Yasser; Renals, Steve

    2009-01-01

    Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the ...

  19. Speech recognition using linear dynamic models. 

    E-print Network

    Frankel, Joe; King, Simon

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with subphone states. This approach, whilst successful, models ...

  20. Sparse gaussian graphical models for speech recognition. 

    E-print Network

    Bell, Peter; King, Simon

    2007-01-01

    We address the problem of learning the structure of Gaussian graphical models for use in automatic speech recognition, a means of controlling the form of the inverse covariance matrices of such systems. With particular focus on data sparsity issues...

  1. Full Covariance Modelling for Speech Recognition 

    E-print Network

    Bell, Peter

    2010-01-01

    HMM-based systems for Automatic Speech Recognition typically model the acoustic features using mixtures of multivariate Gaussians. In this thesis, we consider the problem of learning a suitable covariance matrix for ...

  2. Connectionist probability estimators in HMM speech recognition 

    E-print Network

    Renals, Steve; Morgan, Nelson; Bourlard, Herve; Cohen, Michael; Franco, Horacio

    The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review...

  3. Speech Recognition Via Phonetically Featured Syllables 

    E-print Network

    King, Simon; Stephenson, Todd; Isard, Stephen; Taylor, Paul; Strachan, Alex

    We propose the use of phonetic features in speech recognition. We also propose to model this description at the syllable rather than the phone level. The ultimate goal of this work is to generate syllable models whose parameters explicitly describe the trajectories of the phonetic features...

  4. Unstable connectionist networks in speech recognition 

    E-print Network

    Rohwer, Richard; Renals, Steve; Terry, Mark

    Connectionist networks evolve in time according to a prescribed rule. Typically, they are designed to be stable so that their temporal activity ceases after a short transient period. However, meaningful patterns in speech have a temporal component...

  5. Pronunciation learning for automatic speech recognition

    E-print Network

    Badr, Ibrahim

    2011-01-01

    In many ways, the lexicon remains the Achilles heel of modern automatic speech recognizers (ASRs). Unlike stochastic acoustic and language models that learn the values of their parameters from training data, the baseform ...

  6. Funeral Picketing Laws and Free Speech

    E-print Network

    McAllister, Stephen R.

    2007-04-01

    For most people, protecting privacy at funerals and memorial services likely would seem to be a strong governmental interest, maybe even close to as strong... of protected speech and the strong interest in protecting privacy in the home, and combining those with the recognized potential for captive audiences and the likely recognition that ensuring privacy at funerals is a substantial governmental interest, a very...

  7. Towards Quranic reader controlled by speech

    E-print Network

    Yekache, Yacine; Kouninef, Belkacem

    2012-01-01

    In this paper we describe the process of designing a task-oriented continuous speech recognition system for Arabic, based on CMU Sphinx4, to be used in the voice interface of a Quranic reader. The concept of the Quranic reader controlled by speech is presented, and the collection of the corpus and creation of the acoustic model are described in detail, taking into account the specificities of the Arabic language and the desired application.

  8. Speech earthquakes: scaling and universality in human voice

    E-print Network

    Luque, Jordi; Lacasa, Lucas

    2014-01-01

    Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech the energy is unevenly released and power-law distributed, reporting a universal robust Gutenberg-Richter-like law in speech. We further show that such earthquakes in speech show temporal correlations, as the interevent statistics are again power-law distributed. Since this feature takes place in the intra-phoneme range, we conjecture that the mechanism responsible for this complex phenomenon is not cognitive, but resides in the physiological speech production mechanism. Moreover, we show that these waiting time distributions are scale invariant under a renormalisation group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are put in contrast with current paradigms in speech ...
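
    A toy sketch of the Gutenberg-Richter-style analysis: threshold an energy series into events and fit a power law to the event-size distribution on log-log axes (the energies and thresholds below are synthetic and illustrative):

      # Sketch: detect 'energy release' events and fit a power-law slope
      # to their size distribution, earthquake-catalog style.
      import numpy as np

      rng = np.random.default_rng(6)
      energy = rng.pareto(1.5, 100000)       # stand-in for frame energies
      events = energy[energy > 1.0]          # threshold into 'quakes'

      counts, edges = np.histogram(events, bins=np.logspace(0, 3, 30))
      centers = np.sqrt(edges[:-1] * edges[1:])
      keep = counts > 0
      slope, _ = np.polyfit(np.log10(centers[keep]),
                            np.log10(counts[keep]), 1)
      print("power-law exponent estimate: %.2f" % slope)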

  9. Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.

    PubMed

    Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex

    2015-07-01

    Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results. PMID:26114520

  10. Neural restoration of degraded audiovisual speech.

    PubMed

    Shahin, Antoine J; Kerlin, Jess R; Bhat, Jyoti; Miller, Lee M

    2012-03-01

    When speech is interrupted by noise, listeners often perceptually "fill-in" the degraded signal, giving an illusion of continuity and improving intelligibility. This phenomenon involves a neural process in which the auditory cortex (AC) response to onsets and offsets of acoustic interruptions is suppressed. Since meaningful visual cues behaviorally enhance this illusory filling-in, we hypothesized that during the illusion, lip movements congruent with acoustic speech should elicit a weaker AC response to interruptions relative to static (no movements) or incongruent visual speech. AC response to interruptions was measured as the power and inter-trial phase consistency of the auditory evoked theta band (4-8 Hz) activity of the electroencephalogram (EEG) and the N1 and P2 auditory evoked potentials (AEPs). A reduction in the N1 and P2 amplitudes and in theta phase-consistency reflected the perceptual illusion at the onset and/or offset of interruptions regardless of visual condition. These results suggest that the brain engages filling-in mechanisms throughout the interruption, which repairs degraded speech lasting up to ~250 ms following the onset of the degradation. Behaviorally, participants perceived speech continuity over longer interruptions for congruent compared to incongruent or static audiovisual streams. However, this specific behavioral profile was not mirrored in the neural markers of interest. We conclude that lip-reading enhances illusory perception of degraded speech not by altering the quality of the AC response, but by delaying it during degradations so that longer interruptions can be tolerated. PMID:22178454
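
    Inter-trial phase consistency of band-limited activity can be computed along these lines; the following is a minimal sketch of one common approach (band-pass filtering plus the Hilbert transform), not the study's actual pipeline, and all parameters are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_itpc(trials, fs, band=(4.0, 8.0)):
    """Inter-trial phase consistency: |mean over trials of exp(i*phase)|.

    trials: array (n_trials, n_samples) of single-channel EEG epochs.
    Returns an ITPC value in [0, 1] for every time sample.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=1)    # zero-phase theta filter
    phase = np.angle(hilbert(filtered, axis=1))  # instantaneous phase
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

# Toy data: 50 trials with a partially phase-locked 6 Hz component plus noise.
fs = 250
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(1)
trials = np.array([np.sin(2 * np.pi * 6 * t + 0.3 * rng.standard_normal())
                   + 0.5 * rng.standard_normal(t.size) for _ in range(50)])
print(theta_itpc(trials, fs).mean())  # clearly above the chance level
```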

  11. Effects of human fatigue on speech signals

    NASA Astrophysics Data System (ADS)

    Stamoulis, Catherine

    2001-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  12. Inner speech deficits in people with aphasia

    PubMed Central

    Langland-Hassan, Peter; Faries, Frank R.; Richardson, Michael J.; Dietz, Aimee

    2015-01-01

    Despite the ubiquity of inner speech in our mental lives, methods for objectively assessing inner speech capacities remain underdeveloped. The most common means of assessing inner speech is to present participants with tasks requiring them to silently judge whether two words rhyme. We developed a version of this task to assess the inner speech of a population of patients with aphasia and corresponding language production deficits. Patients’ performance on the silent rhyming task was severely impaired relative to controls. Patients’ performance on this task did not, however, correlate with their performance on a variety of other standard tests of overt language and rhyming abilities. In particular, patients who were generally unimpaired in their abilities to overtly name objects during confrontation naming tasks, and who could reliably judge when two words spoken to them rhymed, were still severely impaired (relative to controls) at completing the silent rhyme task. A variety of explanations for these results are considered, as a means to critically reflecting on the relations among inner speech, outer speech, and silent rhyme judgments more generally. PMID:25999876

  13. Fifty years of progress in speech synthesis

    NASA Astrophysics Data System (ADS)

    Schroeter, Juergen

    2004-10-01

    A common opinion is that progress in speech synthesis should be easier to discern than in other areas of speech communication: you just have to listen to the speech! Unfortunately, things are more complicated. It can be said, however, that early speech synthesis efforts were primarily concerned with providing intelligible speech, while, more recently, "naturalness" has been the focus. The field had its "electronic" roots in Homer Dudley's 1939 "Voder," and it advanced in the 1950s and 1960s through progress in a number of labs including JSRU in England, Haskins Labs in the U.S., and Fant's Lab in Sweden. In the 1970s and 1980s significant progress came from efforts at Bell Labs (under Jim Flanagan's leadership) and at MIT (where Dennis Klatt created one of the first commercially viable systems). Finally, over the past 15 years, the methods of unit-selection synthesis were devised, primarily at ATR in Japan, and were advanced by work at AT&T Labs, Univ. of Edinburgh, and ATR. Today, TTS systems are able to "convince some of the listeners some of the time" that synthetic speech is as natural as live recordings. Ongoing efforts aim at replacing "some" with "most" for a wide range of real-world applications.

  14. When Spectral Smearing Can Increase Speech Intelligibility

    PubMed Central

    Bashford, J.A.; Warren, R.M.; Lenz, P.W.

    2013-01-01

    Sentences were reduced to an array of sixteen effectively rectangular bands (RBs) having center frequencies ranging from 0.25 to 8 kHz spaced at 1/3-octave intervals. Four arrays were employed, each having uniform subcritical bandwidths which ranged from 40 Hz to 5 Hz. The 40 Hz width array had intelligibility near ceiling, and the 5 Hz array about 1%. The finding of interest was that when the subcritical speech RBs were used to modulate RBs of noise having the same center frequency as the speech but having bandwidths increased to a critical (ERBn) bandwidth at each center frequency, these spectrally smeared arrays were considerably more intelligible in all but the 40 Hz (ceiling) condition. For example, when the 10 Hz bandwidth speech array having an intelligibility of 8% modulated the ERBn noise array, intelligibility increased to 48%. This six-fold increase occurred despite elimination of spectral fine structure and addition of stochastic fluctuation to speech envelope cues. (As anticipated, conventional vocoding with matching bandwidths of speech and noise reduced the 10-Hz-speech array intelligibility from 8% to 1%). These effects of smearing confirm findings by Bashford, Warren, and Lenz (2010) that optimal temporal processing requires stimulation of a critical bandwidth. [Supported by NIH] PMID:23991247
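
    For concreteness, the band layout is easy to reproduce: 0.25 to 8 kHz spans five octaves, so fifteen steps between sixteen centers give 1/3 octave per step (which is why the garbled spacing above is read as 1/3-octave). The ERBn at each center can be taken from the Glasberg and Moore (1990) formula; the sketch below is illustrative, not the authors' code.

```python
import numpy as np

# Sixteen center frequencies from 0.25 to 8 kHz at 1/3-octave spacing:
# 8000 / 250 = 32 = 2**5, i.e. five octaves over fifteen steps.
centers = 250.0 * 2.0 ** (np.arange(16) / 3.0)

def erb_n(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter
    (Glasberg & Moore, 1990), in Hz, for a center frequency in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

for f in centers:
    print(f"{f:7.1f} Hz  ERBn = {erb_n(f):6.1f} Hz")
```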

  15. Underwater speech communications with a modulated laser

    NASA Astrophysics Data System (ADS)

    Woodward, B.; Sari, H.

    2008-04-01

    A novel speech communications system using a modulated laser beam has been developed for short-range applications in which high directionality is an exploitable feature. Although it was designed for certain underwater applications, such as speech communications between divers or between a diver and the surface, it may equally be used for air applications. With some modification it could be used for secure diver-to-diver communications in the situation where untethered divers are swimming close together and do not want their conversations monitored by intruders. Unlike underwater acoustic communications, where the transmitted speech may be received at ranges of hundreds of metres omnidirectionally, a laser communication link is very difficult to intercept and also obviates the need for cables that become snagged or broken. Further applications include the transmission of speech and data, including the short message service (SMS), from a fixed installation such as a sea-bed habitat; and data transmission to and from an autonomous underwater vehicle (AUV), particularly during docking manoeuvres. The performance of the system has been assessed subjectively by listening tests, which revealed that the speech was intelligible, although of poor quality due to the speech algorithm used.

  16. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...false Condition of participation: Speech pathology services. 485.715 ...of Outpatient Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech...

  17. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ...false Condition of participation: Speech pathology services. 485.715 ...of Outpatient Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech...

  18. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ...false Condition of participation: Speech pathology services. 485.715 ...of Outpatient Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech...

  19. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...false Condition of participation: Speech pathology services. 485.715 ...of Outpatient Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech...

  20. 42 CFR 485.715 - Condition of participation: Speech pathology services.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ...false Condition of participation: Speech pathology services. 485.715 ...of Outpatient Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech...

  1. Scalable Distributed Speech Recognition Using Multi-Frame GMM-Based Block Quantization

    E-print Network

    Describes a multi-frame GMM-based block quantization scheme for mel-frequency cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications, motivated by the deployment of automatic speech recognition (ASR) technology in the context of mobile communication systems.

  2. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter

    ERIC Educational Resources Information Center

    Davidow, Jason H.

    2014-01-01

    Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…

  3. The varieties of inner speech: links between quality of inner speech and psychopathological variables in a sample of young adults.

    PubMed

    McCarthy-Jones, Simon; Fernyhough, Charles

    2011-12-01

    A resurgence of interest in inner speech as a core feature of human experience has not yet coincided with methodological progress in the empirical study of the phenomenon. The present article reports the development and psychometric validation of a novel instrument, the Varieties of Inner Speech Questionnaire (VISQ), designed to assess the phenomenological properties of inner speech along dimensions of dialogicality, condensed/expanded quality, evaluative/motivational nature, and the extent to which inner speech incorporates other people's voices. In response to findings that some forms of psychopathology may relate to inner speech, anxiety, depression, and proneness to auditory and visual hallucinations were also assessed. Anxiety, but not depression, was found to be uniquely positively related to both evaluative/motivational inner speech and the presence of other voices in inner speech. Only dialogic inner speech predicted auditory hallucination-proneness, with no inner speech variables predicting levels of visual hallucinations/disturbances. Directions for future research are discussed. PMID:21880511

  4. Preschool speech intelligibility and vocabulary skills predict long-term speech and language outcomes following cochlear implantation in early childhood.

    PubMed

    Castellanos, Irina; Kronenberger, William G; Beer, Jessica; Henning, Shirley C; Colson, Bethany G; Pisoni, David B

    2014-07-01

    Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants (CIs), but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine whether early preschool measures of speech and language performance predict speech-language functioning in long-term users of CIs. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3-6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes. PMID:23998347

  5. Listen up! Speech is for thinking during infancy

    PubMed Central

    Vouloumanos, Athena; Waxman, Sandra R.

    2015-01-01

    Infants’ exposure to human speech within the first year of life promotes more than speech processing and language acquisition: new developmental evidence suggests that listening to speech shapes infants’ fundamental cognitive and social capacities. Speech streamlines infants’ learning, promotes the formation of object categories, signals communicative partners, highlights information in social interactions, and offers insight into the minds of others. These results, which challenge the claim that for infants speech offers no special cognitive advantages, suggest a new synthesis: far earlier than researchers had imagined, an intimate and powerful connection between human speech and cognition guides infant development, advancing infants’ acquisition of fundamental psychological processes. PMID:25457376

  6. Speech processing based on short-time Fourier analysis

    SciTech Connect

    Portnoff, M.R.

    1981-06-02

    Short-time Fourier analysis (STFA) is a mathematical technique that represents nonstationary signals, such as speech, music, and seismic signals in terms of time-varying spectra. This representation provides a formalism for such intuitive notions as time-varying frequency components and pitch contours. Consequently, STFA is useful for speech analysis and speech processing. This paper shows that STFA provides a convenient technique for estimating and modifying certain perceptual parameters of speech. As an example of an application of STFA of speech, the problem of time-compression or expansion of speech, while preserving pitch and time-varying frequency content is presented.
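
    A minimal sketch of short-time Fourier analysis itself (not Portnoff's algorithm): the signal is cut into overlapping windowed frames and each frame is transformed, yielding a time-varying spectrum. Frame length and hop below are illustrative.

```python
import numpy as np

def stft(x, frame_len=256, hop=64):
    """Short-time Fourier transform: Hann-windowed, hop-spaced frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len//2 + 1)

# Example: a chirp standing in for a time-varying "pitch contour".
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * (100 + 200 * t) * t)
X = stft(x)
print(X.shape)  # time-varying spectrum: one row per analysis frame
```

    Time-scale modification then amounts to resynthesizing the frames with a different hop than was used for analysis, which changes duration while each frame's spectrum, and hence the perceived pitch, is preserved.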

  7. Speech-based Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

    E-print Network

    Theune, Mariët

    Reports on the setup and evaluation of robust speech recognition system components, geared towards the transcription of heterogeneous multimedia content (the system is further referred to as the SHoUT system). Technical report, version 1.0, May 2007: Marijn Huijbregts, Roeland Ordelman, Franciska de Jong, Human Media...

  8. Using Human Speech Structures to Model Reality: Grammars in Genetic Programming

    E-print Network

    McKay, Robert Ian

    Estimation of Distribution Algorithms, Grammars and Genetic Programming. Using Human Speech Structures to Model Reality: Grammars in Genetic Programming. Bob McKay, School of Information...

  9. Spotlight on Speech Codes 2009: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2009

    2009-01-01

    Each year, the Foundation for Individual Rights in Education (FIRE) conducts a wide, detailed survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their obligations to uphold students' and faculty members' rights to freedom of speech, freedom of…

  10. Prisoner Fasting as Symbolic Speech: The Ultimate Speech-Action Test.

    ERIC Educational Resources Information Center

    Sneed, Don; Stonecipher, Harry W.

    The ultimate test of the speech-action dichotomy, as it relates to symbolic speech to be considered by the courts, may be the fasting of prison inmates who use hunger strikes to protest the conditions of their confinement or to make political statements. While hunger strikes have been utilized by prisoners for years as a means of protest, it was…

  11. Autonomic and Emotional Responses of Graduate Student Clinicians in Speech-Language Pathology to Stuttered Speech

    ERIC Educational Resources Information Center

    Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph

    2012-01-01

    Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…

  12. A Self-Transcribing Speech Corpus: Collecting Continuous Speech with an Online Educational Game

    E-print Network

    Describes a self-transcribing speech corpus collected with an online educational game called Voice Scatter, in which players study flashcards. Voice Scatter uses speech recognition to provide a fun way for users to study. The corpus contains 27.63 hours of speech, collected during the first 22 days that Voice Scatter was publicly available.

  13. An effective cluster-based model for robust speech detection and speech recognition in noisy environments.

    PubMed

    Górriz, J M; Ramírez, J; Segura, J C; Puntonet, C G

    2006-07-01

    This paper shows an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost making it adequate for real time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition and a representative set of recently reported VAD algorithms. PMID:16875243
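
    A toy rendering of the decision rule described (all names and parameters below are illustrative, not the paper's implementation): K-means prototypes characterize the noise, and a frame is declared speech when its context-smoothed distance to the nearest prototype exceeds a threshold.

```python
import numpy as np

def fit_noise_prototypes(noise_feats, k=4, iters=20, seed=0):
    """Plain K-means over noise-only feature vectors."""
    rng = np.random.default_rng(seed)
    protos = noise_feats[rng.choice(len(noise_feats), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(noise_feats[:, None] - protos[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                protos[j] = noise_feats[labels == j].mean(axis=0)
    return protos

def is_speech(frames, protos, threshold, context=2):
    """Flag frames whose context-averaged distance to the nearest
    noise prototype exceeds a threshold."""
    d = np.min(np.linalg.norm(frames[:, None] - protos[None], axis=2), axis=1)
    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    smoothed = np.convolve(d, kernel, mode="same")  # contextual smoothing
    return smoothed > threshold

# Illustrative use: fit on noise-only frames, then classify a stream.
rng = np.random.default_rng(0)
noise = rng.normal(0, 1, size=(500, 12))          # noise-only MFCC-like frames
stream = np.vstack([rng.normal(0, 1, (50, 12)),   # noise...
                    rng.normal(3, 1, (50, 12))])  # ...then speech-like frames
protos = fit_noise_prototypes(noise)
print(is_speech(stream, protos, threshold=6.0).mean())  # roughly half flagged
```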

  14. Modeling Speech Disfluency to Predict Conceptual Misalignment in Speech Survey Interfaces

    ERIC Educational Resources Information Center

    Ehlen, Patrick; Schober, Michael F.; Conrad, Frederick G.

    2007-01-01

    Computer-based interviewing systems could use models of respondent disfluency behaviors to predict a need for clarification of terms in survey questions. This study compares simulated speech interfaces that use two such models--a generic model and a stereotyped model that distinguishes between the speech of younger and older speakers--to several…

  15. Spotlight on Speech Codes 2010: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2010

    2010-01-01

    Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…

  16. Dramatic Effects of Speech Task on Motor and Linguistic Planning in Severely Dysfluent Parkinsonian Speech

    ERIC Educational Resources Information Center

    Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

    2012-01-01

    In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency and voice emerge more saliently in conversation than in repetition, reading or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have…

  17. Data Warehouse for Speech Perception and Model Testing

    E-print Network

    Massaro, Dominic

    Addresses the development and testing of quantitative models of speech perception. To meet this goal, we are establishing a data warehouse; the purpose of the warehouse is to initiate the accumulation and dissemination of experimental results and formal model testing.

  18. Speech Is Speech, and Prose Is Prose, And (N)ever the Twain....

    ERIC Educational Resources Information Center

    Minkoff, Harvey

    Although speech and writing both contain functional varieties as well as many similar mechanical aspects, mature writing contains a number of conventions (words, idioms, constructions) rarely found in mainstream native speech. Among areas of contrast are vocabulary, syntactic constructions--especially punctuation--and the more complex use of…

  19. The Clinical Practice of Speech and Language Therapists with Children with Phonologically Based Speech Sound Disorders

    ERIC Educational Resources Information Center

    Oliveira, Carla; Lousada, Marisa; Jesus, Luis M. T.

    2015-01-01

    Children with speech sound disorders (SSD) represent a large number of speech and language therapists' caseloads. The intervention with children who have SSD can involve different therapy approaches, and these may be articulatory or phonologically based. Some international studies reveal a widespread application of articulatory based approaches in…

  20. The Use of Interpreters by Speech-Language Pathologists Conducting Bilingual Speech-Language Assessments

    ERIC Educational Resources Information Center

    Palfrey, Carol Lynn

    2013-01-01

    The purpose of this non-experimental quantitative study was to explore the practices of speech-language pathologists in conducting bilingual assessments with interpreters. Data were obtained regarding the assessment tools and practices used by speech-language pathologists, the frequency with which they work with interpreters, and the procedures…

  1. A Motor Speech Assessment for Children with Severe Speech Disorders: Reliability and Validity Evidence

    ERIC Educational Resources Information Center

    Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.

    2013-01-01

    Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…

  2. Speech-to-Speech Translation Activities in Thailand

    E-print Network

    Chai Wutiwiwatchai, Thepchai Supnithi, Krit Kosawat. Describes speech-to-speech translation activities at a technology laboratory of the National Electronics and Computer Technology Center (NECTEC), Klong-luang, Pathumthani 12120, Thailand.

  3. Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker

    ERIC Educational Resources Information Center

    Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath

    2014-01-01

    Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…

  4. Speech Intelligibility and Accents in Speech-Mediated Interfaces: Results and Recommendations

    ERIC Educational Resources Information Center

    Lawrence, Halcyon M.

    2013-01-01

    There continues to be significant growth in the development and use of speech-mediated devices and technology products; however, there is no evidence that non-native English speech is used in these devices, despite the fact that English is now spoken by more non-native speakers than native speakers, worldwide. This relative absence of nonnative…

  5. Spotlight on Speech Codes 2011: The State of Free Speech on Our Nation's Campuses

    ERIC Educational Resources Information Center

    Foundation for Individual Rights in Education (NJ1), 2011

    2011-01-01

    Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and accompanying report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…

  6. A Clinician Survey of Speech and Non-Speech Characteristics of Neurogenic Stuttering

    ERIC Educational Resources Information Center

    Theys, Catherine; van Wieringen, Astrid; De Nil, Luc F.

    2008-01-01

    This study presents survey data on 58 Dutch-speaking patients with neurogenic stuttering following various neurological injuries. Stroke was the most prevalent cause of stuttering in our patients, followed by traumatic brain injury, neurodegenerative diseases, and other causes. Speech and non-speech characteristics were analyzed separately for…

  7. Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1982-03-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

  8. Electrophysiological Evidence for a Multisensory Speech-Specific Mode of Perception

    ERIC Educational Resources Information Center

    Stekelenburg, Jeroen J.; Vroomen, Jean

    2012-01-01

    We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…

  9. Speech evaluation in children with temporomandibular disorders

    PubMed Central

    PIZOLATO, Raquel Aparecida; FERNANDES, Frederico Silva de Freitas; GAVIÃO, Maria Beatriz Duarte

    2011-01-01

    Objectives The aims of this study were to evaluate the influence of temporomandibular disorders (TMD) on speech in children, and to verify the influence of occlusal characteristics. Material and methods Speech and dental occlusal characteristics were assessed in 152 Brazilian children (78 boys and 74 girls), aged 8 to 12 (mean age 10.05 ± 1.39 years) with or without TMD signs and symptoms. The clinical signs were evaluated using the Research Diagnostic Criteria for TMD (RDC/TMD) (axis I) and the symptoms were evaluated using a questionnaire. The following groups were formed: Group TMD (n=40), TMD signs and symptoms (Group S and S, n=68), TMD signs or symptoms (Group S or S, n=33), and without signs and symptoms (Group N, n=11). Articulatory speech disorders were diagnosed during spontaneous speech and repetition of words using the "Phonological Assessment of Child Speech" for the Portuguese language. A list of 40 phonologically balanced words, read by the speech pathologist and repeated by the children, was also applied. Data were analyzed by descriptive statistics, Fisher's exact or Chi-square tests (α=0.05). Results A slight prevalence of articulatory disturbances, such as substitutions, omissions and distortions of the sibilants /s/ and /z/, and no deviations in jaw lateral movements were observed. Reduction of vertical amplitude was found in 10 children, the prevalence being greater in children with TMD signs and symptoms than in normal children. Tongue protrusion in the phonemes /t/, /d/, /n/, /l/ and frontal lip positioning in the phonemes /s/ and /z/ were the most prevalent visible alterations. There was a high percentage of dental occlusal alterations. Conclusions There was no association between TMD and speech disorders. Occlusal alterations may be factors of influence, allowing distortions and frontal lisp in the phonemes /s/ and /z/ and inadequate tongue position in the phonemes /t/, /d/, /n/, /l/. PMID:21986655

  10. The logic of indirect speech.

    PubMed

    Pinker, Steven; Nowak, Martin A; Lee, James J

    2008-01-22

    When people speak, they often insinuate their intent indirectly rather than stating it as a bald proposition. Examples include sexual come-ons, veiled threats, polite requests, and concealed bribes. We propose a three-part theory of indirect speech, based on the idea that human communication involves a mixture of cooperation and conflict. First, indirect requests allow for plausible deniability, in which a cooperative listener can accept the request, but an uncooperative one cannot react adversarially to it. This intuition is supported by a game-theoretic model that predicts the costs and benefits to a speaker of direct and indirect requests. Second, language has two functions: to convey information and to negotiate the type of relationship holding between speaker and hearer (in particular, dominance, communality, or reciprocity). The emotional costs of a mismatch in the assumed relationship type can create a need for plausible deniability and, thereby, select for indirectness even when there are no tangible costs. Third, people perceive language as a digital medium, which allows a sentence to generate common knowledge, to propagate a message with high fidelity, and to serve as a reference point in coordination games. This feature makes an indirect request qualitatively different from a direct one even when the speaker and listener can infer each other's intentions with high confidence. PMID:18199841

  11. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  12. Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech 

    E-print Network

    De Leon, P.L.; Pucher, M.; Yamagishi, Junichi

    2010-01-01

    In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis ...

  13. Understanding speech in interactive narratives with crowd sourced data

    E-print Network

    Orkin, Jeff

    Speech recognition failures and limited vocabulary coverage pose challenges for speech interactions with characters in games. We describe an end-to-end system for automating characters from a large corpus of recorded human ...

  14. Multi-level acoustic modeling for automatic speech recognition

    E-print Network

    Chang, Hung-An, Ph. D. Massachusetts Institute of Technology

    2012-01-01

    Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model coarticulatory variations that occur during speech production. Typically, the local ...

  15. Assigning phrase breaks from part-of-speech sequences 

    E-print Network

    Taylor, Paul; Black, Alan W

    This paper presents an algorithm for automatically assigning phrase breaks to unrestricted text for use in a text-to-speech synthesizer. Text is first converted into a sequence of part-of-speech tags. Next a Markov model ...
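
    A heavily simplified sketch of the idea; the probabilities below are invented for illustration, whereas the paper's model is trained from data and decodes break sequences with a Markov model.

```python
# Toy model: P(break | preceding POS, following POS). Numbers are made up.
P_BREAK = {
    ("NOUN", "VERB"): 0.30, ("NOUN", "DET"): 0.55,
    ("VERB", "DET"): 0.10, ("ADJ", "NOUN"): 0.02,
}

def assign_breaks(pos_tags, threshold=0.5, default=0.05):
    """Insert a break between tags i and i+1 when the modelled break
    probability for that POS pair crosses the threshold."""
    return [P_BREAK.get((left, right), default) >= threshold
            for left, right in zip(pos_tags, pos_tags[1:])]

tags = ["DET", "ADJ", "NOUN", "DET", "NOUN", "VERB", "DET", "NOUN"]
print(assign_breaks(tags))  # break predicted only at the NOUN-DET juncture
```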

  16. An accent-independent lexicon for automatic speech recognition. 

    E-print Network

    Van Bael, Christophe; King, Simon

    2003-01-01

    Recent work at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh has developed an accent-independent lexicon for speech synthesis (the Unisyn project). The main purpose of this lexicon is ...

  17. Optimising selection of units from speech databases for concatenative synthesis. 

    E-print Network

    Black, Alan W; Campbell, Nick

    1995-01-01

    Concatenating units of natural speech is one method of speech synthesis. Most such systems use an inventory of fixed length units, typically diphones or triphones with one instance of each type. An alternative is to use ...

  18. Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology.

    PubMed

    Alderson-Day, Ben; Fernyhough, Charles

    2015-09-01

    Inner speech-also known as covert speech or verbal thinking-has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789

  19. Exploring speech therapy games with children on the autism spectrum

    E-print Network

    Picard, Rosalind W.

    Individuals on the autism spectrum often have difficulties producing intelligible speech with either high or low speech rate, and atypical pitch and/or amplitude affect. In this study, we present a novel intervention towards ...

  20. Using intonation to constrain language models in speech recognition. 

    E-print Network

    Taylor, Paul A; King, Simon; Isard, Stephen; Wright, Helen; Kowtko, Jacqueline C

    1997-01-01

    This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of ...

  1. Synthesis and Evaluation of Conversational Characteristics in Speech Synthesis 

    E-print Network

    Andersson, Sebastian

    2013-01-01

    Conventional synthetic voices can synthesise neutral read aloud speech well. But, to make synthetic speech more suitable for a wider range of applications, the voices need to express more than just the word identity. We ...

  2. Automatically clustering similar units for unit selection in speech synthesis. 

    E-print Network

    Black, Alan W; Taylor, Paul A

    1997-01-01

    This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class ...
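
    A sketch of the clustering step under simplified assumptions: plain K-means over per-unit acoustic summaries stands in for the paper's automatic clustering, and the data are synthetic. At synthesis time the cluster narrows the candidate search.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Each row summarizes one recorded unit of the same phone class
# (e.g., mean cepstra, duration, F0): synthetic stand-in data here.
rng = np.random.default_rng(2)
units = np.vstack([rng.normal(loc, 0.3, size=(40, 4))
                   for loc in (0.0, 1.5, 3.0)])

centroids, labels = kmeans2(units, 3, minit="points")

# A target specification is matched to its nearest cluster first, and the
# final instance is chosen from that cluster, shrinking the search space.
target = rng.normal(1.5, 0.3, size=4)
cluster = np.argmin(np.linalg.norm(centroids - target, axis=1))
members = np.flatnonzero(labels == cluster)
print(f"search reduced from {len(units)} units to {len(members)}")
```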

  3. Multimodal speech interfaces for map-based applications

    E-print Network

    Liu, Sean (Sean Y.)

    2010-01-01

    This thesis presents the development of multimodal speech interfaces for mobile and vehicle systems. Multimodal interfaces have been shown to increase input efficiency in comparison with their purely speech or text-based ...

  4. Variable rate CELP speech coding using widely variable parameter updates 

    E-print Network

    Moodie, Myron L.

    1995-01-01

    for variable rate CELP speech coding. After a presentation of speech coding basics, general CELP coding concepts and several specific fixed and variable rate CELP coders are described. The widely variable CELP parameter update techniques are then developed...

  5. Synthesis and evaluation of conversational characteristics in speech synthesis 

    E-print Network

    Andersson, Johan Sebastian

    2013-11-28

    Conventional synthetic voices can synthesise neutral read aloud speech well. But, to make synthetic speech more suitable for a wider range of applications, the voices need to express more than just the word identity. We ...

  6. Speaker-Independent HMM-based Speech Synthesis System 

    E-print Network

    Yamagishi, Junichi; Zen, Heiga; Toda, Tomoki; Tokuda, Keiichi

    2007-01-01

    This paper describes an HMM-based speech synthesis system developed by the HTS working group for the Blizzard Challenge 2007. To further explore the potential of HMM-based speech synthesis, we incorporate new features ...

  7. An annotation scheme for concept-to-speech synthesis. 

    E-print Network

    Hitzeman, Janet; Black, Alan W; Taylor, Paul; Mellish, Chris; Oberlander, Jon

    1999-01-01

    The SOLE concept-to-speech system uses linguistic information provided by an NLG component to improve the intonation of synthetic speech. As the text is generated, the system automatically annotates the text with linguistic ...

  8. Toward a social signaling framework : activity and emphasis in speech

    E-print Network

    Stoltzman, William T

    2006-01-01

    Language is not the only form of verbal communication. Loudness, pitch, speaking rate, and other non-linguistic speech features are crucial aspects of human spoken interaction. In this thesis, we separate these speech ...

  9. Multilingual Speech Recognition for Information Retrieval in Indian context

    E-print Network

    The system is originally designed for the Hindi and Tamil languages and adapted to incorporate Indian-accented English. Language-specific characteristics in the speech recognition framework...

  10. Enhancement of Esophageal Speech Using Statistical Voice Conversion

    E-print Network

    Duh, Kevin

    Esophageal speakers cannot produce voiced sounds because their vocal cords have been removed. Thus, they require another method of producing speech, using the esophagus and vocal organs to articulate audible speech sounds; the generated voices...

  11. A tutorial on HMM speech synthesis (Invited paper) 

    E-print Network

    Simon King

    2010-01-01

    Statistical parametric speech synthesis, based on HMM-like models, has become competitive with established concatenative techniques over the last few years. This paper offers a non-mathematical introduction to this method of speech synthesis...

  12. A Posterior Approach for Microphone Array Based Speech Recognition 

    E-print Network

    Wang, Dong; Himawan, Ivan; Frankel, Joe; King, Simon

    2008-01-01

    posterior-based approach for array-based speech recognition. The main idea is that, instead of enhancing speech signals, we try to enhance the posterior probabilities that frames belong to recognition units, e.g., phones. These enhanced posteriors...

  13. An Investigation of nonlinear speech synthesis and pitch modification techniques 

    E-print Network

    Mann, Iain

    Speech synthesis technology plays an important role in many aspects of man–machine interaction, particularly in telephony applications. In order to be widely accepted, the synthesised speech quality should be as human–like ...

  14. Unsupervised adaptation for HMM-based speech synthesis 

    E-print Network

    King, Simon; Tokuda, Keiichi; Zen, Heiga; Yamagishi, Junichi

    It is now possible to synthesise speech using HMMs with a comparable quality to unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms. The most exciting is model adaptation. It has been...

  15. A BLOCK COSINE TRANSFORM AND ITS APPLICATION IN SPEECH RECOGNITION

    E-print Network

    Jingdong Chen, Kuldip K. Paliwal. Abstract: Noise robust speech recognition has...

  16. Optimization of acoustic feature extraction from dysarthric speech

    E-print Network

    DiCicco, Thomas M., Jr. (Thomas Minotti)

    2010-01-01

    Dysarthria is a motor speech disorder characterized by weak or uncoordinated movements of the speech musculature. While unfamiliar listeners struggle to understand speakers with severe dysarthria, familiar listeners are ...

  17. The generation of regional pronunciations of English for speech synthesis. 

    E-print Network

    Fitt, Susan

    1997-01-01

    Most speech synthesisers and recognisers for English currently use pronunciation lexicons in standard British or American accents, but as use of speech technology grows there will be more demand for the incorporation of ...

  18. HMM-based Speech Synthesis from Audio Book Data 

    E-print Network

    Haag, Kathrin

    In contrast to hand-crafted speech databases, which contain short out-of-context sentences in fairly unemphatic speech style, audio books contain rich prosody including intonation contours, pitch accents and phrasing patterns, which is a good pre...

  19. Speech synthesis reactive to dynamic noise environmental conditions 

    E-print Network

    Palmaz López-Peláez, Susana

    2013-11-27

    In this report we are going to address the issues of speech synthesis in changing noise conditions. We will investigate the potential improvements that can be introduced by using a speech synthesiser in these conditions that is able to modulate...

  20. Overview of speech technology of the 80's

    SciTech Connect

    Crook, S.B.

    1981-01-01

    The author describes the technology innovations necessary to accommodate the market need which is the driving force toward greater perceived computer intelligence. The author discusses aspects of both speech synthesis and speech recognition.

  1. Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

    PubMed Central

    2015-01-01

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789

  2. Speech Understanding Performance of Cochlear Implant...

    E-print Network

    Co-authored with Jan Wouters (IEEE Transactions on Biomedical Engineering). Abstract: Cochlear Implant (CI) recipients report severe degradation of speech... Index terms: cochlear implants, time-frequency masking, phase error variance.

  3. Speech Evoked Auditory Brainstem Response in Stuttering

    PubMed Central

    Tahaei, Ali Akbar; Ashayeri, Hassan; Pourbakht, Akram; Kamali, Mohammad

    2014-01-01

    Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher levels of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable /da/, with a duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed deficient neural timing in the early stages of the auditory pathway, consistent with temporal processing deficits, and this abnormal timing may underlie their disfluency. PMID:25215262

  4. Gesture facilitates the syntactic analysis of speech.

    PubMed

    Holle, Henning; Obermeier, Christian; Schmidt-Kassow, Maren; Friederici, Angela D; Ward, Jamie; Gunter, Thomas C

    2012-01-01

    Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are integrated systems. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention. PMID:22457657

  5. Speech Production as State Feedback Control

    PubMed Central

    Houde, John F.; Nagarajan, Srikantan S.

    2011-01-01

    Spoken language exists because of a remarkable neural process. Inside a speaker's brain, an intended message gives rise to neural signals activating the muscles of the vocal tract. The process is remarkable because these muscles are activated in just the right way that the vocal tract produces sounds a listener understands as the intended message. What is the best approach to understanding the neural substrate of this crucial motor control process? One of the key recent modeling developments in neuroscience has been the use of state feedback control (SFC) theory to explain the role of the CNS in motor control. SFC postulates that the CNS controls motor output by (1) estimating the current dynamic state of the thing (e.g., arm) being controlled, and (2) generating controls based on this estimated state. SFC has successfully predicted a great range of non-speech motor phenomena, but as yet has not received attention in the speech motor control community. Here, we review some of the key characteristics of speech motor control and what they say about the role of the CNS in the process. We then discuss prior efforts to model the role of CNS in speech motor control, and argue that these models have inherent limitations – limitations that are overcome by an SFC model of speech motor control which we describe. We conclude by discussing a plausible neural substrate of our model. PMID:22046152
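
    The two SFC ingredients named here, state estimation and control generated from the estimate, can be shown in a scalar toy loop. This is a generic observer-plus-feedback sketch with hand-picked gains, not the authors' speech model.

```python
import numpy as np

# Pure-integrator plant x' = b*u, observed through noisy y = x + noise.
b, dt = 1.0, 0.01
L, K = 2.0, 1.5      # observer gain and feedback gain (hand-tuned)
target = 1.0         # desired state (standing in for an articulatory goal)

rng = np.random.default_rng(3)
x, x_hat = 0.0, 0.0  # true state and the controller's estimate of it
for _ in range(1000):
    u = K * (target - x_hat)               # control uses the *estimate*
    y = x + 0.05 * rng.standard_normal()   # noisy sensory feedback
    # Observer: integrate an internal model of the plant, corrected by
    # the prediction error (y - x_hat).
    x_hat += dt * (b * u + L * (y - x_hat))
    x += dt * (b * u)                      # true plant dynamics
print(round(x, 2))  # settles near the target despite noisy feedback
```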

  6. Music and speech prosody: a common rhythm

    PubMed Central

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  7. Music and speech prosody: a common rhythm.

    PubMed

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain-damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study, the association between the perception of music and speech prosody was investigated in healthy Finnish adults (n = 61) using an on-line music perception test, including the Scale subtest of the Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks, as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association remained evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022
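
    The phrase "after controlling for" in the two records above corresponds to including the covariates as additional regressors. A small Python sketch on simulated data (variable names, effect sizes, and the data itself are assumptions, not the study's dataset):

      import numpy as np

      rng = np.random.default_rng(2)
      n = 61                              # sample size reported in the abstract
      rhythm = rng.normal(size=n)         # rhythm-perception score (z-scored)
      covs = rng.normal(size=(n, 3))      # e.g., music education, age, pitch
      prosody = (0.6 * rhythm + covs @ np.array([0.2, -0.1, 0.3])
                 + rng.normal(0, 0.5, n)) # word-stress perception score

      # OLS with intercept, the predictor of interest, and the covariates
      X = np.column_stack([np.ones(n), rhythm, covs])
      beta, *_ = np.linalg.lstsq(X, prosody, rcond=None)
      print(f"rhythm coefficient, covariate-adjusted: {beta[1]:.2f}")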

  8. Robust speech coding using microphone arrays

    NASA Astrophysics Data System (ADS)

    Li, Zhao

    1998-09-01

    To achieve robustness and efficiency for voice communication in noise, the noise suppression and bandwidth compression processes are combined into a joint process using input from an array of microphones. An adaptive beamforming technique with a set of robust linear constraints and a single quadratic inequality constraint is used to preserve the desired signal and to cancel directional and ambient noise in a small-room environment. This robustly constrained array processor is found to be effective in limiting signal cancellation over a wide range of input SNRs (-10 dB to +10 dB). The resulting intelligibility gains (8-10 dB) provide significant improvement to subsequent CELP coding. In addition, desired speech activity is detected by estimating Target-to-Jammer Ratios (TJR) using subband correlations between different microphone inputs or using signals within the Generalized Sidelobe Canceler directly. These two novel techniques of speech activity detection for coding are studied thoroughly in this dissertation. Each is subsequently incorporated with the adaptive array and a 4.8 kbps CELP coder to form a Variable Bit Rate (VBR) coder with noise canceling and Spatial Voice Activity Detection (SVAD) capabilities. This joint noise suppression and bandwidth compression system demonstrates large improvements in desired speech quality after coding, accurate desired speech activity detection in various types of interference, and a reduction in the information bits required to code the speech.
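
    One of the detection ideas in this abstract, estimating speech activity from subband correlations between microphone inputs, can be sketched as follows: a directional talker yields coherent subband signals across microphones, while diffuse noise does not. The frame length, smoothing constant, 0.6 threshold, and function name below are illustrative assumptions, not the dissertation's algorithm.

      import numpy as np

      def subband_coherence_vad(mic1, mic2, nfft=256, hop=128, alpha=0.9,
                                thresh=0.6):
          """Boolean speech-activity decision per frame from two mic channels."""
          w = np.hanning(nfft)
          cross = p1 = p2 = None
          decisions = []
          for i in range(0, min(len(mic1), len(mic2)) - nfft, hop):
              X1 = np.fft.rfft(w * mic1[i:i + nfft])
              X2 = np.fft.rfft(w * mic2[i:i + nfft])
              c, q1, q2 = X1 * np.conj(X2), np.abs(X1) ** 2, np.abs(X2) ** 2
              if cross is None:
                  cross, p1, p2 = c, q1, q2
                  continue  # coherence needs at least two frames of averaging
              # recursive (exponential) averaging of cross- and auto-spectra
              cross = alpha * cross + (1 - alpha) * c
              p1 = alpha * p1 + (1 - alpha) * q1
              p2 = alpha * p2 + (1 - alpha) * q2
              coh = np.abs(cross) ** 2 / (p1 * p2 + 1e-12)  # per-subband
              decisions.append(coh.mean() > thresh)
          return np.array(decisions)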

  8. Low-Bitrate Distributed Speech Recognition for Packet-Based and Wireless Communication

    E-print Network

    Alwan, Abeer

    Alexis Bernard, Student Member, IEEE. Concerns distributed (wireless or packet-based) speech recognition. Index Terms: automatic speech recognition, distributed speech recognition (DSR), joint channel…

  10. [Functional imaging of physiological and pathological speech production].

    PubMed

    Kell, C A

    2014-06-01

    Numerous neurological patients suffer from speech and language disorders, but the underlying pathomechanisms are not well understood. Imaging studies of speech production disorders lag behind aphasiological research on speech perception, probably due to concerns about movement artifacts. Meanwhile, modern neuroimaging techniques allow these processes to be investigated. This article summarizes insights from neuroimaging on physiological speech production and on the pathomechanisms underlying Parkinson's disease and developmental stuttering. PMID:24832012

  11. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music

    PubMed Central

    Lee, Hweeling; Noppeney, Uta

    2014-01-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians had practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practice fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practice was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, to natural speech. PMID:25147539
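
    A common way to summarize such synchrony judgments as a temporal integration window is to fit a Gaussian to the proportion of "synchronous" responses across the 13 asynchronies and report its width. A Python sketch on made-up response data (the fitting model and the numbers are assumptions, not the authors' analysis):

      import numpy as np
      from scipy.optimize import curve_fit

      soa_ms = np.array([-360, -300, -240, -180, -120, -60, 0,
                         60, 120, 180, 240, 300, 360])
      p_sync = np.array([0.05, 0.10, 0.20, 0.45, 0.75, 0.95, 1.00,
                         0.90, 0.70, 0.40, 0.15, 0.10, 0.05])  # fake data

      def gauss(soa, amp, mu, sigma):
          return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

      (amp, mu, sigma), _ = curve_fit(gauss, soa_ms, p_sync, p0=[1.0, 0.0, 120.0])
      # A narrower fitted sigma (e.g., for musicians) means finer sensitivity
      # to audiovisual temporal misalignment.
      print(f"window centre {mu:.0f} ms, width (sigma) {sigma:.0f} ms")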

  12. Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception

    PubMed Central

    Skipper, Jeremy I.; van Wassenhove, Virginie; Nusbaum, Howard C.; Small, Steven L.

    2009-01-01

    Observing a speaker’s mouth profoundly influences speech perception. For example, listeners perceive an “illusory” “ta” when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory “ta” percept are more similar to the activity patterns evoked by AV/ta/ than they are to patterns evoked by AV/pa/ or AV/ka/. In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory “ta” in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV/pa/ and AV/ka/, respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV/ta/. Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation. PMID:17218482

  13. 76 FR 44326 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-25

    ...03-123 and 10-51; FCC 11-104] Telecommunications Relay Services and Speech-to-Speech...through June 30, 2012 Interstate Telecommunications Relay Services (``TRS'') Fund...a summary of the Commission's Telecommunications Relay Services and...

  14. 75 FR 54040 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-03

    ... From the Federal Register Online via the Government Printing Office FEDERAL COMMUNICATIONS COMMISSION 47 CFR Part 64 Telecommunications Relay Services and Speech-to-Speech Services for Individuals... waivers of certain Telecommunications Relay Services (TRS) mandatory minimum standards for Video...

  15. System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing Speech

    DOEpatents

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-04-25

    The present invention is a system and method for characterizing the voiced excitation functions of human (or animate) speech and acoustic signals, for removing unwanted acoustic noise, which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low-power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech, and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
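
    One step the patent abstract implies, obtaining a vocal-tract transfer function once the EM-derived excitation is available, can be sketched as a frame-averaged, regularized spectral division H(f) = S(f)/E(f). The code below is a standard textbook estimator assumed here for illustration, not the method claimed in the patent:

      import numpy as np

      def estimate_transfer_function(excitation, speech, nfft=1024, eps=1e-6):
          """Least-squares estimate of H(f) = S(f)/E(f), averaged over frames."""
          hop = nfft // 2
          w = np.hanning(nfft)
          num = np.zeros(nfft // 2 + 1, dtype=complex)
          den = np.zeros(nfft // 2 + 1)
          for i in range(0, min(len(excitation), len(speech)) - nfft, hop):
              E = np.fft.rfft(w * excitation[i:i + nfft])
              S = np.fft.rfft(w * speech[i:i + nfft])
              num += S * np.conj(E)  # accumulate cross-spectrum S·E*
              den += np.abs(E) ** 2  # accumulate excitation power
          return num / (den + eps)   # regularized division avoids zero bins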

  16. Changes in breathing while listening to read speech: the effect of reader and speech mode

    PubMed Central

    Rochet-Capellan, Amélie; Fuchs, Susanne

    2013-01-01

    The current paper extends previous work on breathing during speech perception and provides supplementary material regarding the hypothesis that adaptation of breathing during perception “could be a basis for understanding and imitating actions performed by other people” (Paccalin and Jeannerod, 2000). The experiments were designed to test how differences in a reader's breathing, whether due to speaker-specific characteristics or induced by changes in loudness level or speech rate, influence the listener's breathing. Two readers (a male and a female) were pre-recorded while reading short texts with normal and then loud speech (both readers) or slow speech (female only). These recordings were then played back to 48 female listeners. The movements of the rib cage and abdomen were analyzed for both the readers and the listeners. Breathing profiles were characterized by the movement expansion due to inhalation and the duration of the breathing cycle. We found that both loudness and speech rate affected each reader's breathing in different ways. Listener breathing differed when listening to the male versus the female reader and across speech modes. However, differences in listener breathing were not systematically in the same direction as the reader differences. Listener breathing was strongly sensitive to the order in which speech modes were presented and, in some conditions, adapted over the time course of the experiment. In contrast to the specific alignment of breathing previously observed in face-to-face dialog, no clear evidence for a listener–reader alignment in breathing was found in this purely auditory speech perception task. The results and methods are relevant to the question of the involvement of physiological adaptations in speech perception and to the basic mechanisms of listener–speaker coupling. PMID:24367344
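
    The two breathing measures named above, movement expansion due to inhalation and breathing-cycle duration, can be computed from a rib-cage trace by simple peak picking. A Python sketch; the peak-detection parameters and the assumption of at least one second between breaths are illustrative, not the authors' processing:

      import numpy as np
      from scipy.signal import find_peaks

      def breathing_features(ribcage, srate_hz):
          """Per-cycle inhalation expansion and cycle duration (in seconds)."""
          peaks, _ = find_peaks(ribcage, distance=srate_hz)     # inhalation tops
          troughs, _ = find_peaks(-ribcage, distance=srate_hz)  # exhalation ends
          durations = np.diff(peaks) / srate_hz                 # cycle length
          # expansion: each peak minus the nearest preceding trough
          expansions = np.array([ribcage[p] - ribcage[troughs[troughs < p][-1]]
                                 for p in peaks if np.any(troughs < p)])
          return expansions, durations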

  17. Human-Machine Collaboration for Rapid Speech Transcription

    E-print Network

    Roy, Deb

    Human-Machine Collaboration for Rapid Speech Transcription, by Brandon C. Roy (Sc.B., Brown University). Submitted to the Program in Media Arts and Sciences. The goal is to obtain high-quality transcripts of all speech heard and produced by the child. Unfortunately, automatic…

  18. Automated Assessment of Speech Fluency for L2 English Learners

    ERIC Educational Resources Information Center

    Yoon, Su-Youn

    2009-01-01

    This dissertation provides an automated method for scoring the speech fluency of second language learners of English (L2 learners), based on speech recognition technology. Non-standard pronunciation, frequent disfluencies, faulty grammar, and inappropriate lexical choices are crucial characteristics of L2 learners' speech. Due to the ease of…
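
    Fluency features of the kind such a scorer typically derives can be computed from ASR word-level timestamps. A Python sketch; the word format, the 0.3 s pause threshold, and the feature set are assumptions for illustration and may differ from the dissertation's features:

      from dataclasses import dataclass

      @dataclass
      class Word:
          text: str
          start: float  # seconds
          end: float

      def fluency_features(words, pause_thresh=0.3):
          """Simple rate and pause statistics from time-aligned ASR output."""
          total = words[-1].end - words[0].start
          speech_time = sum(w.end - w.start for w in words)
          pauses = [b.start - a.end for a, b in zip(words, words[1:])
                    if b.start - a.end > pause_thresh]
          return {
              "speech_rate_wps": len(words) / total,   # words per second
              "phonation_ratio": speech_time / total,  # fraction spent speaking
              "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
              "n_pauses": len(pauses),
          }

      words = [Word("the", 0.0, 0.2), Word("cat", 0.6, 0.9), Word("sat", 1.0, 1.3)]
      print(fluency_features(words))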

  19. Concept-to-speech synthesis by phonological structure matching 

    E-print Network

    Taylor, Paul

    2000-04-15

    This paper presents a new way of generating synthetic-speech waveforms from a linguistic description. The algorithm is presented as a proposed solution to the speech-generation problem in a concept-to-speech system. Off-line, a database of recorded...

  20. Department of Communication Disorders Speech-Language Pathology

    E-print Network

    Department of Communication Disorders, Speech-Language Pathology: External-Site Clinical Practicum. Contents include: Statement; I. Objectives for the External Clinical Experience in Speech-Language Pathology; II. Contact Information about School and Program; g. LSUHSC Speech-Language Pathology Faculty and Staff; h…